People and Computers XIX — The Bigger Picture: Proceedings of HCI2005
Springer
Tom McEwan, MSc, PgCert, MBCS, CITP, CEng, ILTM, SEDA-accredited teacher in HE, School of Computing, Napier University, Edinburgh, UK
Jan Gulliksen, MSc, PhD, Department of Information Technology/HCI, Uppsala University, Uppsala, Sweden
David Benyon, BSc, MSc, PhD, School of Computing, Napier University, Edinburgh, UK
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN-10: 1-84628-192-X
ISBN-13: 978-1-84628-192-1
Printed in the United Kingdom by Athenaeum Press Ltd., Gateshead
Springer Science+Business Media
springeronline.com
Contents Preface: The Bigger Picture
ix
H — HCI at the Human Scale
1
"Looking At the Computer but Doing It On Land": Children's Interactions in a Tangible Programming Space Ylva Fernaeus & Jakob Tholander
3
The Usability of Digital Ink Technologies for Children and Teenagers Janet C Read
19
PROTEUS: Artefact-driven Constructionist Assessment within Tablet PC-based Low-fidelity Prototyping Dean Mohamedally, Panayiotis Zaphiris & Helen Petrie
37
The Reader Creates a Personal Meaning: A Comparative Study of Scenarios and Human-centred Stories Georg Strøm
53
What Difference Do Guidelines Make? An Observational Study of Online-questionnaire Design Guidelines Put to Practical Use Jo Lumsden, Scott Flinn, Michelle Anderson & Wendy Morgan
69
Designing Interactive Systems in Context: From Prototype to Deployment Tim Clerckx, Kris Luyten & Karin Coninx
85
Using Context Awareness to Enhance Visitor Engagement in a Gallery Space Peter Lonsdale, Russell Beale & Will Byrne
101
Engagement with an Interactive Museum Exhibit Naomi Haywood & Paul Cairns
113
User Needs in e-Government: Conducting Policy Analysis with Models-on-the-Web Barbara Mirel, Mary Maker & Jina Huh
131
Fit for Purpose Evaluation: The Case of a Public Information Kiosk for the Socially Disadvantaged B L William Wong, Suzette Keith & Mark Springett
149
A Visuo-Biometric Authentication Mechanism for Older Users Karen Renaud
167
C — HCI in the Greater Cultural Context
183
A Computer Science HCI Course Beryl Plimmer
185
Use and Usefulness of HCI Methods: Results from an Exploratory Study among Nordic HCI Practitioners Ida Bark, Asbjørn Følstad & Jan Gulliksen
201
Building Usability in India: Reflections from the Indo-European Systems Usability Partnership Andy Smith, Jan Gulliksen & Liam Bannon
219
Visualizing the Evolution of HCI Chaomei Chen, Gulshan Panjwani, Jason Proctor, Kenneth Allendoerfer, Jasna Kuljis, Serge Aluker, David Sturtz & Mirjana Vukovic
233
"I thought it was terrible and everyone else loved it" — A New Perspective for Effective Recommender System Design Philip Bonhard &. M Angela Sasse
251
Rich Media, Poor Judgement? A Study of Media Effects on Users' Trust in Expertise Jens Riegelsberger, M Angela Sasse & John D McCarthy
267
Cultural Representations in Web Design: Differences in Emotions and Values Claire Dormann
285
Interaction Design for Countries with a Traditional Culture: A Comparative Study of Income Levels and Cultural Values Georg Strøm
301
Researching Culture and Usability — A Conceptual Model of Usability Gabrielle Ford & Paula Kotzé
317
I — HCI Down at the Interface
335
Distinguishing Vibrotactile Effects with Tactile Mouse and Trackball Jukka Raisamo, Roope Raisamo & Katri Kosonen
337
HyperGrid — Accessing Complex Information Spaces Hans-Christian Jetter, Jens Gerken, Werner König, Christian Grün & Harald Reiterer
349
Mixed Interaction Space — Expanding the Interaction Space with Mobile Devices Thomas Riisgaard Hansen, Eva Eriksson & Andreas Lykke-Olesen
365
Static/Animated Diagrams and their Effect on Students' Perceptions of Conceptual Understanding in Computer Aided Learning (CAL) Environments Ruqiyabi Naz Awan & Brett Stevens
381
Media Co-authoring Practices in Responsive Physical Environments Carlo Jacucci, Helen Pain & John Lee
391
Cognitive Model Working Alongside the User Ion Juvina & Herre van Oostendorp
409
Revisiting Web Design Guidelines by Exploring Users' Expectations, Preferences and Visual Search Behaviour Ekaterini Tzanidou, Shailey Minocha, Marian Petre & Andrew Grayson
421
Comparing Automatic and Manual Zooming Methods for Acquiring Off-screen Targets Joshua Savage & Andy Cockburn
439
Forward and Backward Speech Skimming with the Elastic Audio Slider Wolfgang Hürst, Tobias Lauer, Cedric Bürfent & Georg Götz
455
Design Patterns for Auditory Displays C Frauenberger, T Stockman, V Putz & R Höldrich
473
Closing Keynote of HCI2005: The Bigger Picture
489
Grand Challenges in HCI: the Quest for Theory-led Design Alistair Sutcliffe
491
Author Index
507
Keyword Index
509
Preface: The Bigger Picture
Human-Computer Interaction was once a narrowly focused discipline — the study of the interaction between human and computer — one of a new breed of multi-disciplines with its roots in ergonomics, cognitive psychology and so on. No one thinks like that now. At the very least, the discipline concerns the interaction of humans through computers, with the technology put in its rightful place: a mediating artefact, between human and human, between human and information, and between systems of activities undertaken by groups of humans in their cultural context. In this preface we summarize the content and structure of this volume, in the context of the conference and our keynote speakers. We present here the bigger picture of HCI, a communal self-portrait of a multidiscipline that is now "all grown up" and making its way in the world.

The relative youth and identity of HCI is a recurring theme. Mayes [1991] describes a discipline now past infancy, Shneiderman [2003] wonders whether HCI was child, adolescent or adult, and Preece et al. [2002] see Interaction Design as a discipline beyond HCI. It's twenty years since the first British HCI conference at the University of East Anglia. This is the 19th People and Computers volume. With two joint conferences (INTERACT'90 and INTERACT'99) this is our 21st birthday. We believe that HCI has the "keys of the door", is graduating from college and finding a role in industry and society. It makes all sorts of partnerships, some ill-advised and temporary (though perhaps exciting and memorable), some enduring (but perhaps a little less exciting).

This volume is a snapshot of the best of current HCI. This is no longer just British HCI: the majority of the accepted papers are from overseas, with thirteen other countries represented here. (The Nordic nations are particularly present, a consequence of a conscious decision to involve the NordiCHI community in running this conference. This has been a delightful partnership — and many of its natives see Scotland as a Nordic nation anyway!)

Herein are the finest of over three hundred submissions for HCI2005: The Bigger Picture. This is substantially higher than in recent years, and left your editors and the programme committee with a considerable dilemma. Ninety-two 'full-paper' submissions were each subject to an average of four reviews by carefully matched experts from the formidable list of reviewers on Page xv. We are delighted to thank reviewers publicly for their huge contribution. The reviews were of a very high standard, often running to several pages of thoughtful appraisal, and almost all were completed within a tight deadline. The reviews were then meta-reviewed
by members of the programme committee, who prepared a detailed analysis for discussion. Conferences cannot run without an army of unpaid reviewers and committee members, and it is they that you, and we, must thank for this volume. Collectively our reviewers rated sixty-two full papers worthy of publication, and the programme committee had a hard time cutting these back to the thirty we have space for here. We would also like to thank unsuccessful submitters and offer them every encouragement in their work.

Past editors have wrestled with the challenge of straitjacketing papers into themes, and usually by the time of the conference we find little correlation between these themes and the conference session structure. In our call for papers, we mentioned a need to play around with the initials of HCI, and so (very loosely) partition this volume into three sections — an 'H', a 'C' and an 'I' — reflecting three levels of focus and three tracks running through the conference plan. Of course, most of these papers could validly appear in any of the three tracks, so we'd like you to view this structure in the context of the life's work of our eminent and legendary keynote — Ted Nelson, of the Oxford Internet Institute (who makes the final presentation of the opening day of the conference): this volume is a nonlinear narrative best enjoyed in a hypermedia form (hence the accompanying CD-ROM and the post-it notes for your own annotation). Production schedules mean that our keynotes' papers generally appear in Volume 2 of the proceedings, this year published in http://eWiC.bcs.org, the emerging digital library of the British Computer Society, and Ted's own paper will appear there.

We start, as we should, with the 'H', with the human aspects and actions at the human scale. Fernaeus & Tholander (Sweden) discuss collaborative design using tangible interaction for children, while Read (UK) continues the child-centred theme, exposing usability flaws in digital ink for Tablet PCs, while identifying new opportunities for this emerging technology. Mohamedally et al. (UK) also report on the use of Tablet PCs, as well as the need for developers to use technology to mediate users' needs, describing tools that both permit lo-fi prototyping and allow designers to elicit knowledge from this process. This theme of listening continues with Strøm (Denmark), in the first of two contributions, who compares two other ways for software developers to listen to users' voices: stories and scenarios. We need smarter ways to engage with users and capture information efficiently for future use by developers and designers. At a more formal level of listening, Lumsden et al. (Canada) present a much-needed guide to using online questionnaires. Of course, increasingly, users don't want to, or can't, articulate their needs, and design for ambient intelligence is a recurring theme in the conference this year. In this vein, Clerckx et al. (Belgium) take a step towards defining an integrated design environment for context-sensitive user interfaces, while Lonsdale et al. (UK) also look at awareness of location in a museum gallery space. Haywood & Cairns (UK) look at interaction in museums too, focusing on engagement and learning for children. Similarly, Mirel et al. (USA) help us understand complex, hard-to-elicit needs, in this case of experts and how they use online models to carry out knowledge work and advise and create policy in e-government. Wong et al.
(UK) also address our need for a much deeper understanding of usability in the public sector with a case study on the fitness for purpose of a public information kiosk for those most at risk
in society. Some of these issues resurface in Renaud's (UK) study of visuo-biometric authentication for older users (which hopefully all of us eventually become).

Centring on the human means focusing on one part of our bigger picture at a time. Shifting focus and attention, and zooming out and considering the whole display, are amongst the research areas in which the conference opening keynote, Dr Mary Czerwinski of Microsoft, has a formidable track record. There is no-one more appropriate to launch our theme of the Bigger Picture, and her paper will also appear in eWiC.

Our 'C' might stand for canvas, composition or context, but perhaps culture is a more encompassing theme. The challenge for systems designers is to create solutions that continue to work across cultures. We can learn much about this by considering HCI's own various cultures and the difference between theory and practice. From the other end of the earth, Plimmer (New Zealand) offers a timely reflection on HCI's place amongst other disciplines, in particular within a small country, and in the preparation of learners for practice. Bark et al. (Norway/Sweden) identify the techniques that Nordic HCI practitioners actually use, and how useful they find each. Smith et al. (UK/Sweden/Ireland) continue the global flavour, and reflect on the evolutionary state of HCI in India and the partnerships that foster development. Chen et al. (USA) literally track HCI's own evolution and relationships within itself, with a citation analysis of a selection of HCI channels. Social network analysis is also intrinsic to Bonhard & Sasse (UK) with an HCI approach to the design of recommender systems, while Riegelsberger et al. (UK) continue this search for expertise, examining the relative richness of different interaction media and how this affects the degree of trust in advisers' expertise. Three papers, linked by the theme of cultural dimensions, complete this section. Emotion and values are central to Dormann's (Canada) analysis of Web design, and she detects the position of homepages in different countries along Hofstede's MAS dimension. Strøm's (Denmark) second contribution compares interaction design decisions made in a low-income traditional country and in a high-income developed one, and identifies how to take different cultures' views of privacy and honesty into account. Ford & Kotzé (South Africa) conclude this section by finding limitations in cultural dimensions and identifying additional variables to take into account.

"What does the 'I' stand for anyway?" was a (very) early morning question from Dan Diaper at HCI2004, and a stimulus to our hermeneutic approach to the letters H, C and I. Certainly the answer includes Industry Day, the central day of the conference. At the time of writing we are just appearing on the operational horizons of senior industrialists, so cannot name our industrial keynotes here, but the strong formal industrial representation at British HCI conferences is an enduring and effective part of our tradition. Here, 'I' represents our home territory: interface, interactivity, interaction — aspects which other information technologists defer to us. Perhaps we take this for granted and forget that the interface is where we often have the opportunity to hook stakeholders and keep their attention. It's the pixel level of our Bigger Picture.
Every interface component has subtle shades of differentiation from previous elements, but the choice of the correct one provides the subtlety and shade required in analysis.
We start our rational disassembly of the senses (apologies to Rimbaud) with the haptic: Raisamo et al. (Finland) contrast detection thresholds for mouse and trackball, depending on variation in either the frequency or the magnitude of vibration feedback, finding mouse and magnitude to be the most effective combination. We zoom in on the big picture with Jetter et al. (Germany), who extend existing table visualizations by introducing HyperGrid. The navigation of interaction space continues with Hansen et al. (Denmark) and MIXIS, turning a mobile phone with camera into a 3D navigation device. We then look at the interface's effect on the user: Awan & Stevens (UK) contrast the effects of static and animated diagrams on how accurately learners assess their acquired knowledge; Jacucci et al. (UK) find, in children's use of a tangible interface in video authoring, opportunities to exploit constraints to achieve creative outcomes. Juvina & Van Oostendorp (Netherlands) contrast the visual and auditory modalities for navigation support and find gender differences. Tzanidou et al. (UK) delve deeper into the visual in an analysis of web navigation and what this should mean for web design and e-commerce. Savage & Cockburn (New Zealand) report improved performance and reduced subjective workload with speed-dependent automatic zooming, while Hürst et al.'s (Germany) elastic audio slider provides intelligible audio feedback. Frauenberger et al. (UK/Austria) also focus on auditory interfaces, but use this to demonstrate mode-independent patterns of navigation.

As editors, we are especially pleased to welcome the final paper in this volume, the keynote address with which Professor Alistair Sutcliffe will close our conference. This is the first keynote that we have been able to include in Volume 1 for several years, and a testament to his organization and close involvement with the British HCI Group. This paper leads our community forward from this conference to face the grand challenges of the future.

This is the first return to Scotland for the conference this century, and the first since the Scottish parliament was restored. It coincides with a time when the contributions of the Edinburgh Enlightenment are being widely re-evaluated. This period, roughly 1730-1780, coincided with enlightenments in other countries, but Edinburgh has a unique identity that still matters to HCI. This was a time when Scotland was free from the constraints of both church and crown, and before transportation, and then communication, enabled easy control from London. David Hume, Adam Smith and many others had space and time to think, and could call upon the resources of four Scottish universities, as they formulated concepts fundamental to the modern world: economics, social sciences, affective components of technology and society. They were free to take a less orthodox view of the industrial revolution and to see the bigger picture of society — that people have a passion to achieve objectives, using whatever technology is at hand, and do so in a rich context of community, laws and division of labour. The religious police of the time certainly found these ideas heretical, yet were unable to prevent a growing social desire for tolerance and an acceptance of the right not to conform to accepted wisdom. We hope our conference will share this mindset. Fundamentalist 'doctrines' have impeded the success of too many information and communication technology (ICT) systems, but it's just as bad to simply identify a
failure to apply HCI knowledge. As HCI comes into maturity, pointing out what's wrong is no longer enough; we need to take responsibility for creating the climate for solutions to emerge. Each of this conference's three sub-themes is relevant: a deeper understanding of how the human body interacts with technology; taking a wide enough picture of the overall context and recognising the role of cultural factors in certain combinations of situations and people; and, at the character level of our bigger picture — where actions and activities take place on a human scale, individually or in groups, and where technology offers part of the solution, not the problem.

Tom McEwan, Jan Gulliksen & David Benyon
June 2005
References

Mayes, T. [1991], Preface, in D. Diaper & N. Hammond (eds.), People and Computers VI: Usability Now! (Proceedings of HCI'91), Cambridge University Press, pp. 1-2.

Preece, J., Rogers, Y. & Sharp, H. [2002], Interaction Design: Beyond Human-Computer Interaction, John Wiley & Sons.

Shneiderman, B. [2003], Foreword, in J. A. Jacko & A. Sears (eds.), The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, Lawrence Erlbaum Associates.
The Committee

Conference Chair
Technical Chairs
Webmaster
Short Papers
Industry Day
Tutorials
Workshops
Interactive Experiences
Doctoral Consortium
Posters
Laboratory & Organisational Overviews Panels HCI Educators Workshop Treasurer Social Progranmie Publicity Exhibition Manager Student Volunteers Technical Support Manager Conference Fringe British HCI Group Liaison Officer Previous Conference Chair
Tom McEwan, Napier University Edinburgh, UK
David Benyon, Napier University Edinburgh, UK
Jan Gulliksen, Uppsala University, Sweden
Marc Fabri, Leeds Metropolitan University, UK
Olav Bertelsen, University of Aarhus, Denmark
Nick Bryan-Kinns, Queen Mary, University of London, UK
Catriona Campbell, The Usability Company, UK
Lynne Coventry, NCR, UK
Shaun Lawson, University of Lincoln, UK
Lars Oestreicher, Uppsala University, Sweden
Paul Cairns, University College London, UK
Peter Wild, University of Bath, UK
Morten Borup Harning, Open Business Innovation, Denmark
Adrian Williamson, Graham Technology plc, UK
Ann Blandford, University College London, UK
Paul Curzon, Queen Mary, University of London, UK
Shailey Minocha, Open University, UK
Lynne Baillie, Telecommunications Research Center, Austria
Marianne Graves-Petersen, University of Aarhus, Denmark
Andy Dearden, Sheffield Hallam University, UK
Dimitris Rigas, University of Bradford, UK
Willem-Paul Brinkman, Brunel University, UK
Helen Sharp, Open University, UK
Janet Read, University of Central Lancashire, UK
Sandra Cairncross, Napier University Edinburgh, UK
Lachlan MacKinnon, University of Abertay Dundee, UK
Euan Dempster, University of Abertay Dundee, UK
Stephen Brockbank, Solas, UK
Greg LePlatre, Napier University Edinburgh, UK
Brian Davison, Napier University Edinburgh, UK
Dave Roberts, IBM United Kingdom Ltd, UK
Jane Morrison, Consultant, UK
Fintan Culwin, London South Bank University, UK
Janet Finlay, Leeds Metropolitan University, UK
The Reviewers

Seffah Ahmed, Concordia University, Canada
Liz Allgar, Leeds Metropolitan University, UK
Ghassan Al-Qaimari, University of Wollongong in Dubai, United Arab Emirates
Françoise Anceaux, University of Valenciennes / CNRS-LAMIH-PERCOTEC, France
Tue Haste Andersen, University of Copenhagen, Denmark
Mattias Arvola, Linköpings Universitet, Sweden
Chris Baber, University of Birmingham, UK
Lynne Baillie, Forschungszentrum Telekommunikation Wien (FTW), Austria
Sandrine Balbo, University of Melbourne, Australia
Gordon Baxter, University of York, UK
Russell Beale, University of Birmingham, UK
Roman Bednarik, University of Joensuu, Finland
Olav W Bertelsen, University of Aarhus, Denmark
Richard Boardman, Google, USA
Inger Boivie, Uppsala University, Sweden
Agnieszka Bojko, User Centric Inc, USA
Chris Bowerman, University of Sunderland, UK
Fiona Bremner, General Dynamics Canada, Canada
Stephen Brewster, University of Glasgow, UK
Willem-Paul Brinkman, Brunel University, UK
Nick Bryan-Kinns, Queen Mary, University of London, UK
Sandra Cairncross, Napier University Edinburgh, UK
Paul Cairns, University College London, UK
Luigina Ciolfi, University of Limerick, Eire
Gilbert Cockton, University of Sunderland, UK
Karin Coninx, Limburgs Universitair Centrum, Belgium
Jane-Lisa Coughlan, Brunel University, UK
Lynne Coventry, NCR, UK
Fintan Culwin, London South Bank University, UK
Daniel Cunliffe, University of Glamorgan, UK
Paul Curzon, Queen Mary, University of London, UK
Oscar de Bruijn, University of Manchester, UK
Andy Dearden, Sheffield Hallam University, UK
Euan Dempster, Heriot-Watt University, UK
Meghan Deutscher, University of British Columbia, Canada
Jean-Marc Dubois, Université Victor Segalen Bordeaux 2, France
Lynne Dunckley, Thames Valley University, UK
Mark Dunlop, University of Strathclyde, UK
Alistair Edwards, University of York, UK
Maximilian Eibl, Technical University Chemnitz, Germany
David England, Liverpool John Moores University, UK
Paul Englefield, IBM United Kingdom Limited, UK
Sue Fenley, Reading University, UK
Bob Fields, Middlesex University, UK
Sally Fincher, University of Kent, UK
Peter Fröhlich, Forschungszentrum Telekommunikation Wien (FTW), Austria
Peter Gardner, University of Leeds, UK
Claude Ghaoui, Liverpool John Moores University, UK
Gautam Ghosh, Norway
Joy Goodman, University of Cambridge, UK
Jan Gulliksen, Uppsala University, Sweden
Morten Borup Harning, Dialogical ApS, Denmark
Bo Helgeson, Blekinge Institute of Technology, Sweden
Elliott Hey, IBM United Kingdom Ltd, UK
Hans-Juergen Hoffmann, Darmstadt University of Technology, Germany
Kate Hone, Brunel University, UK
Kasper Hornbæk, University of Copenhagen, Denmark
Baden Hughes, University of Melbourne, Australia
Poika Isokoski, University of Tampere, Finland
Judith Jeffcoate, University of Buckingham, UK
Julius Jillbert, Hasanuddin University, Indonesia
Timo Jokela, University of Oulu, Finland
Matt Jones, University of Waikato, New Zealand
Charalampos Karagiannidis, University of the Aegean, Greece
René Keller, University of Cambridge, UK
Elizabeth Kemp, Massey University, New Zealand
Pekka Ketola, Nokia Multimedia, Finland
Alistair Kilgour, Open University, UK
Palle Klante, OFFIS, Germany
Paula Kotzé, University of South Africa, South Africa
Edward Lank, San Francisco State University, USA
Marta Lárusdóttir, Reykjavik University, Iceland
Effie Lai-Chong Law, ETH Zurich, Switzerland
Shaun Lawson, University of Lincoln, UK
Linda Little, Northumbria University, UK
Jo Lumsden, National Research Council of Canada, Canada
Catriona Macaulay, Dundee University, UK
Stuart MacFarlane, University of Central Lancashire, UK
Lachlan MacKinnon, Heriot-Watt University, UK
Robert Macredie, Brunel University, UK
Thomas Mandl, Universität Hildesheim, Germany
Phebe Mann, Open University, UK
Masood Masoodian, University of Waikato, New Zealand
Rachel McCrindle, University of Reading, UK
Tom McEwan, Napier University Edinburgh, UK
David McGookin, University of Glasgow, UK
John Meech, Human Factors Europe Ltd, UK
Shailey Minocha, Open University, UK
Sunila Modi, University of Westminster, UK
David Moore, Leeds Metropolitan University, UK
David Morse, Open University, UK
Ali Asghar Nazari Shirehjini, Fraunhofer-IGD, Germany
Stuart Neil, University of Wales Institute Cardiff, UK
Nina Reeves, University of Gloucestershire, UK
Lars Oestreicher, Uppsala University, Sweden
Claire Paddison, IBM United Kingdom Ltd, UK
Volker Paelke, University of Hannover, Germany
Rodolfo Pinto da Luz, Universidade Federal de Santa Catarina, Brazil
Margit Pohl, University of Technology Vienna, Austria
Simon Polovina, Sheffield Hallam University, UK
Helen Purchase, Glasgow University, UK
Roope Raisamo, University of Tampere, Finland
Chris Raymaekers, Hasselt University, Belgium
Janet Read, University of Central Lancashire, UK
Irla Bocianoski Rebelo, Federal University of Santa Catarina, Brazil
Karen Renaud, Glasgow University, UK
Tony Renshaw, Leeds Metropolitan University, UK
Dimitris Rigas, University of Bradford, UK
Dave Roberts, IBM United Kingdom Ltd, UK
Tony Rose, Cancer Research UK, UK
Ian Ruthven, University of Strathclyde, UK
Eunice Ratna Sari, TRANSLATE-EASY, Indonesia
Robert Schumacher, User Centric, Inc, USA
Helen Sharp, Open University / City University, UK
Bhiru Shelat, System Concepts, UK
Sule Simsek, University of Missouri-Rolla, USA
Frances Slack, Sheffield Hallam University, UK
Andy Sloane, University of Wolverhampton, UK
Georg Strøm, University of Copenhagen, Denmark
Desney Tan, Microsoft Research, USA
Anthony Tang, University of British Columbia, Canada
Adi Tedjasaputra, TRANSLATE-EASY, Indonesia
Phil Turner, Napier University Edinburgh, UK
Susan Turner, Napier University Edinburgh, UK
Katerina Tzanidou, Open University, UK
Mark Upton, EDS, UK
Colin C Venters, University of Manchester, UK
Robert Ward, University of Huddersfield, UK
Peter Wild, University of Bath, UK
Adrian Williamson, Graham Technology, UK
Michael Wilson, CCLRC, UK
William Wong, Middlesex University, UK
Panayiotis Zaphiris, City University, UK
H — HCI at the Human Scale
"Looking At the Computer but Doing It On Land": Children's Interactions in a Tangible Programming Space

Ylva Fernaeus & Jakob Tholander
Department of Computer and Systems Sciences, Stockholm University, Forum 100, 164 40 Kista, Sweden
Email: ylva@dsv.su.se, jakobth@dsv.su.se
We present a tangible programming space designed for children's collaborative construction of screen-based interactive systems. The design is based on three goals for interaction and activity: supporting co-located collaborative activity, screen-based execution, and what we call behaviour-based programming. Further, we analyse the interactions within a group of 10-year-olds who used the system to create a live fantasy world together. The results show how the tangible resources shaped the activity of programming so that bodily actions and positioning became prominent. This is conceptualized through the notion of embodied programming, which highlights how programming activity must be understood through its interlinking to external resources and context.

Keywords: tangible interaction, physical programming, children and programming, interaction design and children, embodied interaction
1 Introduction

In designing interactive systems for children, a significant challenge is to address ways of efficiently integrating technology with children's social, cultural and physical circumstances. One way of addressing these issues is by designing systems that allow for increased bodily and social engagement around technology [Crook 1997]. In HCI in general, the notion of 'embodied interaction' has been proposed as a theoretical foundation for understanding how these aspects may be further taken into account in design [Dourish 2001]. This notion is largely based on studies of activity and social interaction, where the concept of embodiment in meaning making
practices has increasingly been emphasized [Goodwin 2000; Heath & Hindmarsh 2000]. Especially, this has been the case in ethnomethodological studies, where the focus is on what people actually 'do' when interacting with computational artefacts [e.g. Suchman 1987; Heath & Luff 2000]. In such studies, 'embodiment' is used to refer to how bodily actions, such as gesture, gaze and physical positioning, in a significant sense play a part in meaning making and social interaction with technology. A basic assumption of such work is that representational forms are resources that structure people's actions and thereby shape the content of the activity that they engage in, and that activity changes along with changes of representations.

In HCI, the concept of embodiment is also used to describe tangible user interfaces (TUIs). In Ullmer & Ishii's [1997] original conceptualization of TUIs, embodiment refers to how interface elements may take the form of physical objects, and hence work simultaneously as devices for input and output of computational processes. Lately, this notion has been further elaborated and relaxed, to refer to the degree of physical coupling between input and output devices [Fishkin 2004]. Hence, the concept of embodiment has at least two usages in HCI. One use is theoretically based in studies of social action and has been used to describe aspects of people's interaction with technology. The other is to describe more straightforwardly how tangible user interfaces are physically manifested.

In this paper, we are concerned with embodiment in the context of children's interaction when using new tools for building with computational media. This means that the focus is on how children's everyday practices, as well as physical resources, may be incorporated into the activity of programming, which traditionally has been regarded as occurring mostly 'inside the machine' or 'in one's head' [Norman 1993]. We conceptualize this through the notion of 'embodied programming', exemplified by a tangible programming space designed to allow for collaborative programming actions by a group of children in informal and playful settings.

The paper starts by laying out the goals for interaction that the system is based upon and a brief overview of the design activities out of which it emerged. Next is a short description of the programming space as it was finally implemented. Thereafter we present an analysis of the interactions within a group of 10-year-olds while using the system to build an interactive fantasy world together. Finally, we discuss our results with respect to spaces for co-located interaction designed for children.
2 Designing for Children's Collaborative Dynamic Systems Construction

Particular to screen-based computational media is that it allows for the creation of dynamic and interactive systems such as games, simulations and animated fantasy worlds. Systems such as these are part of children's culture, and the ability to express oneself in this particular medium is often referred to as an important literacy skill in contemporary culture [di Sessa 2000; Fernaeus et al. 2004; Snyder 2002]. Moreover, being able to create their own dynamic systems, such as games, is something that many children would like to do. Yet, getting children to engage productively with computational media has proven difficult to achieve in realistic school settings, as well as in more informal after-school clubs.
"Looking At the Computer but Doing It On Land" In studies involving children who build things on the computer, children are often encouraged to work in small groups [e.g. Kafai & Ching 2001; Suzuki & Kato 2002]. However, the resources that they have at hand are normally designed with the individual user in mind, rather than addressing the specific requirements of successful collaboration. In the analysis of children's interaction around computers, studies have shown that in collaborative settings children often spend considerable efforts in working around these circumstances [Crook 1997]. In our design work we aim to support the kind of interaction and sharing that is characteristic of children's everyday play activities, for instance when building and creating with materials such as sand, clay and Lego bricks. Important properties of such settings are that they allow for action and interaction to be performed concurrently, and that physical manipulation may be conducted jointly as well as individually. This relation between individual and collective activity for successful collaboration around technology has been emphasized for instance by Kaptelinin & Cole [2002]. Moreover, in their everyday play activities children continuously invent and reinvent the rules of the activity, and also reinterpret the meanings of the artefacts with which they play [Vygotsky 1976]. In the development of interactive systems for children, a current trend is to develop novel technologies that afford social and collaborative activity [Druin 1999]. In new interfaces for progranmiing, these aspects are addressed through tools that support networked [Tisue & Wilensky 2004], as well as bodily and tangible forms of interaction [Suzuki & Kato 1995; Eisenberg et al. 2002; Montemayor et al. 2002; Wyeth & Purchase 2003; McNemy 2004]. With tangible progranmiing tools, users are able to collaboratively make programs by manipulating physical objects that represent functions, objects and relations in the program. We contribute to this research by presenting a shared physical progranmiing space for children's collaborative and co-located creation of games, simulations and interactive fantasy worlds. The system is based on three basic goals for interaction and activity: To support co-located collaborative activity. The system aims to support children to collaboratively engage in a shared endeavour of co-constructing interactive systems. To get this to happen, children need support in developing a conmion sense of ownership of what they are building, and also that the situation provides a rich supply of external resources, to which access is not constrained by physical limitations. Instead, rules and constraints in the activity should to a larger extent be defined through social participation in the group. To allow for buUding of systems that run on a screen. We aim to support children's co-construction of screen-based systems, since it allows for a larger range of computational expressions than possible in tangible media. Other approaches to tangible progranmiing are to use screen-based interfaces to control tangible constructions (e.g. Lego Mindstorms), or to use tangible objects to program other physical devices [McNemy 2004], or to simply let the arrangement of tangible objects control the behaviour of the objects themselves [Wyeth & Purchase 2003].
To support behaviour-based programming. In behaviour-based programming, programs are constructed by manipulating and reconfiguring elements that each represent part of the behaviour of an object, for instance how it should move, or how it should act upon collision with other objects. This allows children to build systems that are rich in dynamic and interactive properties without having to engage too much with the details of the underlying code syntax and algorithms.

These three goals for interaction have emerged both through our theoretical commitments in viewing cognition as an embodied phenomenon and through our work with groups of children using visual programming tools in ordinary PC settings. Central in this work has been design for, and studies of, children:

1. Making programs by combining and reconfiguring readily prepared behaviour objects to create the functionality in their systems [Tholander et al. 2002].

2. Debugging and performance of existing programs using collaborative role-play activities away from the computer [Fernaeus & Tholander 2003].

Below is a short description of these two activities.
2.1 Staging Programming Activities with Children

Our previous work has shown that building programs through combining and reconfiguring readily prepared behaviour objects is appropriate for the targeted age groups to be able to practically realize the systems they want to build. Making use of pre-built pieces of existing programming code, and combining these to create more complex behaviours of objects in interactive applications, is common in many programming environments. However, most programming systems for children use rule-based models of programming, for example StageCast Creator [Smith & Cypher 1999; Smith et al. 2001] and Agentsheets [Repenning & Perrone 2001]. We have explored behaviour-based programming by letting children build systems in ToonTalk [Hoyles et al. 2002] and with paper-based programming prototypes [Fernaeus & Tholander 2003]. The behaviours can be attached to anything on screen, for instance a home-drawn picture or a photograph of a clay figure created in school. The ability to have simple ways of controlling these pictures, for instance to make them jump with the arrow keys, is often more in line with what the children want to pursue with the programming activities than engaging with the lower levels of rule-based programming.

An important part of this work has been to develop libraries of different kinds of behaviours and to package them as sets of running examples called Anima Gadgets (for game programming) and Animal Gadgets (for making ecosystem simulations). The selection of predefined behaviours can be tailored for a range of different activities such as programming of games, dynamic simulations, or fantasy worlds. The expressiveness of the programming material then depends more on which pre-built behaviours have been prepared and how these can be combined and reconfigured than on the properties of the underlying programming language [Tholander et al. 2002].
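To make the behaviour-based model concrete, the following is a minimal sketch in Python of how such a system could be organized. All names here (Behaviour, MoveRight, ScreenObject, run_tick) are our illustrative assumptions, not the actual implementation behind the Anima and Animal Gadgets.

```python
# A minimal sketch of behaviour-based programming, under assumed names;
# the paper does not show the actual Anima/Animal Gadgets implementation.

class Behaviour:
    """A ready-made unit of functionality that can be attached to any object."""
    def update(self, obj, world):
        pass  # invoked once per tick while the system is running


class MoveRight(Behaviour):
    """Example movement behaviour: shift the object right on every tick."""
    def __init__(self, speed=2):
        self.speed = speed

    def update(self, obj, world):
        obj.x += self.speed


class ScreenObject:
    """Anything shown on screen, e.g. a photo of a clay figure made in school."""
    def __init__(self, image, x=0, y=0):
        self.image, self.x, self.y = image, x, y
        self.behaviours = []

    def attach(self, behaviour):
        self.behaviours.append(behaviour)


def run_tick(world):
    """Execution: every object acts according to its attached behaviours."""
    for obj in list(world):  # copy, so behaviours may remove objects safely
        for behaviour in obj.behaviours:
            behaviour.update(obj, world)


# 'Programming' is then combining prepared parts rather than writing syntax:
world = [ScreenObject("clay_fish.png", x=10, y=40)]
world[0].attach(MoveRight(speed=3))
run_tick(world)
```

The point of the sketch is that the expressiveness lives in the library of prepared Behaviour subclasses, not in any syntax the children must write.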
"Looking At the Computer but Doing It On Land"
7
2.2 Collaborative Role-play Activities

An important part of our activities with children has been collaborative simulated execution of games away from the computer. These activities have turned out to be a fun and sociable way of supporting children in externalizing their ideas and collaboratively thinking about behaviours, relations and interactions between objects in the system. In these activities a group of children collaboratively act out the functionality of a system that they are currently developing on the computer, as a way of discussing, test running, and debugging its functionality. The activities are modelled after the set of behaviours available in the programming environment, and all the materials used in the activity, such as behaviour cards and pictures, correspond to the resources that the children have available when programming on the computer.

When conducting such an activity, clay figures or large paper elements representing objects in the computational system are arranged on an area such as the floor or a table serving as the background. One or several persons are assigned the role of users, and the rest are assigned responsibility for the execution of one or a small set of programming rules in the system, for instance that of removing an object from a game if it collides with another object. The system is played by iteratively and collaboratively evaluating all the rules/behaviours in the system. In each iteration the children perform their actions if the conditions for their behaviours are fulfilled. The activities normally start out from games and systems running on the computer, and during such activities, the systems are often 'reprogrammed' in several variations.
3 Design of the Programming Space

The basic concept for the tangible programming space is to allow for behaviour-based programming in a social setting, similar to that of the role-play activities described above. Our design process has involved explorations and user studies of low-fidelity prototypes made with paper and clay, as well as hands-on explorations of various resources for actually implementing the tangible system. This differs in many ways from the participatory design approaches that have become dominant in research concerning interaction design and children [Druin & Fast 2002]. Instead, our approach to user involvement is to stage activities aiming to engage children in productively using technology. Based on analysis of such activities we further develop and refine our designs.

The system takes the shape of a physical space (see Figure 1), equipped with a number of tangible resources laid out on a surface on the floor. The tangible resources are used to manipulate the looks and the behaviour of visual objects displayed on a screen projected on a wall. Figure 1 shows the physical setup of the system, consisting of:

1. A large white plastic mat with 14×14 wirelessly identifiable position tags underneath.

2. A set of plastic programming cards.
3. Several tangible creator blocks that are wirelessly connected to the software on the computer.

4. A visual display showing the system that is being built.

Figure 1: The setup of the programming space.

Figure 2: Programming cards: picture and behaviour cards placed on the creator blocks on top of the mat.

The system includes a construction mode and an execution mode. In the construction mode, three types of basic actions can be performed: new objects (pictures) can be added to the display, behaviours can be added to existing objects on the display, and existing objects can be deleted. It is also possible to load and to save existing game configurations. In execution mode the objects on the screen start acting according to the behaviours that have been attached to them.

The creator blocks connect the physical system with its virtual representation. When users interact with the system, they add objects and behaviours to the on-screen representation by simply placing cards on top of the creator blocks (see Figure 2). A rectangle moves on the screen in correspondence to how the creator block is moved on the mat. The position of the creator blocks and the ids of the cards
"Looking At the Computer but Doing It On Land"
placed on them are wirelessly communicated to the software running on the host computer.

Figure 3: Typical steps taken when adding things to the display.

There are two kinds of programming cards: picture cards and behaviour cards. Picture cards are used to place new objects at specific locations on the screen, while behaviour cards are used to specify the functionality of objects that have already been added. To add a new picture at a specific location, a picture card is placed on top of the creator block. Behaviours are added to existing objects by first making sure the creator block is at a position where there is an object, and then putting a behaviour card on top of it (see Figure 3). Behaviour cards consist of a set of behaviours for movement, collisions, user interaction, and for making objects belong to specific groups. Control cards are cards for controlling the system, such as running, stopping and saving a game or simulation. When placing an empty card on top of a reader, the current game setup is logically saved onto that card. Once this card is read again by a reader, the game stored 'on' the card is displayed on the screen.
4 Children's Interaction in the Programming Space

This section presents an analysis of a group of children interacting within the programming space. The study involved five 10-year-old children who worked for three two-hour sessions building an interactive world together. The study was set up to provide input on the design, following our principle of staging productive activities with children. This has turned out to be a good way of understanding
and analysing design issues in settings where children become familiar with the technology, thereby attempting to avoid potential novelty effects. All the children came from the same school class and voluntarily signed up to participate during the autumn break. The study took place in an art gallery, and ended with a public event where the group presented their work to friends, family and others. When introducing the technology to the children, we explained that it could be used for a number of different scenarios such as making games, simulating food webs, and illustrating other school-oriented issues. After this introduction, the children were intentionally left to work mostly on their own. The role of the researchers was to make sure that the technology worked properly and to generally support the children whenever they needed assistance. The only given restrictions were that both the physical space on the floor and the screen should be part of what they created.

A significant part of the work took place at the table where the children created characters, objects, and surfaces in modelling clay, cloth and plastic. The children built a world on top of the mat using the different materials they had at hand. A photo of the world was added as a background image for the visual display. The children also sketched out the central elements of the narrative on a whiteboard, where one of them acted as scribe and led the discussion. Photos of the physical objects that they created were added to the system so that the pictures were associated with corresponding new picture cards.

The behaviour cards used in the study consisted of a small set of simple behaviours for movement, colours and collisions. The colour cards were used to logically group objects into different 'teams', as a way of specifying types of objects and how those would interact with other types of objects. These worked together with the collision behaviours for 'eating': e.g. adding the 'eat green' behaviour to an object would make an object with the colour green disappear upon collision. There was also a 'wall' behaviour, which made objects impossible to pass by other objects.

The project that the children finally ended up with took the form of an interactive story that they named 'Desert City'. The story was set in a desert landscape with a jungle, sand dunes, a cave, an oasis, and a city surrounded by a wall. The plot of the story was that a baby Bedouin was chased by the evil Dracula who lived in a cave in the jungle. The baby was living inside the city and was guarded by its parents and other friendly people. There was only one place to enter the city through the city wall. In the jungle there were palm trees with coconuts and oranges that the people in the jungle lived on. There were also threats towards Dracula, such as a leopard and a poisonous snake. Figure 4 shows the invitation that the children wrote to describe their project to the visitors of the exhibition. On the invitation they also included a screen-shot of the system.
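The colour-group rules described above can be pictured as one more kind of attachable behaviour. The sketch below continues the earlier hypothetical classes; the attribute names and the crude overlap test are our assumptions, not the study system's actual code.

```python
# Sketch of the 'team colour' collision rules, continuing the hypothetical
# Behaviour/ScreenObject classes from the earlier sketches.

def collides(a, b):
    """Crude overlap test on mat/screen coordinates."""
    return abs(a.x - b.x) < 1 and abs(a.y - b.y) < 1


class EatColour(Behaviour):
    """E.g. EatColour('green'): green-tagged objects vanish upon collision."""
    def __init__(self, colour):
        self.colour = colour

    def update(self, obj, world):
        for other in list(world):
            if (other is not obj
                    and getattr(other, "colour", None) == self.colour
                    and collides(obj, other)):
                world.remove(other)  # the 'eaten' object disappears


# Dracula (red team) is programmed to eat the green team:
baby = ScreenObject("baby_bedouin.png", x=5, y=5)
baby.colour = "green"
dracula = ScreenObject("dracula.png", x=5, y=5)
dracula.colour = "red"
dracula.attach(EatColour("green"))

world = [baby, dracula]
run_tick(world)  # the baby is removed upon collision with Dracula
```

A 'wall' behaviour could be modelled the same way, vetoing another object's movement instead of removing it.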
5 Interaction Resources in Design and Programming

Two digital video cameras were used to film the entire workshop. We analysed the video material using Interaction Analysis [Jordan & Henderson 1995], in which talk, interaction, and artefacts are focused upon. In the analysis of the children's activities with and around the system we investigated their use of the different interactive resources they had at hand. Issues that we found particularly prominent in their interactions were: first, the physicality of the programming cards; second, their use of the virtual and physical space for sharing and coordination of actions; and third, how they blended social rules of play with the construction of computational rules.

Come to Kista+Konst and play at 1-3 pm on Saturday 13th of November.
Desert City is almost like a game made by Carl, Ivan, Nawar, Niki, and Sebastian, in fourth grade in Eriksbergsskolan.
You will get to see a game present on both computer and on the floor.
Ylva and Jakob and company have helped us build this game world. We have worked with fabrics, modelling clay, a mat on the floor and computer thingamajigs.
We look at the computer but we do it on land.

Figure 4: Children's description and a screen-shot of the Desert City when playing.
5.1 The Physicality of Programming Resources

The children made use of the programming cards in a range of different ways throughout the activity. The most obvious use was of course to program the objects and characters to be included in the system. However, the children also extensively incorporated their cards in a number of additional ways. For instance, when discussing and negotiating alternative designs of objects and of the physical space where the city was laid out, they often used the cards to demonstrate their ideas to each other. This could include the behaviours they wanted to use for a particular idea, or how they saw relations between different objects in their imagined city. For instance, in Figure 5, the girl is demonstrating to her friend how the two cards she is holding should be related in the game. Hence, the tangible forms of representation of the programming objects afforded a range of actions of a rather different kind than mere programming actions. Thereby, in thinking and negotiating about the design of the system, the cards were a primary resource.

Figure 5: "This one is protected by this one" — demonstrating ideas by using programming cards.

A related finding concerns the relation between design decisions regarding the story and the actions required to actually implement these in the system. Through the ways the children used the cards throughout the activity, the boundaries
between design, programming and implementation of objects became blurred. The fact that the cards were involved in brainstorming, negotiations, and discussions about the design blurred the distinction between actually programming the system and designing it. In several cases when no creator block was available, the children instead made design decisions and then collected the cards needed for the implementation in a stack that they placed on the mat. Thereby, all the necessary decisions regarding the design as well as the programming of the objects had already been taken, even though the actual implementation of the objects still remained. The action of placing the cards on the creator blocks was often a mere practicality, while the important work was conducted elsewhere. This stands in contrast to programming with tools such as Stagecast Creator and ToonTalk, where the particular actions involved in implementing a system often play a more significant role in the activity [e.g. Rader et al. 1997].
5.2 Shared Spaces for Joint Activity

Another resource for interaction that the children used throughout the workshop was their bodies and how they were physically positioned on the mat. The fact that the programming space was laid out as a carpet on the floor that several children could sit around, and even walk on, significantly influenced the sense-making practice that the children engaged in. The physicality of the programming space allowed the children to socially interact in ways that are hard to see when programming in traditional desktop settings. The snapshots in Figure 6 are taken from a sequence where three parallel activities had been going on for some time. In this example, one of the children stands up and walks over to the other side of the mat. By physically relocating himself, he also partly contributed to shifting the focus of the conversation and allowed himself to take part in another aspect of the fantasy world under construction.

Figure 6: Moving from the 'desert' to the 'city wall'.

An observation related to the previous one regards how the children organized themselves throughout the activity. Repeatedly, they divided into 'subgroups' (consisting of one or more children) that performed parallel activities. However, these were not conducted in isolation from each other but extensively involved
"Looking At the Computer but Doing It On Land"
13
interactions across the different subgroups. An important trigger for these interactions was the possibility of actually seeing the actions performed by the others on the mat. Moreover, the projection on the wall also worked as a shared space where the programming actions of the others became available. References to the projection often triggered interactions such as telling someone that a behaviour was missing, discussing the rules of the game they were building, or simply handing over a card that one assumed someone needed (see Figure 7). Shifting between and referring to these two different shared interactive surfaces became central in the activity.

Figure 7: All looking at the shared space (left), leading one of the boys to hand over a card to his friend (right).

Another important aspect of our observations was the possibility of working individually as well as collectively. With a setting that does not allow for parallel
activities, it becomes difficult to distinguish the individual from the collective. The ability to make this distinction could be seen for instance in referring to parts of the system as 'one's own'. We believe that this was used by the children both for developing a sense of personal ownership, and also to get the sense of being part of a shared endeavour.

Figure 8: Screen-shot of the display when in programming mode.
5.3 Negotiation of Social Rules

Already a few minutes into the first session, when still just exploring the technology, the children divided into groups and agreed upon rules for the activity that each group had to follow. These included how many characters each team could construct, what behaviours they were allowed to use, and what colours their characters were allowed to have. The rules for the activity were stated as "You get three players, and we get three players", which they continuously monitored, as indicated by statements like "He is cheating". The rules of the activity were also continuously refined, as suggested by "Okay let's have five players instead" when somebody added more than the agreed-upon number of characters. Hence, these social rules that the children created involved many aspects beyond the joint creation of a shared fantasy world.

Figure 8 shows how the screen display of the final version of the Desert City looked when in programming mode. All the characters and objects in the system have been assigned a colour by the children (displayed as a surrounding square). This assignment of colours follows a strict system of rules that the children came up with on their own. 'Good' characters and objects were assigned the colour green, evil ones were given the colour red, and blue was used for things considered as 'food' in their fantasy world. Black was the colour used for walls.
"Looking At the Computer but Doing It On Land"
15
The coloured squares on top of the characters represent different movement and collision behaviours. Throughout the activity, while constructing the game and before executing a different version, they checked that the characters and behaviours did not break the rules that they had set up for the activity. By creating social rules for how they should jointly interact with and around the system, they made the interaction with the system highly collaborative.

The social rules also extended the activity beyond that of only using the system. This allowed them to involve social elements of play in the construction activity itself. For instance, on several occasions the children hid cards from each other so that it would not be possible for the others to add a particular behaviour to some of 'their' characters. By hiding a card, the children could actually hide a piece of programming code from their friends, which is an action that would be quite peculiar to support in a desktop application. However, this was not an immediate design choice on our behalf. Instead, the possibility of such actions arises as a consequence of giving the programming code a physical manifestation in the form of plastic cards. This contributed to the possibility for the children to define and negotiate their own social rules for how the activity should be conducted, which was important in creating the social grounds needed for achieving a truly collaborative activity.
6 Concluding Remarks
In this paper, we have used the notion of 'embodied programming' to emphasize the importance of investigating and understanding new possibilities for action and social interaction that physical and tangible forms of computation may afford. We have presented a tangible programming space that allows groups of children to collaboratively create dynamic systems to run on a computer screen. The design is based on three basic goals for interaction and activity, all based on our experiences from working with children building systems in traditional PC settings. These goals were:
1. To support co-located collaboration.
2. To allow for screen-based execution.
3. To allow for what we call behaviour-based programming.
By investigating how children actually 'do' programming when interacting in the space, a number of important elements were found that relate to key aspects of research in physical and tangible interfaces. In this case, this regards the involvement of bodily actions and social practices in the activity, which structured the character of the activity to become essentially different from similar tasks conducted in a traditional PC setting. We have given particular emphasis to three aspects that we found important to the collaborative aspects of the activity: the tangible resources, the physical surfaces and shared spaces, and the involvement of social rules. The restricted and computationally active area on the floor together with the screen projection worked as a shared place around which the children could orient
their actions. This, together with the physicality of the programming cards, provided a richer set of resources for giving account of one's actions to others, which is often difficult in virtual collaborative workspaces [Heath & Hindmarsh 2000]. Hence these were central resources for achieving a sense of sharing and collaboration throughout the activity. Moreover, the physical properties of the design allowed children to create a highly personal interactive fantasy world. They extensively incorporated elements from their social play practices into the activity of constructing a system together, by blending rules of play with rules constructed in the computational system. Game construction thus became a sub-element of the larger activity in which virtual, physical and social aspects played a part. The social setting, in combination with the tangible properties of the system, provided the possibility for children to engage in a collaborative activity that would be difficult to achieve in a traditional PC setup. The particular focus of this paper has been on interaction related to the activity of programming. However, our results have implications for the design of technologies for co-located collaborative interaction in a more general sense as well. Through the interaction and construction available with tangible interfaces with these kinds of properties, we can provide possibilities for children to build their own bridges between physical and virtual objects through playful collaborative activity. Our work has illustrated how children's everyday play practices can be a valuable resource when designing systems for collaboration.
Acknowledgement
We would like to thank Christopher Balanikas, Martin Jonsson and Johan Mattsson for helping us to implement and stabilize the different components of the programming space, and Ulla West for inviting us to perform the study at the Kista+Konst art gallery and for assisting us in our work throughout the study.
References
Crook, C. [1997], Children as Computer Users: The Case of Collaborative Learning, Computers & Education 30(3-4), 237-47.
di Sessa, A. [2000], Changing Minds: Computers, Learning, and Literacy, MIT Press.
Dourish, P. [2001], Where the Action Is: The Foundations of Embodied Interaction, MIT Press.
Druin, A. (ed.) [1999], The Design of Children's Technology, Morgan Kaufmann.
Druin, A. & Fast, C. [2002], The Child as Learner, Critic, Inventor and Technology Design Partner: An Analysis of Three Years of Swedish Student Journals, The International Journal for Technology and Design Education 12(3), 189-213.
Eisenberg, M., Eisenberg, A., Gross, M., Kaowthumrong, K., Lee, N. & Lovett, W. [2002], Computationally-enhanced Construction Kits for Children: Prototype and Principle, in G. Stahl (ed.), Proceedings of the International Conference of the Learning Sciences, Lawrence Erlbaum Associates, pp.79-85.
"Looking At the Computer but Doing It On Land"
17
Fernaeus, Y., Aderklou, C. & Tholander, J. [2004], Computational Literacy at Work: Children's Interaction with Computational Media, in Kinshuk, D. G. Sampson & P. Isaias (eds.), Proceedings of the IADIS International Conference on Cognition and Exploratory Learning in Digital Age (CELDA 2004), IADIS Press, pp.181-8.
Fernaeus, Y. & Tholander, J. [2003], Collaborative Computation on the Floor, in B. Wasson, R. Baggetun, U. Hoppe & S. Ludvigsen (eds.), Proceedings of Computer Support for Collaborative Learning (CSCL 2003), InterMedia, pp.65-7.
Fishkin, K. P. [2004], A Taxonomy for and Analysis of Tangible Interfaces, Personal and Ubiquitous Computing 8(5), 347-58.
Goodwin, C. [2000], Action and Embodiment within Situated Human Interaction, Journal of Pragmatics 32(10), 1489-522.
Heath, C. & Hindmarsh, J. [2000], Embodied Reference: A Study of Deixis in Workplace Interaction, Journal of Pragmatics 32(10), 1855-78.
Heath, C. & Luff, P. [2000], Technology in Action, Cambridge University Press.
Hoyles, C., Noss, R. & Adamson, R. [2002], Rethinking the Microworld Idea, Journal of Educational Computing Research 27(1), 29-53.
Jordan, B. & Henderson, A. [1995], Interaction Analysis: Foundations and Practice, Journal of the Learning Sciences 4(1), 39-103.
Kafai, Y. & Ching, C. C. [2001], Affordances of Collaborative Software Design Planning for Elementary Students' Science Talk, Journal of the Learning Sciences 10(3), 323-63.
Kaptelinin, V. & Cole, M. [2002], Individual and Collective Activities in Educational Computer Game Playing, in T. D. Koschmann, R. Hall & N. Miyake (eds.), CSCL 2: Carrying Forward the Conversation, Computers, Cognition & Work, Lawrence Erlbaum Associates, pp.303-16.
McNerney, T. S. [2004], From Turtles to Tangible Programming Bricks: Explorations in Physical Language Design, Personal and Ubiquitous Computing 8(5), 326-37.
Montemayor, J., Druin, A., Farber, A., Simms, S., Churaman, W. & D'Amour, A. [2002], Physical Programming: Designing Tools for Children to Create Physical Interactive Environments, in D. Wixon (ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing our World, Changing Ourselves (CHI'02), CHI Letters 4(1), ACM Press, pp.299-306.
Norman, D. A. [1993], Cognition in the Head and in the World: An Introduction to the Special Issue on Situated Action, Cognitive Science 17(1), 1-6.
Rader, C., Brand, C. & Lewis, C. [1997], Degrees of Comprehension: Children's Understanding of a Visual Programming Environment, in S. Pemberton (ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'97), ACM Press, pp.351-8.
Repenning, A. & Perrone, C. [2001], Programming by Analogous Examples, in H. Lieberman (ed.), Your Wish Is My Command: Programming by Example, Morgan Kaufmann, pp.351-70.
Smith, D. C., Cypher, A. & Tesler, L. [2001], Novice Programming Comes of Age, in H. Lieberman (ed.), Your Wish Is My Command: Programming by Example, Morgan Kaufmann, pp.7-20.
Smith, D. C. & Cypher, A. [1999], Making Programming Easier for Children, in Druin [1999], pp.202-21.
Snyder, I. [2002], Silicon Literacies: Communication, Innovation and Education in the Digital Age, Routledge.
Suchman, L. A. [1987], Plans and Situated Actions — The Problem of Human-Machine Communication, Cambridge University Press.
Suzuki, H. & Kato, H. [1995], Interaction-level Support for Collaborative Learning: AlgoBlock — An Open Programming Language, in J. L. Schnase & E. L. Cunnius (eds.), Proceedings of Computer Supported Collaborative Learning (CSCL 1995), Lawrence Erlbaum Associates, pp.349-55.
Suzuki, H. & Kato, H. [2002], Identity Formation/Transformation as a Process of Collaborative Learning of Programming Using AlgoArena, in T. D. Koschmann, R. Hall & N. Miyake (eds.), CSCL 2: Carrying Forward the Conversation, Computers, Cognition & Work, Lawrence Erlbaum Associates, pp.275-96.
Tholander, J., Kahn, K. & Jansson, C.-G. [2002], Real Programming of an Adventure Game by an 8-year-old, in P. Bell, R. Stevens & T. Satwicz (eds.), Keeping Learning Complex: Proceedings of the Fifth International Conference on the Learning Sciences (ICLS 2002), Lawrence Erlbaum Associates. Available at http://www.dsv.su.se/research/kids/pdf/RealProgInICLSTemplate.pdf (last accessed 2006-06-11).
Tisue, S. & Wilensky, U. [2004], NetLogo: A Simple Environment for Modeling Complexity, in Proceedings of the International Conference on Complex Systems. Proceedings not yet published but paper available at http://ccl.northwestern.edu/papers/netlogoiccs2004.pdf (last accessed 2005-05-31).
Ullmer, B. & Ishii, H. [1997], Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms, in S. Pemberton (ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'97), ACM Press, pp.234-41.
Vygotsky, L. S. [1976], Play and its Role in the Mental Development of the Child, in J. S. Bruner, A. Jolly & K. Sylva (eds.), Play: Its Role in Development and Evolution, Penguin, pp.461-3.
Wyeth, P. & Purchase, H. C. [2003], Using Developmental Theories to Inform the Design of Technology for Children, in S. MacFarlane, T. Nicol, J. Read & L. Snape (eds.), Proceedings of Interaction Design and Children, ACM Press, pp.93-100.
The Usability of Digital Ink Technologies for Children and Teenagers Janet C Read Child Computer Interaction Group, University of Central Lancashire, Preston PR1 2HE, UK
This paper describes an empirical study that considered the usability of digital pens, Tablet PCs, and laptop PCs for handwritten text input by young users. The study was carried out in two parts, firstly with young children aged 7 and 8, and then with older children aged 12 and 13. The study found that digital pens were particularly well suited to older children and that both sets of children were able to use the Tablet PC without too many errors. Digital ink technologies are often evaluated by the calculation of recognition rates, and this paper exposes some of the flaws in the process of estimating recognition rates from activities involving the copying of text. With particular reference to the personalization of text, possibilities for the use of digital ink for the task of writing are explored and a new interaction, digital doodling, is presented. Keywords: teenagers, children, usability, empirical study, digital pens, Tablet PC, handwriting recognition, digital doodles, evaluation.
1 Introduction In the book Human-Computer Interaction in the New Millennium, John Carroll describes HCI as: The study and practice of usability. It is about understanding and creating software and other technology that people will want to use, will be able to use and will find effective when used. [Carroll 2002]
Figure 1: Text entry timeline [MacKenzie & Soukoreff 2002b].
This aligns closely with the three requirements for product success that are proposed by Dix et al. [2003], these being for products to be used (attractive, engaging, fun), usable (easy to use), and useful (accomplishing what is required). These 'permanent' definitions of HCI effectively dictate the landscape upon which HCI is painted. The detail within the HCI community changes over time, with different aspects of interaction gaining favour and momentum at the expense of others. Pattern languages, Web interaction, task analysis, and mobile technology are all examples of issues that have drifted in and out of favour over the years. The study of the usability of text input methods, and specifically the effectiveness of different text entry methods, is one area of research that has moved in and out of fashion. The timeline in Figure 1 shows how changes in interest in this area have primarily resulted from the arrival of new technologies [MacKenzie & Soukoreff 2002b]. The growth in sales of pen-based systems and the improved functionality of handwriting recognition are both important developments with respect to text entry research. When text is captured using a stylus on a Tablet PC or by the use of a digital pen, a new method for interaction is available. The handwritten text can be manipulated and stored without recognition or can be converted into ASCII text using handwriting recognition software. The people that interact with technology have also changed over time. Over the last 20 years, the user population has expanded to include a wide range of people, including children, older people and people with sensory and motor disabilities. Children are an interesting user group: they represent the only user group that is ageing in a positive way, their acquisition of skills and knowledge is rapid, and their motivation for using computers is very different from that of the workplace adult user.
Figure 2: Roles of children in interactive product development [Druin 2002].
The investigation of children and their impact at different stages of product development, as shown in Figure 2, has resulted in the emergence of a new discipline, Child Computer Interaction (CCI) [Read 2005]. This discipline takes its roots from a few early pioneers, Frye & Soloway [1987], Solomon [1978], and Kafai [1990], and owes much of its current impetus to the design work by Druin [1999] and the vision of Bekker et al. [2002] in instigating a dedicated conference series.
1.1 Motivation for the Research
Twenty years ago it was highly unusual for a child to be doing text input at a machine, but nowadays children spend considerable time at the computer, inputting short and lengthier text via a range of different applications, from the search bar in Google, through the chat interface of MSN, to the familiar Microsoft Word word-processing package. It is common for children to be expected to word-process schoolwork, often creating the first draft using pen and paper and then typing up the final version for assessment or display. For adult users, prolonged keyboard use is known to cause muscle ailments, stress injuries, and eyestrain [Thelen 1996]. Children are using the same technologies as adults with little regard for any long-term effects of computer text input by children, whether that is at a keyboard, on a mobile phone keypad, or by some other method. The effect of prolonged computer use on the eyesight and posture of children was known in the early nineties, with Palmer [1993] reporting vision problems and Weikart [1995] detailing muscle disorders. More subtle effects, such as the impact of computer text creation on the language and understanding of children, have been less well explored. The creation of text using digital pen and ink technologies may reduce some of these problems.
1.2 The Research Study
The work described in this paper is an exploration of the usability of three pen-based digital ink text input methods for children. It begins with an overview of text entry and then goes on to explore some alternative text entry technologies, including descriptions of handwriting with a stylus on a Tablet PC, handwriting with a graphics tablet and pen on a standard PC, and handwriting with a digital pen on digital paper. The paper then provides an overview of the methods that are commonly used for the evaluation of text input technologies.
An empirical study is then presented that explores the usability of the three digital ink technologies with two distinct user groups: one a group of seven and eight year old children, the other a group of twelve and thirteen year old children. The paper concludes with a discussion of the results and of some emerging issues.
2 Computer Text Input for Children
There are good reasons for encouraging children to engage in computer text input. Writing text in emails, for instance, is known to help children understand the notion of writing for an audience [Garvey 2002] and is also seen to be liberating, as emails can be 'written in any style' and 'allow children to explore their inner voice' [Turrell 1999]. Written work produced at a computer can be made to look good, thus motivating poor writers [Day 1994], and computers allow the representation of ideas in dynamic forms, provide improved feedback to pupils, and allow information to be easily altered [Moseley et al. 1999]. Traditionally, text is input to a computer using an alphabetic keyboard. These keyboards can be arranged in different ways, with the most common presentation being the QWERTY keyboard, which lays the characters out in the same way as the early typewriters. The action of using a keyboard for text entry is occasionally referred to as keyboarding, but as the term keyboarding also refers to the mastery and use of electronic organs, in this paper the action of entering text is described as typing. The process of typing can be broken into five phases: character recognition, storage, motor activity, keystroke and feedback [Cooper 1983]. Character recognition is when the typist recognizes the letter on the keyboard; storage is the process by which the typist is able to read ahead (possibly four to eight characters at a time for experienced adults); the motor activity is the movement of the fingers to the keys; the keystroke is the pressure needed to press the key; and the feedback is essential for error detection and correction (this could be omitted or could be made to happen later, for example with blind users who may have the text read back to them at a later time). It is possible to become quite skilled at the alphabetic keyboard, but many people, and particularly children, find typing difficult [Norman & Fisher 1982]. The layout of the keyboard makes high demands on short-term memory, and poor motor control can also limit keyboard efficiency as children may 'miss' the appropriate key, hold it down for too long, or fail to press it sufficiently.
2.1 Alternatives to the Keyboard
The most commonly found alternatives to the alphabetic keyboard are the reduced keyboards (as seen on mobile phones) and the recognition technologies of speech and handwriting. Text entry at a mobile phone is a specialist area of research and is not explored here in any detail; readers are directed to the work by MacKenzie & Soukoreff [2002b] for a full treatment of this area. The two recognition technologies are essentially quite similar: the user communicates by speaking or writing, and this is captured by the hardware and then digitized. The digitized speech or writing is then converted into ASCII (or
The Usability of Digital Ink Technologies for Children and Teenagers similar) representation by the application of recognition algorithms, sound, word or character matching, and in some instances, the application of language models. These recognition processes are error prone both at the point of capture and at the point of recognition [Plamondon & Srihari 2000]. Speech recognition is problematic for children as their speech is immature and young children are often unable to read the training text that is needed to individualize (train) the recognition algorithms. Work by the author has established that speech recognition without training is highly error prone with children [Read et al. 2001]. Handwriting recognition software is reasonably robust and can be used without individualization; earlier work by the authors has established that there is scope for its use with child users [Read et al. 2004]. To use handwriting recognition for text input there is a need for technology that can support the capture of the written text and software to carry out the recognition.
2.2 The Usability of Digital Ink Technologies
The effectiveness or usefulness of handwriting recognition interfaces is generally measured by determining the accuracy of the recognition process. This is only relevant if the handwritten text is to be converted into ASCII text before use. If no conversion is intended, the accuracy of the recognition algorithms is irrelevant. Research studies tend to report recognition error rates that are generally derived from information about what the user wrote and what the recognizer subsequently output. There is very little research that takes a holistic view of recognition-based systems. The value of the system to the user, and the effort saved by the user, is seldom reported [Hartley et al. 2003; Huckvale 1994]. The accuracy of the recognition process for text entry is typically measured by apportioning a percentage score to text after it has been through the recognition process. Metrics that are used for this have been derived from those used for the accuracy of keyboard input, and accuracy (or error rate) scores are generated by comparing a string of presented text (input) (PT) with a string of transcribed text (output) (TT) [Frankish et al. 1995; MacKenzie & Chang 1999; Tappert et al. 1990]. The two strings are compared and the 'errors' in the transcribed text are classified as insertions (I), deletions (D) or substitutions (S). These are then totalled and used to calculate the Character Error Rate (CER):

CER = (S + I + D) / N

where N is the total number of characters in the presented text. To calculate the errors, the two phrases are aligned by the use of a minimum string distance (MSD) algorithm that generates a set of optimal alignments (those which result in the least error rate) between the two text strings [MacKenzie & Soukoreff 2002a]. An example is shown here:

PT = The cat jumped over the moon
TT = Then cat jumpd over he moon

The MSD in this case is 3 and there is one optimal alignment, which is:

PT = The- cat jumped over the moon
TT = Then cat jump-d over -he moon
Once the optimal alignments are generated, it is possible to identify the individual errors by inspecting the two text strings. In this example there is an insertion after The (shown by a dash in the PT), a deletion after jump (shown by a dash in the TT) and a deletion after over (also seen in the TT), resulting in an error rate of 3/23 or 13%. As these alignments, and the resulting error rates, can be generated automatically, the character error rate metric is an attractive choice for researchers [MacKenzie & Soukoreff 2002a]. Reported error rates for pen-based input devices vary according to the type of writing that is supported; a study by MacKenzie & Chang [1999] tested error rates with 32 subjects copying words of discrete characters onto a tablet, using a constrained grid, and reported error rates of between 7% and 13%. Frankish et al. [1995] reported error rates for free-form text (natural text) that averaged 13%, and fell to 9% when only lower case letters were used. The efficiency of text input is normally measured in characters per second or words per minute, and user opinions are obtained by asking the users for their views or by observing them as they use the technology.
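The calculation above can be reproduced with a standard dynamic-programming edit distance. The following Python sketch is our own illustration rather than any tool used in the study; note that the worked example's figure of 3/23 only comes out if N counts non-space characters, which is the assumption made here.

```python
def msd(presented, transcribed):
    """Minimum string distance (Levenshtein): the smallest total of
    substitutions, insertions and deletions turning one string into
    the other."""
    m, n = len(presented), len(transcribed)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                         # delete everything
    for j in range(n + 1):
        dist[0][j] = j                         # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if presented[i - 1] == transcribed[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution or match
    return dist[m][n]

def character_error_rate(presented, transcribed):
    # CER = (S + I + D) / N. The worked example (3/23 = 13%) only
    # holds if N excludes spaces, so that convention is assumed here.
    n = sum(1 for c in presented if not c.isspace())
    return msd(presented, transcribed) / n

pt = "The cat jumped over the moon"
tt = "Then cat jumpd over he moon"
print(msd(pt, tt))                            # 3
print(f"{character_error_rate(pt, tt):.0%}")  # 13%
```

Materializing the optimal alignment itself requires a backtrace through the same table, but, as the text notes, the error rate can be read off without it.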
3 Empirical Study
The study that is described here compared three methods for text input using digital ink technologies. The three methods were handwriting with a stylus on a Tablet PC, handwriting with a graphics tablet and pen on a standard PC, and handwriting with a digital pen on digital paper. The focus in the study was on the usefulness of the technologies, with the assumption that the writing created on them would be required later in some ASCII form; therefore, recognition rates were important. It was hoped that the study would identify whether or not the technologies were useful, whether or not children of both ages could use them, and how recognition rates differed between the Tablet PC and the Wacom tablet and laptop presentation. The study was carried out over two sessions. The first session involved 15 children aged seven and eight; the second session was for a group of 25 twelve and thirteen year old children. The organization of both sessions was identical; the description that follows therefore applies to both groups of participants.
3.1 Apparatus
The apparatus that was used varied for the two sessions. In the session with the younger children, the children used either a Tablet PC (as shown in Figure 3) or a digital pen (as shown in Figure 4), hereafter referred to as the primary technologies. In the session with the older children, the children were also directed to one of these two technologies but were subsequently given the opportunity to use a laptop PC with a graphics tablet (referred to later as the secondary technology). This option was not offered to the younger age group as the author had used it extensively with that age group and was aware of the usability and the expected recognition rates for this product. The decision not to offer it to the younger children was also taken with the intention of improving the efficiency of the experiment, given that the children took quite a long time doing the experimental tasks.
The Usability of Digital Ink Technologies for Children and Teenagers
25
Figure 3: Child using the Tablet PC.
Figure 4: Child using the digital pen.
The Tablet PC that was used was a Toshiba Portege, used with Calligrapher handwriting recognition. The digital pen was a Logitech USB pen, used with the digital paper notebook that was supplied with the pen. The writing appeared in the notebook just as if it had been written with a biro. The writing from the digital pen was uploaded to a laptop once it was written, and it was recognized by the MyScript software that was supplied with the pen. The standard PC was a Hi Grade Notino laptop with a Wacom graphics tablet attached at the USB port. The children wrote on the graphics tablet and their writing was displayed on the laptop screen. The writing was not visible on the graphics tablet. The recognition software that was used was Calligrapher, the same as that used in the tablet application.
3.2 Procedure
The experiments took place on a single day in a laboratory setting at the University. The younger children carried out the work in the morning, the older children in the afternoon. The children that took part in the experiments came from two local schools and were convenience samples, inasmuch as they were from classes that the schools had chosen to bring to the experiment following a request from the researcher.
Figure 5: Smileyometer used to rate the Applications.
The children entered the room in small groups and were brought to a table where the researcher allocated them to one of the two primary technologies. Before the children used the technologies, they were given an explanation of how they worked and were also told what the purpose of the study was. As the children completed each application, they rated their experience using a Smileyometer [Read et al. 2002] — see Figure 5.
3.3 Design
This was an exploratory study, designed to establish how usable the technologies were, whether or not the children would use them (given a choice) and what recognition rates could be expected for these technologies.
3.3.1 Design of the Text Phrases
The children were presented with a single A4 sheet of text phrases for copying into the technology. These phrases had been taken from the text phrases published by MacKenzie & Soukoreff [2003], and were selected on the basis of their word familiarity for the younger children and for easy spellings. Both groups saw the same phrases. The phrases were displayed in a size 16 Comic Sans font with five phrases on each side of the paper. The phrases that were presented on the first side of the paper were:
My watch fell in the water
Time to go shopping
You must be getting old
The world is a stage
Do not say anything
The phrases that were on the rear of the paper were:
Are you talking to me
You are very smart
All work and no play
Did you have a good time
Play it again Sam
The order of the presentation of the first five phrases and of the second five phrases was different for each event; this meant that although the phrases followed one another in sequence, the first phrase that was written differed across the technologies and across the children. For instance, My watch fell in the water appeared either 1st, 2nd, 3rd, 4th or 5th.
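The paper does not specify the counterbalancing scheme used to vary the starting phrase; one standard way to achieve the effect described is a cyclic (Latin-square style) rotation, sketched below in Python purely as a hypothetical illustration.

```python
def rotated_orders(phrases):
    """Cyclic (Latin-square style) rotations: participant k starts at
    phrase k, so across a full cycle every phrase occupies every
    serial position equally often."""
    n = len(phrases)
    return [[phrases[(start + i) % n] for i in range(n)] for start in range(n)]

first_side = [
    "My watch fell in the water",
    "Time to go shopping",
    "You must be getting old",
    "The world is a stage",
    "Do not say anything",
]

for participant, order in enumerate(rotated_orders(first_side), start=1):
    # Each participant begins with a different phrase, as described in the text.
    print(participant, "->", order[0], "...")
```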
Table 1: The error rates for the digital pens.

           Young Children    Older Children
Number     8                 11
Average    0.181             0.072
SD         0.125             0.082
The researcher ensured that the presentation of these phrase sets was arranged to minimize the effect of learning on the recorded recognition rates and to provide a reliable set of results.
3.3.2 Design of the Interfaces
The three technologies were presented in different ways. The digital pens were placed on a table and the children wrote with them into the digital paper notebooks that had been provided with the pens. These were A4 size, spiral bound and presented in a portrait layout. The Tablet PC was used with an experimental interface that was identical to the one on the laptop PC. This interface gave the children a space to write and, when they were ready, it displayed the results of the recognition process to them. They then cleared the interface and wrote their next phrases. At the end of the session, the digital ink from the pens was uploaded to the computer and recognized by the Logitech notes software. The writing on the Tablet PC and the laptop was recognized using the Calligrapher software. Both types of recognition software utilized a standard dictionary.
3.3.3 Design of the Evaluation Sheet
An evaluation sheet was presented to the children after they had used the technologies. This required the children to give a rating for each of the technologies that they used.
3.4 Analysis
There were two analysis processes. The text that was generated from the recognition activities was aligned to the text that was copied by using an MSD algorithm, and a character error rate was derived as explained in Section 2.2. The ratings from the children with respect to the technologies were given numerical scores from 1 (awful) to 5 (brilliant).
3.5 Results
Not all the children wrote all ten phrases at each technology, but all completed the first five phrases. Because of this, the numerical results that are presented here only represent the error rates from these first five phrases. Optimally, these represent 88 characters; some children wrote fewer than 88 characters as they missed out letters or words, and some added letters or words to end up with more than 88 characters. The error rates were all measured against 88 characters; the implication of this is discussed later in the paper. Table 1 shows the error rate statistics for the first five phrases written on the digital pens.
Table 2: Error rates for the Tablet PC and the Wacom Tablet.

           Tablet PC         Tablet PC         Wacom and Laptop
           Young Children    Older Children    Older Children
Number     7                 12                10
Average    0.170             0.156             0.193
SD         0.118             0.113             0.139
Figure 6: Error rate distribution for digital pens.
The error rates for the first five phrases on the Tablet PCs (used by both sets of children) and the first five phrases on the Wacom and laptop (used only by the older children) are presented in Table 2. There is a significant difference (t(17) = 2.41, p < 0.05, two-tailed) between the results for the younger children and the older children in the error rates for the digital pens. Summary data from their writing is shown in Figure 6, where it can be seen that for many of the older children, error rates were very low; in fact, three children produced work that resulted in no recognition errors. There was not a significant difference in the error rates between the two user groups when the Tablet PC was being used (distribution shown in Figure 7), but for the older children, the results between the Tablet PC and the digital pens were significantly different (t(21) = 2.17, p < 0.05, two-tailed). The average preference scores for each technology are shown in Table 3. The results for the digital pens are particularly interesting as there is a significant difference (p < 0.05) between the recognition rates for the younger and the older children. As shown in Figure 6, very few of the younger children had well recognized writing. For one of the younger children, a portion of writing was not captured even though it was clearly seen in the notebook. The reason for this was not discovered, but it may be that the way the child held the pen interfered with its operation.
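A between-groups comparison of this kind can be approximated directly from the summary statistics in Table 1. The following Python sketch is our own illustration, not part of the original analysis; because the published means and SDs are rounded, the result lands near, rather than exactly on, the reported t of 2.41.

```python
from scipy.stats import ttest_ind_from_stats

# Digital pens: young children vs. older children, using the
# published summary statistics from Table 1 (rounded values).
t, p = ttest_ind_from_stats(mean1=0.181, std1=0.125, nobs1=8,
                            mean2=0.072, std2=0.082, nobs2=11,
                            equal_var=True)
print(f"t({8 + 11 - 2}) = {t:.2f}, p = {p:.3f} (two-tailed)")
```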
Figure 7: Error rate distribution for Tablet PC.
Table 3: Average preference scores for each technology.

                  Digital Pen    Tablet PC    Laptop PC
Young Children    4.182          4.429        N/A
Older Children    3.733          4.273        3.417
It is interesting to note that there is not a significant difference between the writing at the tablet for the younger and older children, nor was there any significant difference between the tablet and the Wacom for the older children. The Tablet PC was generally preferred by the children, but the digital pen also gained a high score for user choice, especially from the younger children (who had had relative success with that technology).
4 Findings from the Work
The findings from the work are considered in three sections; the first looks at the usefulness of the technologies, the second at how usable the technologies were, and the third explores whether or not the children would use them.
4.1 How Useful was the Technology? Recognition Rates Revisited
The major determinant of usefulness in this study was the recognition accuracy of the process. The accuracy rates seemed quite high in some instances (older children using the digital pens, for example) and even with the tablet technologies the recognition rates are reasonable when compared to other similar studies [Read et al. 2003]. It may be that with real use (i.e. composed text), these recognition rates would be higher; in the studies reported here, the children copied phrases rather than composed their own words, and in Read et al. [2004] it was shown that copied text resulted in more errors than composed text. The reason for there being a difference between copied and composed text is partially explained as follows. Figure 8 shows the four text strings, IT, WT, PT and TT, that are present in a recognition process.
Figure 8: The text strings in a system.
Table 4: Reasons for variance in intended and written text.

Type of error          Number of instances
Wrote in text speak    3
Spellings incorrect    5
Missed words           2
Substituted words      1
The first string (IT) is the 'intended text'; this may have been presented to the users for subsequent copying or may be thought text that exists only in the user's head. This text is then written by the user to create a second text string that is the written text (WT). When the user is composing text (rather than copying), this written text is inspected and interpreted for use as the presented text (PT). When text has been copied, the intended text is generally, but not always, used as the presented text (PT). In the example shown in Figure 8, it can be seen that the user intended to write 'write this down' but in fact did not write the 'e' in 'write', and so using the intended text as the presented text (as was done in the experiment reported here) will result in a worse recognition rate (1 substitution, 1 deletion) being recorded than perhaps should have been (1 substitution). To determine the size of the effect of using the intended text as the presented text, the writing that the children did using the digital pens was investigated for those cases when the intended and the written text varied. This investigation included a
Figure 9: Writing displaying personalization.
look at the second five phrases (not included in the summary data in the results). The reasons for the intended and written text being different are summarized in Table 4. The child that wrote in text speak used 'r u talking 2 me' instead of 'are you talking to me' (this resulted in three instances of text speak and one spelling, which it could be argued was also text speak!). This would result in an almost 33% error rate even before a single character was recognized. Incorrect spellings and substituted words can have varying effects on recognition rates, depending on the distance from the intended text. Missed words are generally small words and their impact is often small; in this study the words that were missed were both unimportant words, an 'a' and an 'is'.
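A word-level alignment of intended against written text makes these categories straightforward to tally. The sketch below is our own illustration using Python's standard difflib module, not anything from the study, applied to the text-speak example above:

```python
from difflib import SequenceMatcher

def word_level_differences(intended, written):
    """Align intended and written text word by word and report
    substituted, missed (deleted) and extra (inserted) words."""
    it, wt = intended.split(), written.split()
    for op, i1, i2, j1, j2 in SequenceMatcher(None, it, wt).get_opcodes():
        if op == "replace":
            print("substituted:", it[i1:i2], "->", wt[j1:j2])
        elif op == "delete":
            print("missed:", it[i1:i2])
        elif op == "insert":
            print("extra:", wt[j1:j2])

# The text-speak example from the study:
word_level_differences("are you talking to me", "r u talking 2 me")
```

Here the substitutions 'are you' -> 'r u' and 'to' -> '2' are reported, matching the text-speak instances counted in Table 4.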
4.2 How Usable was the Technology? Errors Examined
All the children were able to quickly use the technologies presented to them, and the technologies were all suitable for the task. There were a couple of instances where a child needed some assistance: three of the younger children needed to be shown how to write on the Tablet PC, and four of the older children needed help with the Wacom tablet, but they provided this for each other, as a number of these children had used Wacom tablets and pens in their artwork at school. With the digital pens, aside from the problem with the pen not capturing the digital ink, the most common error was children starting too near the top of the page. This caused poor (or absent) recognition of the first phrase and happened with three of the children (all in the younger group). One boy wrote all his phrases with the book upside down and these were subsequently not recognized at all by the software — his results were not included in the summary in Table 1. All three technologies supported the children's individuality as they allowed for different writing styles, but in some instances this reduced recognition. Figure 9 shows an example from one girl's writing that clearly shows how she embellished her writing with irregular descenders and circles in place of dots. Remarkably, this writing was recognized quite well, probably because the embellishments were added as she wrote; children that added embellishments after they had written created more problems for the recognition process. It is clear when looking at the writing in this form (as shown in Figure 9) that children (in this small experiment, notably the older girls) see their writing as both individual and as an artistic product; seven of the girls in the study wrote with embellishments. The conversion of this writing into ASCII text seems almost an act of vandalism, and so the possibilities for manipulation of digital ink, especially for this user group, are worth further exploration.
Figure 10: Digital doodling.
4.3 Would the Technology be Used? Digital Doodles
The high ratings that the children gave to the technologies suggest that, were they available to children in schools, these digital ink tools might be used. For users with high levels of discretion (and children are such a user group), technology is only adopted if it makes things easier, faster, or more fun than the present alternative. One particular aspect that was seen in the work of this study was the interplay between art and writing, especially with the older children, and this cannot be easily enabled in a QWERTY writing environment. Pen and paper provides a very creative medium that children explore from an early age. When they are very young, they use drawings to express ideas and convey meaning, but as they age, they draw less and write more [Kress 1997]. During the teenage years it is common for children to add art forms to their writing and to their writing artefacts (books etc.) in the form of doodles. In a small investigation of the prevalence of doodles among older children, the researcher found that over 85% of children of this age added doodles to over 50% of their standard pen outputs. These doodle behaviours are enabled by digital ink; an example from the work of this study can be seen in Figure 10, which shows how one child added her own symbols to her writing. It is perhaps unsurprising that digital doodling might happen with pen technologies, as the nature of the pen is very different from the nature of the keyboard. Pens are used for both art and writing, whereas the keyboard is simply a text creation tool. Microsoft have recently acknowledged this by providing a pen writing space in their recent MSN chat application. The older children saw the potential in the technology; one child remarked that she could write letters in secret using the digital pen, destroying the paper version but keeping the digital version safe in the technology; another suggested that the pen could double as a mobile phone and be used to store everything! One challenge for digital ink recognition technology is to be able to discriminate between doodles and writing so that only the writing is recognized. In the study described here, the recognition software that was supplied with the digital pens coped well with doodles, but the software on the Tablet PC tried to recognize the drawings (and failed!).
5 Further Work
These results indicate that children can use both the novel technologies of digital pens and Tablet PCs. The results show that when children copy text into these technologies, recognition rates of around 80% can be expected for most children, but these may be higher for composed rather than copied text. The results for the
younger children using Tablet PCs (average error rate 17%) compare favourably with the results reported for children using Wacom tablets and PCs (average error rate 34%) [Read et al. 2004] and suggest that there is a measurable improvement when the problem of separation between writing surface and screen is removed. The children using the technologies in this study were all enthusiastic, and the older children were keen to offer suggestions for the possibilities for the technology's use in the classroom. The involvement of older children in the envisioning and testing of future technologies is an area that is worth further investigation. Observing the author writing this paper using a QWERTY keyboard, one teenager remarked 'just think, in about ten years' time someone will invent ink and say "Hey, that's a good idea, you can use it with paper and stuff"'. The author intends to carry out further work with digital text and digital doodling for older children. This work will focus on the personalization of text, both presented as digital ink and as ASCII representations. Other work will determine the recognition rates that might be possible for composed text, and a longitudinal study of the usability of digital pens will be undertaken.
Acknowledgements The author wishes to acknowledge the co-operation and assistance of the pupils and teachers from English Martyrs Junior school and Archbishop Temple High school.
References
Bekker, M. M., Markopoulos, P. & Kersten-Tsikalkina, M. [2002], Interaction Design and Children, Shaker Publishing.
Carroll, J. M. [2002], Human-Computer Interaction in the New Millennium, Addison-Wesley.
Cooper, W. E. (ed.) [1983], Cognitive Aspects of Skilled Typewriting, Springer-Verlag.
Day, J. [1994], Is Good Looking Writing Good Writing?, in C. Singleton (ed.), Computers and Dyslexia: Educational Applications of New Technology, Dyslexia Computer Resource Centre, University of Hull, pp.26-36.
Dix, A., Finlay, J., Abowd, G. D. & Beale, R. [2003], Human-Computer Interaction, third edition, Prentice-Hall.
Druin, A. [2002], The Role of Children in the Design of New Technology, Behaviour & Information Technology 21(1), 1-25.
Druin, A. (ed.) [1999], The Design of Children's Technology, Morgan Kaufmann.
Frankish, C., Hull, R. & Morgan, P. [1995], Recognition Accuracy and User Acceptance of Pen Interfaces, in I. Katz, R. Mack, L. Marks, M. B. Rosson & J. Nielsen (eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'95), ACM Press, pp.503-10.
Frye, D. & Soloway, E. [1987], Interface Design: A Neglected Issue in Educational Software, in J. M. Carroll & P. P. Tanner (eds.), Proceedings of the SIGCHI/GI Conference on Human Factors in Computing Systems and Graphics Interface (CHI+GI'87), ACM Press, pp.93-7.
Garvey, J. [2002], Authenticity, Modelling and Style: Writing and ICT, in M. Williams (ed.), Unlocking Writing, David Fulton Publishers, pp.77-91.
Hartley, J., Sotto, E. & Pennebaker, J. [2003], Speaking versus Typing: A Case-study of the Effects of Using Voice-Recognition Software on Academic Correspondence, British Journal of Educational Technology 34(1), 5-16.
Huckvale, M. [1994], Purpose: The Missing Link in Speech and Handwriting Recognition, paper presented at the AISB Workshop on Computational Linguistics for Speech and Handwriting Recognition. http://www.phon.ucl.ac.uk/home/mark/papers/spwrite.htm (last accessed 2005-06-07).
Kafai, Y. B. (ed.) [1990], From Barbie to Mortal Kombat: Gender and Computer Games, MIT Press.
Kress, G. [1997], Before Writing — Rethinking the Paths to Literacy, Routledge.
MacKenzie, I. S. & Chang, L. [1999], A Performance Comparison of Two Handwriting Recognizers, Interacting with Computers 11(3), 283-97.
MacKenzie, I. S. & Soukoreff, R. W. [2002a], A Character-level Error Analysis for Evaluating Text Entry Methods, in O. W. Bertelsen, S. Bødker & K. Kuutti (eds.), Proceedings of NordiCHI 2002, ACM Press, pp.241-4.
MacKenzie, I. S. & Soukoreff, R. W. [2002b], Text Entry for Mobile Computing: Models and Methods, Theory and Practice, Human-Computer Interaction 17(2), 147-98.
MacKenzie, I. S. & Soukoreff, R. W. [2003], Phrase Sets for Evaluating Text Entry Techniques, in G. Cockton, P. Korhonen, E. Bergman, S. Björk, P. Collings, A. Dey, S. Draper, J. Gulliksen, T. Keinonen, J. Lazar, A. Lund, R. Molich, K. Nakakoji, L. Nigay, R. Oliveira Prates, J. Rieman & C. Snyder (eds.), CHI'03 Extended Abstracts of the Conference on Human Factors in Computing Systems, ACM Press, pp.754-5.
Moseley, D., Higgins, S., Bramald, R., Hardman, P., Miller, J., Mroz, M., Tse, H., Newton, D., Thompson, I., Williamson, J., Halligan, J. & Bramald, P. [1999], Ways Forward with ICT: Effective Pedagogy using Information and Communication Technology for Literacy and Numeracy in Primary Schools, Technical Report, Newcastle University.
Norman, D. A. & Fisher, D. [1982], Why Alphabetic Keyboards Are Not Easy To Use: Keyboard Layout Doesn't Much Matter, Human Factors 24(5), 509-15.
Palmer, S. [1993], Does Computer Use Put Children's Vision at Risk?, Journal of Research and Development in Education 26(2), 59-65.
Plamondon, R. & Srihari, S. N. [2000], On-line and Off-line Handwriting Recognition: A Comprehensive Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 63-84.
Read, J. C. [2005], The ABC of CCI, Interfaces 62, 8-9.
Read, J. C., MacFarlane, S. J. & Casey, C. [2001], Measuring the Usability of Text Input Methods for Children, in A. Blandford, J. Vanderdonckt & P. Gray (eds.), People and Computers XV: Interaction without Frontiers (Joint Proceedings of HCI2001 and IHM2001), Springer-Verlag, pp.559-72.
Read, J. C., MacFarlane, S. J. & Casey, C. [2003], A Comparison of Two On-line Handwriting Recognition Methods for Unconstrained Text Entry by Children, in P. Gray, P. Johnson & E. O'Neill (eds.), Proceedings of HCI'03: Volume 2, Research Press International for British Computer Society, pp.29-32.
Read, J. C., MacFarlane, S. J. & Horton, M. [2004], The Usability of Handwriting Recognition for Writing in the Primary Classroom, in S. Fincher, P. Markopoulos, D. Moore & R. Ruddle (eds.), People and Computers XVIII: Designing for Life (Proceedings of HCI'04), Springer, pp.135-50.
Read, J., MacFarlane, S. & Casey, C. [2002], Endurability, Engagement and Expectations: Measuring Children's Fun, in M. M. Bekker, P. Markopoulos & M. Kersten-Tsikalkina (eds.), Interaction Design and Children, Shaker Publishing, pp.189-98.
Solomon, C. [1978], Teaching Young Children to Program in a LOGO Turtle Computer Culture, ACM SIGCUE Outlook 12(3), 20-9.
Tappert, C. C., Suen, C. Y. & Wakahara, T. [1990], The State of the Art in On-line Handwriting Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(8), 787-808.
Thelen, E. [1996], Motor Development, American Psychologist 51(11), 1134-52.
Turrell, G. [1999], Email — Punching Holes in Classroom Walls, in R. Selwyn & R. Dick (eds.), MAPE Focus on Communications, MAPE Publications, Section C, p.1.
Weikart, P. S. [1995], Purposeful Movement: Have We Overlooked the Base?, Early Childhood Connections: The Journal for Music and Movement-based Learning 1(4), 6-15.
PROTEUS: Artefact-driven Constructionist Assessment within Tablet PC-based Low-fidelity Prototyping
Dean Mohamedally, Panayiotis Zaphiris & Helen Petrie
Centre for HCI Design, City University, Northampton Square, London EC1V 0HB, UK
Email: {cp496, zaphiri, hlpetri}@soi.city.ac.uk
Low-fidelity prototyping is a widely used HCI knowledge elicitation technique. However, empirical evaluation methods for low-fidelity prototyping have remained relatively static even with the development and use of software prototyping tools. In this paper, we describe a framework based on constructionism theory to model design artefacts as measurable constructs within low-fidelity prototypes. This provides a novel approach to acquiring further cognitive user metrics within software based low-fidelity prototyping in the HCI domain. We describe two mobile software tools, PROTEUS and PROTEUS EVALUATOR, developed for the Tablet PC platform, which use our framework to aid our understanding of prototypes during their temporal construction. Results of using the tools in two scenario experiments are reported, each conducted with 40 HCI postgraduate students. Keywords: low-fidelity prototyping, constructionism, knowledge elicitation. Tablet PC software, HCI software tools.
1 Introduction
Knowledge elicitation methods in HCI are a critical function to the success of the requirements and design gathering stages [Maiden et al. 1995], and the usability testing and user evaluation stages of software development [Zaphiris & Kurniawan 2001]. Examples of this start with initial questionnaire feedback, requirements task walkthroughs, interview techniques, and focus group debates. It can rapidly scale
upwards to more complex psychometric and design and evaluation processes, such as various fidelities of prototype construction, direct and indirect observation practices for monitoring user actions and response time comparisons, and methods for eliciting mental categorization models, e.g. in distinguishing expert and non-expert technology usage patterns. A wide variety of tested and proven experimental user-based techniques exists for practitioners to utilize [Burge 2001]. However, as HCI specialists will know from experience, knowledge acquisition and analysis of data from traditional user-based methods is time consuming and usually requires experts in their respective fields. As Kidd [1987] defined, knowledge acquisition from experts involves the following processes:
1. Deploying a technique to elicit data from the expert users.
2. Interpreting the verbal data and inferring the underlying knowledge and reasoning of the users.
3. Utilizing this interpretation to construct a model or language that exemplifies the user's knowledge and performance.
4. Interpreting further data through an iteratively evolving model until the knowledge domains are complete.
5. The principal focus for the knowledge acquisition team should be on constructing models, on domain definition, or on problem identification and problem analysis.
For HCI practitioners working as part of development teams, where their results can lead to significant changes in design, it is important to capture empirical data of the highest quality. By adopting digital processes, analysis of such data can be enhanced with software tools that enable faster data acquisition and processing than is humanly possible, along with large data storage and retrieval capabilities. Digital tools can therefore raise the quality of user-centred knowledge elicitation and analysis. In this paper we present an approach to acquiring further cognitive user metrics within low-fidelity prototyping in the HCI domain, through the use of software tools. This paper continues as follows: in Section 2 we briefly describe low-fidelity prototyping and current software tools that are widely used. Section 3 presents a background to existing constructionist methodology, for the reader to grasp the dynamics of how constructionism links to design artefacts. As part of our framework, constructionist metrics and event patterns over the timeline history of prototype construction are also proposed. In Section 4 we describe the iterative design and development of our tools to facilitate our framework. Section 5 outlines our experiment scenarios, and Section 6 goes on to describe the findings of the scenario tests in order to validate our framework. Section 7 discusses our participatory design (PD) sessions to evaluate and improve our software tools for use by HCI practitioners and educators, and finally Section 8 concludes our work with suggestions for future research in this area.
2 Low-Fidelity Prototyping
The practice of low-fidelity prototyping in HCI uses simple materials and equipment to create a paper-based simulation of views of an interface or system, with the aim of exploring early user requirements and visualizing the layout, accessibility and potential aesthetic approval of design ideas. Over the years, strategies and uses of prototyping methods [Hardgrave & Wilson 1994] have grown to become a key asset in the HCI toolkit. With traditional paper-and-pen approaches, it is common to denote features of a user interface with visual artefacts metaphorically described on paper, e.g. menu bars with triangles on either end, or rectangular buttons for actions. If the prototype is constructed with movable separate pieces of paper, these artefacts allow members of a prototyping team to interact with it and easily reach a consensus on the effectiveness of position, size and purpose. It is also common to label features and visual artefacts with annotated descriptions of their purposes and links to other artefacts. There are numerous software painting and drawing programs available, such as Windows Paint, Macromedia Flash, Adobe Illustrator, Microsoft Visio and PowerPoint, and the GIMP, to name a few in no order of preference. These artistic and diagrammatic tools can be utilized for low-fidelity prototyping and sketching of user interface designs. Recently, more HCI practitioner-orientated low-fidelity prototyping tools have been developed, including software from the GUIR team at Berkeley [Walker et al. 2002], which has produced Denim and Silk to facilitate prototyping of early-stage website design. Denim allows low-fidelity prototype sketches of website designs to be 'run', with hyperlink navigation to other prototype sketches, akin to storyboarding. Their Suede system [Klemmer et al. 2000] is a powerful Wizard of Oz prototyping tool based on speech dictation interfaces.
3 Constructionism in User-Centred HCI
3.1 Constructionist-based Artefact Modelling
Constructivist learning theory [Piaget 1973; Vygotsky 1978] argues that knowledge is not just transmitted, but is constructed. Thus we refer to the construction of new knowledge by learners themselves with sensory information, and the behaviour of self-constructed knowledge that is built up through experience [Jonassen 1994]. This theory branched in the form of constructionism. Constructionism [Papert 1991; Resnick 1996], in the HCI-applicable sense, is an epistemological view concerned with the reinforcement of existing user knowledge and the creation of new knowledge. This is critically achieved through the use of tangible artefacts and metaphors that users can affiliate with from their sensory information and their past experience and intuitions. Several knowledge elicitation techniques in HCI can be argued to elicit user-centred data through the use of tangible artefacts. Low-fidelity prototyping is one, through its use of paper materials and the sketching of individual artefacts. In addition, card sorting, affinity diagramming, brainstorming and perhaps others are also constructionist. They reinforce and create new user-centred knowledge domains through iterations
40
Dean Mohamedally, Panayiotis Zaphiris & Helen Petrie Linear Approaches
Non-Linear Approaches
Figure 1; Linear and Non-linear approaches in artefact-driven construction.
of activity, consensus and refinement of tangible and metaphorically identifiable artefacts e.g. cards and physical objects. These new knowledge domains are created by the users themselves constructing visibly representative artefacts during the knowledge elicitation activity. They may use what they create to further define new artefacts or redefine existing ones, and so forth. This chaining effect has already been described previously as part of the principles within Activity Theory [Leont'ev 1978; Vygotsky 1978]. In simplest terminology, 'activity' is defined as "the engagement of a subject towards a certain goal or objective" [Luria 1981; Ryder 1998]. Vygotsky contributed to Activity Theory by describing activity mediated through artefacts. In general, artefacts are both a set of constructed initial activities but they can also be a product of an activity, and can be modified throughout the timeline of an activity. As Bertelsen [2000] denotes "Using Star's [1989] terminology, design artefacts are boundary objects because they adapt to different situations of application and at the same time maintain identity, thereby mediating divergent needs and viewpoints." Bannon & B0dker [1991] describe this format of mediation as a critical part to understanding artefacts and distinguishing them from each other. Beguin & Rabardel [2000] similarly uses this idea of mediation to explain the artefacts within the cycle of construction as a combined result of generative activity, mediation and refinement stages.
3.2 Constructionist Metrics

Here we describe an internal cognitive design cycle that demonstrates how a single artefact is created from several key stages (Figure 1):

1. Decisions (Generative Activity and first innovation).

2. Mediation (Backtracking, pausing for reflection).

3. Refinement (Assessment and innovation, leading to modifications).

Upon refinement, several artefacts can become reinforced by further Decision stages, leading to subsequent branching of Mediation and Refinement within (recursive constructions). Decision making as an activity can thus branch into 3 dimensions:
• Addition (First Set).

• Modification (Mutate, or Get and Set).

• Removal (Delete).

In addition to these, a fourth variable exists: a 'Mediation Point', which we can describe as a point in time when a generative activity (Decision) halts for an arbitrary period (like a rest), and then continues onwards in the timeline with mediation and refinement, leading either to the construction of a new artefact or to modification of an existing one. This mediation point is important to us in distinguishing sums of artefacts from a single artefact in a construct. For example, sketching a prototype view of a DVD movie menu interface may show one artefact collection as a navigation block which has icons, labels and a button style; a mediation point will separate this as one artefact before the user has considered the next artefact to be created, e.g. a background menu image.

Thus several events within a generative activity can become measurable, either on their own or as clusters. In Table 1 we describe several possible event patterns as part of our framework to describe artefact-driven constructionism within the temporal view of generative activities.

Name | Type | Value | Explanation
Ta | total time | shorter time | confident but lacking mediation, or in a hurry
Tb | total time | longer time | strong mediation but not necessarily confident (could be indecisive)
Aa | additions | lots, in a short time | strong confidence, instinct; implies using personal domains of knowledge
Ab | additions | few, in a short time | not confident at task, relies on mediation
Ac | additions | lots, in a long time | strong confidence, attention to detail (methodological approach)
Ad | additions | few, in a long time | not confident at task, doesn't rely on mediation
Ma | modifications | few, in a short time | strong confidence, weak mediation (possibly pre-final refinement)
Mb | modifications | lots, in a short time | mediation and refinement stages, either indecisive or debating; strong output on agreement consensus
Da | deletions | few | confident in output
Db | deletions | lots | either non-confident, or understanding/expertise is being corrected under mediation
AM | additions and modifications pair | lots | atomic expertise: strong sense of refinement/perfectionist
MD | modifications and deletions pair | lots | suggests mediation and resolution towards a positive refinement stage
AD | additions and deletions pair | lots | either non-confident, or understanding/expertise is being corrected under mediation

Table 1: Proposed event patterns within an internal design construct.
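To make the patterns in Table 1 concrete, the sketch below shows one way such event counts and timings could be mapped to the pattern labels. This is an illustrative reconstruction only, not the PROTEUS EVALUATOR implementation: the event representation and the thresholds for 'lots' and 'a short time' are assumptions chosen for the example.

```python
# Illustrative sketch: mapping logged prototyping events to Table 1 labels.
# The event format and the 'many'/'short' thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Event:
    time: float   # seconds since the session started
    kind: str     # 'addition', 'modification' or 'deletion'

def classify(events, session_length, many=10, short=300.0):
    """Return coarse Table 1 style pattern labels for one session."""
    counts = {'addition': 0, 'modification': 0, 'deletion': 0}
    for e in events:
        counts[e.kind] += 1
    fast = session_length < short
    labels = ['Ta' if fast else 'Tb']           # total-time patterns
    lots_added = counts['addition'] >= many     # addition patterns Aa-Ad
    labels.append({(True, True): 'Aa', (False, True): 'Ab',
                   (True, False): 'Ac', (False, False): 'Ad'}[(lots_added, fast)])
    if fast:                                    # modification patterns Ma/Mb
        labels.append('Mb' if counts['modification'] >= many else 'Ma')
    labels.append('Db' if counts['deletion'] >= many else 'Da')
    return labels

# Example: a short burst of additions with a single correction.
log = [Event(float(t), 'addition') for t in range(0, 120, 10)]
log.append(Event(125.0, 'deletion'))
print(classify(log, session_length=130.0))      # ['Ta', 'Aa', 'Ma', 'Da']
```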
4 Design and Development of Framework Tools

There is a depth of user knowledge beyond what paper-based methods can acquire, and software tools can assist in reaching it. Existing desktop tools provide some assistance here. In addition, mobile devices, e.g. PDAs and Tablet PCs, are already becoming "part of the HCI practitioner's toolkit": expensive toys perhaps, but they are slowly becoming more prevalent for serious uses by HCI practitioners and educators, especially when on-demand and on-site activities require them. Our approach is therefore designed from the ground up to augment these devices with the techniques in software, using state-of-the-art capabilities, and to enable such techniques to be taken effectively to client-side locations and to store and process client-side HCI data efficiently.

We designed and developed two software tools, with the aims of:

1. providing as close a simulacrum as possible, in software, of the existing practical methodology of low-fidelity prototyping; and

2. providing a technique for electronically automating the evaluation of our framework with the software solution provided in (1).
4.1 Software Development Platform

As a visually rich representation of low-fidelity prototyping, digital inking required a visually rich, high-resolution interface that strongly mimics the existing level of interaction, with the novel advantage of maintaining portability for on-site use; a key advantage in updating current-day HCI knowledge elicitation methods, given the nature of wireless networking and mobile computing options. The Tablet PC platform from Microsoft was specifically chosen as the mobile platform to host our framework in the form of the PROTEUS tool (PROtotyping Environment for User-interface Studies), with a second tool, PROTEUS EVALUATOR, to assist in analysing artefact creation over time within PROTEUS-based low-fidelity designs.

The Microsoft Tablet PC Software Development Kit (SDK) with Visual Studio .NET 2003 provides rich API libraries for developing new pen interactivity models beyond normal laptop usability patterns. As a hybrid mobile device between PDA and PC, a Tablet PC's inking facilities feature pressure sensitivity in on-screen pen motion, pen gestures for firing events based on user metaphors, and real-time recognition of handwriting on visual user interface components.
4.2 Expert Iterative Design

Participatory design (PD) sessions with 4 HCI practitioners were conducted for the development of the software tools, starting with a pre-understanding survey to elicit requirements and requests for features by priority. After debate it was clearly understood and agreed that standard paper and software approaches are well suited to eliciting basic informal requirements. However, it was also understood that being able to visualize a design as changes over time, and to see decisions through the constructionism model as artefacts, would reveal user knowledge and cognitive abilities that are not otherwise easily and conveniently acquired directly.

The practitioners were introduced to the concept of Tablet PC-based pen gesturing actions and on-screen handwriting recognition as a potential interface to the tool. They were also introduced to the notion of temporal analysis capturing key design artefacts as they are created, as required by our framework. Using these requirements attributes, low-fidelity paper prototypes were then created by the practitioners to elicit potential user interface designs (Figures 2 & 3), which aided us with consistency of options and navigation of the incorporated tools, and in investigating potential routes for minimizing user actions.

Figure 2: Expert prototyping design session of the PROTEUS tool.

Figure 3: Example low-fidelity paper prototype of the PROTEUS user interface.

The interaction model mimics the physically tangible model of paper-based prototyping as closely as possible, e.g. pen-sensitive drawing/selecting/erasing (pen down with depth for drawing), auto-selecting connected elements (pen double-click motion), picking up and moving visual elements (pen drag motion on selected elements), and quick erasing (pen eraser button). These digital pen actions have the added advantage of providing a resource of data for digitally logging all motions and actions, and for record-keeping of the artefact formation in progress, as well as being fast and convenient for using a pen in a one-handed mobile or stationed environment. It also enables us to calculate in software arbitrary mediation point delays between artefacts, originally set at a default of 10 seconds. Thus, after 10 seconds of inactive use, further constructs are considered to be a new artefact, a mediation or a refinement.

The uses of this interaction model were explored further in individual interviews with the practitioners to gain their personal opinions and feature requests. This enabled us to elicit additional requirements, which were agreed during a follow-up focus group session, as well as (after consensus of the four HCI experts) removing and minimizing less used and potentially obstructive features. Post-evaluations of the tools were conducted with design practitioners through a number of post-questionnaires, which found that the users were happy for the tools to be deployed in HCI scenarios.

PROTEUS version 1.0 (Figure 4) simulates the actions of a low-fidelity paper prototype being constructed, with the addition that all user events are recorded, including every pen stroke and user interface choice via SDK GUID calls. Using this data it constructs temporal roll-back views of the prototype's creation, so that every action of manipulation of the virtual paper prototype can be evaluated at a later date to elicit potential weaknesses or strengths at prior stages of the prototype design process.

Figure 4: PROTEUS with a low-fidelity prototype of a website design scenario.

The time-indexed ink-encoded GIF file output (serialized from the Tablet PC SDK) can be shared with others and imported into existing designs as prototype
element templates. This is in addition to applying now-standard ink manipulation and interaction modifiers such as selection, scaling and moving of ink strokes and collections of strokes, applying transparency and colouring to highlight and distinguish artefacts, and page zooming for refining ink details. Ink-encoded GIFs, which are serialized by the Tablet PC SDK, retain their added editable information, including their time stamps, even though they can be read by any graphical image editor and Web browser supporting the standard GIF file format. This makes them very useful for sharing prototypes quickly with others, but also for maintaining the integrity of editable features between PROTEUS users.

All activity in the form of decisions is tracked and can be rolled back to prior times, e.g. to compare what user activities occurred in the decision making of a group of artefacts at different temporal instances, via the PROTEUS EVALUATOR tool. This allows the practitioner to review the mediation point stages, such as those leading to mediation and refinement (Figure 5); a minimal code sketch of this mediation-point detection follows below. This interactive reviewing method was requested as a feature for use in on-site experiment sessions: it allows a practitioner to enquire about further details in post-interviews and focus groups with participants, letting them refer visually to any point of the original design timeline, with the history of actions re-playable. Examples of this include erasing parts of an artefact or moving artefacts around.

Figure 5: PROTEUS EVALUATOR tool analyses prototype artefacts over time.

A pre-test questionnaire and walkthrough trials were conducted with the HCI experts to:

• Present them with our Tablet PC tools and enable them to apply the digital inking methods in their experimental practices.

• Evaluate their understanding of the tools.

• Engage them in contributing ideas for enhancing the scope of any further requirements for use in their field operations.
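The mediation-point detection described above can be illustrated with a short sketch. It assumes a session is available as a plain list of time-stamped pen events and uses the 10-second default delay; it is a minimal illustration of the idea, not the tool's actual implementation. The second helper computes the intervals between mediation points that are discussed in Section 6.

```python
# Minimal sketch of mediation-point detection: a gap of 10 seconds or more
# between successive pen events is treated as a mediation point, and the
# events on either side of it as belonging to separate artefacts.
# The plain timestamp-list representation is an assumption for illustration.
MEDIATION_DELAY = 10.0   # seconds; the configurable default mentioned above

def segment_artefacts(timestamps, delay=MEDIATION_DELAY):
    """Split a list of event timestamps into artefact groups."""
    artefacts, current, last = [], [], None
    for t in sorted(timestamps):
        if last is not None and t - last >= delay:
            artefacts.append(current)     # mediation point: close the artefact
            current = []
        current.append(t)
        last = t
    if current:
        artefacts.append(current)
    return artefacts

def mediation_intervals(artefacts):
    """Gaps between the end of one artefact and the start of the next."""
    return [nxt[0] - prev[-1] for prev, nxt in zip(artefacts, artefacts[1:])]

strokes = [0.2, 1.1, 3.0, 18.5, 19.2, 40.0]
groups = segment_artefacts(strokes)           # [[0.2, 1.1, 3.0], [18.5, 19.2], [40.0]]
print(len(groups) - 1, 'mediation points')    # 2 mediation points
print(mediation_intervals(groups))            # [15.5, 20.8]
```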
Figure 6: PROTEUS on a Tablet PC in use during the student scenarios.
Throughout these expert trials they could raise any points of interest or complaints. Finally, a post-test Questionnaire for User Interaction Satisfaction (QUIS) [Chin et al. 1988] was given to collect information about their general impressions of the tools and any modifications they thought were necessary. From the data collected, a number of key user interface issues, such as menu options, accommodating appropriately pen-sized interface actions, interface terminology, button styles and integrated help requirements, were addressed in subsequent builds.
5 Experimental Scenario Testing

In order to better understand the application of constructionist assessment in prototyping, scenario testing of the tools and the framework was undertaken with 40 postgraduate MSc participants recruited from an Advanced HCI class module. This enabled us to compare the constructionist framework of artefact analysis with an existing expert HCI marking methodology.

The participants were invited to use the PROTEUS tool in two scenarios: the design cycle of an online language-learning website, and a novel interface for a train ticket machine. Working in groups of 5-7, they used 4 Tablet PCs (1GHz+, each with 512MB RAM, running Windows Tablet PC 2005) in turns in a classroom location. While one half of the class used the well-established paper format for one scenario, the other half completed the same scenario with the software tools. For the second scenario, the students switched methods from paper to software and vice versa (Figure 6). Each group was given 20 minutes to complete each scenario task. Upon completion of their prototypes, the groups were shown each other's solutions, to demonstrate the variety of ideas that groups can produce using low-fidelity prototyping in practice.

For HCI lecturers, practitioners and researchers, the methodologies for evaluating practical paper-based and software-based forms of low-fidelity prototyping are fairly similar in acquiring key user requirements, eliciting more of the conceptual basis and creativity of ideas than precision of style. Understanding the users' mental models also gives us useful data centring on the usage of appropriate design metaphors and the achievement of key tasks for functionality.

Table 2 shows our department's expert assessment criteria as used to evaluate paper prototype courseworks from an Advanced HCI MSc class. These expert criteria can only determine metrics of user mental models from a final prototype design view; they do not elicit qualitative measures that may have been significant during the prototype's timeline. Hence, it is this data which can be compared with that of our framework, which can give additional qualitative measures over the temporal construction of the prototype design.

1. Use of colours and variety of pens to distinguish elements
2. Demonstrating a sense of proportions and scale
3. Use of simple shapes to denote complex objects
4. Representation consistency in reuse of shapes and colours
5. Use of contextual language, annotation and terminologies
6. Ease of 3rd-party understanding of the users' representations
7. Aesthetics awareness and use of layout, usability design
8. Ideas and innovation presented to the domain proposed
9. Context of design and accessibility to domain proposed
10. Overall quality of effort

Table 2: Expert assessment criteria for marking low-fidelity paper prototypes.
6 Scenario Results

Expert marking evaluations of the prototypes were conducted with two HCI practitioners marking the software-based views and a different two HCI practitioners marking the paper views, so that bias in software-against-paper expert comparisons could be avoided. No statistically significant difference was found between the expert quality marks of the paper prototypes and those of the software prototypes (t(11) = 1.68, p > 0.05). This indicates that using the Tablet PC software was not a negative influence on the practical methodology of low-fidelity prototyping. However, a second t-test on Scenario 2's outputs alone (software vs. paper) gives t(5) = 3.13, p < 0.013. This indicates that the software method outperformed the paper method here, and it may show that the effectiveness of using such digital tools is potentially dependent on the scenario of use.

As shown in Figure 7, mean marks for the second scenario (ticket machine interface) were higher than those of the first scenario (language-learning website), suggesting that this scenario invited more creative ideas. We suspect the variety of pen options in the software aided in eliciting this higher creative quality.

To evaluate the constructionist data, our PROTEUS EVALUATOR tool generates Excel-compatible spreadsheets directly from the time encoding of events that occurred within the creation of the ink-encoded GIF files from the software sessions.
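For readers who wish to run this kind of comparison themselves, the snippet below shows an independent-samples t-test of the form reported above, using SciPy. The mark values are invented placeholders, not the data from our study.

```python
# Sketch of an independent-samples t-test comparing expert marks (%).
# The marks below are invented placeholders, not the study's data.
from scipy import stats

paper_marks    = [46, 52, 55, 61]   # hypothetical marks for paper prototypes
software_marks = [58, 63, 70, 72]   # hypothetical marks for software prototypes

t, p = stats.ttest_ind(software_marks, paper_marks)
print(f"t = {t:.2f}, p = {p:.3f}")  # p < 0.05 would suggest a real difference
```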
Figure 7: Paper vs. Software expert marking for the two scenarios.
This allows us to compare the electronically generated results of the software-based prototypes, in terms of artefact-driven confidence, against the independent expert markers' evaluations; Figure 8 shows some of the automated results of artefact-driven construct assessment vs. expert HCI marking. The expert marks were awarded out of 100% based on the assessment criteria defined in Table 2.

The use of the PROTEUS EVALUATOR tool to analyse the construction of artefacts within the users' PROTEUS-based prototype designs determined several key points in our experiments:

• Confident groups spent less time in refinement stages.

• Low values of generative activity imply either non-enthusiasm or an inability to construct confident and positive design artefacts.

• The average interval between mediation points involving modifications was found to be considerably shorter than for additions and deletions; we suspect this is because mediation (reflecting on choices made) and refinement (assessing possible outcomes) give a clearer idea of what to manipulate.

• Successive generative activities indicate sources of innovation.

Weaker teams, in terms of the expert marking criteria, exhibited longer decision-mediation-refinement cycles before considering creating new artefacts.
7 Users' Evaluation of the Tool

After the scenario testing, a full-scale post-questionnaire based on QUIS [Chin et al. 1988] was completed by the 40 student participants. This QUIS is based on a 0 to 9 Likert scale for a variety of categories, as shown in Figure 9.
(a) Scenario 1, Team 2: experts awarded 46%; software detected 5 mediation points over a short time period, with little mediation/refinement time in between.

(b) Scenario 1, Team 5: experts awarded 61%; 7 mediation points detected in software, with long periods of generative activity indicating thoughtful team consensus.

(c) Scenario 2, Team 1: experts awarded 70%; 6 mediation points detected, with little mediation time in between rapid generative activity, possibly indicating confidence in design.

(d) Scenario 2, Team 5: experts awarded 52%; 4 mediation points detected, little generative activity but long mediation, an indecisive team.

(All panels plot activity against time in seconds.)
Figure 8: Artefact constructs detected with PROTEUS EVALUATOR vs. Expert marking.
Figure 9: QUIS results on user interface satisfaction, by category (overall reaction to the software; screen; terminology and system information; learning; system capabilities).
Subsequently, an ease-of-use post-questionnaire based on the CSUQ questionnaire [Lewis 1995] was conducted, with additional questions to indicate personal preferences between the software and paper methods. In both the QUIS and CSUQ questionnaires, the PROTEUS tool was rated above average in all questionnaire categories. A mean of 82% of the users stated that they found the tool comparable to, if not better than, the existing paper practice. 69% of users felt that the tool gave them new capabilities, in the sense that they had only expected such features within a mouse-and-desktop paint/diagrammatic program and not within the natural feel of a pen-based direct inking tool. 89% of participants felt that the tools and software options provided covered the interaction level sufficiently to facilitate productive low-fidelity prototyping in the HCI context.
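As an indication of how such Likert data can be summarized, the snippet below averages hypothetical 0-9 QUIS ratings per category; the category names follow Figure 9, and the response values are invented for illustration.

```python
# Sketch: averaging 0-9 QUIS Likert ratings per category (invented data).
from statistics import mean

responses = {
    'Overall reaction to the software': [7, 6, 8, 7],
    'Terminology and system information': [6, 7, 6, 8],
    'Learning': [8, 7, 7, 9],
    'System capabilities': [6, 6, 7, 7],
}

for category, ratings in responses.items():
    print(f"{category}: mean {mean(ratings):.1f} of 9")
```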
8 Conclusions and Further Work

We have provided an example of our framework for electronically evaluating otherwise complex user confidence and capability issues by observing the construction of artefacts in prototyping, as a measure additional to existing low-fidelity evaluation. Our contributions are as follows:

• We have constructed a specific HCI software solution, PROTEUS, which enables the major functional requirements of low-fidelity prototyping to be captured via a mobile device, in the form of a Tablet PC; this enables on-site elicitation and digital recording and sharing of prototypes.

• We have developed a second tool, PROTEUS EVALUATOR, to assist in analysing design artefacts and constructionist data in the timeline of PROTEUS designs. This has led to the exploration of using artefact-driven constructionist theory to create a working model of artefact-driven assessment metrics. This forms a research opportunity for mapping potential
cognitive decision-making characteristics within the temporal creation of user knowledge elicitation.

• Finally, we have conducted two scenario-driven experiments using our framework with PROTEUS, and compared the data with expert formal assessment guidelines, to demonstrate links with the existing methodology.

For HCI practitioners and educators, existing paper-based prototyping methods are obviously quick, cheap and practical if the space and materials are available, and low-fidelity ideas are captured efficiently this way. However, their outputs are not effectively recorded on paper in a consistent and shareable form; the intermediate decisions made are not easily recorded on paper (e.g. how the users arrived at particular artefacts that are used consistently and metaphorically in their design, and what decisions they made along the way); nor is the final output completely indicative of the personal confidence, initiative and capability of the user(s) involved. Automating the collection of this data in readily usable formats is seen as a beneficial capability, and for the advancement of practical HCI methodology there is an advantage to be realized here with such tools, especially for HCI educators.

We would like to see this work expanded further and applied to other scenarios. In particular we intend to continue to map further cognitive measures within artefacts, e.g. expanding on the clustering of multiple constructionist event types (Table 1). We believe similar techniques can be applied to other knowledge elicitation methods, such as affinity diagramming and card sorting, which also use constructionist ideals.
References

Bannon, L. & Bødker, S. [1991], Beyond the Interface: Encountering Artifacts in Use, in J. M. Carroll (ed.), Designing Interaction: Psychology at the Human-Computer Interface, Cambridge University Press, pp.227-53.

Bertelsen, O. W. [2000], Design Artefacts — Towards a Design-orientated Epistemology, Scandinavian Journal of Information Systems 12(1-2), 15-27.

Burge, J. E. [2001], Knowledge Elicitation Tool Classification, PhD thesis, Worcester Polytechnic Institute, USA.

Béguin, P. & Rabardel, P. [2000], Designing for Instrument Mediated Activity, Scandinavian Journal of Information Systems 12(1-2), 173-90. Special Issue: Information Technology in Human Activity.

Chin, J. P., Diehl, V. A. & Norman, K. L. [1988], Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface, in J. J. O'Hare (ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'88), ACM Press, pp.213-8.

Hardgrave, B. C. & Wilson, R. L. [1994], An Investigation of Guidelines for Selecting a Prototyping Strategy, Journal of Systems Management 45(4), 28-35.

Jonassen, D. H. [1994], Thinking Technology: Towards a Constructivist Design Model, Educational Technology 3(4), 34-7.

Kidd, A. [1987], Knowledge Acquisition: An Introductory Framework, in A. Kidd (ed.), Knowledge Acquisition for Expert Systems: A Practical Handbook, Plenum Press, pp.1-15.

Klemmer, S. R., Sinha, A. K., Chen, J., Landay, J. A., Aboobaker, N. & Wang, A. [2000], SUEDE: A Wizard of Oz Prototyping Tool for Speech User Interfaces, in M. Ackerman & K. Edwards (eds.), Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST'00), CHI Letters 2(2), ACM Press, pp.1-10.

Leont'ev, A. N. [1978], Activity, Consciousness and Personality, Prentice-Hall.

Luria, A. R. [1981], Language and Cognition, John Wiley & Sons.

Maiden, N. A. M., Mistry, P. & Sutcliffe, A. G. [1995], How People Categorise Requirements for Reuse: A Natural Approach, in P. Zave & M. D. Harrison (eds.), Proceedings of the 2nd IEEE International Symposium on Requirements Engineering (RE'95), IEEE Computer Society Press, pp.148-57.

Papert, S. [1991], Situating Constructionism, in I. Harel & S. Papert (eds.), Constructionism, Ablex, pp.1-12.

Piaget, J. [1973], To Understand is to Invent, Grossman.

Resnick, M. [1996], Distributed Constructionism, in D. C. Edelson & E. A. Domeshek (eds.), Proceedings of the Second International Conference on the Learning Sciences (ICLS-96), Association for the Advancement of Computing in Education. http://llk.media.mit.edu/papers/archive/Distrib-Construc.html (last accessed 2005-06-06).

Ryder, M. [1998], Spinning Webs of Significance: Considering Anonymous Communities in Activity Systems, in M. Hedegaard & S. Chaiklin (eds.), Proceedings of the Fourth Congress of the International Society for Cultural Research and Activity Theory: Activity Theory and Cultural Historical Approaches to Social Practice. http://carbon.cudenver.edu/~mryder/iscrat_99.html (retrieved 2004-10-06).

Star, S. L. [1989], The Structure of Ill-structured Solutions: Boundary Objects and Heterogeneous Distributed Problem Solving, in L. Gasser & M. Huhns (eds.), Distributed Artificial Intelligence, Pitman, pp.37-54.

Vygotsky, L. S. [1978], Mind In Society: The Development of Higher Psychological Processes, Harvard University Press. Edited by Michael Cole, Vera John-Steiner, Sylvia Scribner and Ellen Souberman.

Walker, M., Takayama, L. & Landay, J. A. [2002], High-fidelity or Low-fidelity, Paper or Computer Medium?, in Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting, Human Factors and Ergonomics Society, pp.661-5.

Zaphiris, P. & Kurniawan, S. H. [2001], User-centered Web-based Information Architecture for Senior Citizens, in N. Avouris & N. Fakotakis (eds.), Proceedings of Panhellenic Conference with International Participation on HCI: Advances in Human-Computer Interaction, Typorama, pp.293-8.
The Reader Creates a Personal Meaning: A Comparative Study of Scenarios and Human-centred Stories

Georg Strøm

DIKU, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark

Email: [email protected]

Different types of written textual descriptions are often used in interaction design. This paper describes an empirical study of how conventional scenarios and stories with emotional and dramatic elements may contribute to software developers' understanding of interfaces and of contexts and situations of use. The results show, first, that software developers create a personal understanding of written descriptions by combining parts of them with their personal experiences. Second, that both scenarios and stories improve their understanding of technical information. Third, that stories with emotions and dramatic elements improve their understanding of contexts and situations of use substantially more than conventional scenarios. Fourth, that software developers may find it comparatively easy to write stories with emotional and dramatic elements.

Keywords: stories, human-centred stories, scenarios, software development, emotions, requirements, conceptual design.
1 Written Texts are an Important Medium in Industrial Software Development

In most cases it is not possible for software developers to be in continuous contact with users or customers who can describe their situations of use and what they need. In addition, most software development is so complex that it is necessary to describe and agree in advance on what is going to be developed. My own experience indicates that this is mainly done through the use of written verbal descriptions, in particular in the first phases of a software project, where major decisions are taken. Written verbal descriptions are used to communicate the context of use and requirements from
customers, usability or marketing people to the software developers, and they are used to communicate the suggested goals of the development back to customers and users.

Rosson & Carroll [2002] describe how conventional scenarios can be used in software development, and Hertzum [2003] and Nielsen [2004] describe empirically and in detail how scenarios can be used and can improve communication during software development. However, it is almost impossible from such studies to determine whether descriptions with different characteristics, for instance scenarios with deeper and better descriptions of the motivations of the users, may be more useful; in particular when preceding events in a project and the status and position of the writer influence how and when a written description is used in a development project. Another problem is that each of the present studies focuses on the use of just one textual genre. Within a specific genre it is only possible to express a certain range of thoughts and emotions. Because of these limitations, it is advisable to use a range of genres in industrial software development. It is therefore necessary to do comparative studies that can reveal which genres are most suitable for specific purposes.

Scenarios are one of the genres that can be used to describe the interaction with, and context of use of, an interface. Some proponents of scenarios indicate that scenarios and stories are almost synonymous [Erickson 1995; Rosson & Carroll 2002], but an empirical study of actual scenarios reveals that scenarios are a much more restricted genre than stories in general. Conventional scenarios are driven by the interface, their plots focus on demonstrating different functions in it, their descriptions of the characters are superficial compared to stories in fiction literature, and they describe no serious conflicts [Strøm 2003b]. Some scenarios consist only of lists of seemingly unrelated events; they are not narratives or proper stories at all, as defined by White [1981]. In contrast, a fiction short story normally includes at least one serious conflict, the emotions of the characters in it are shown through dialogue and through specific descriptions of their actions, and the plot is driven by the characters' efforts to succeed in conflicts or to overcome obstacles [Knight 1985]. In contrast to conventional scenarios, such stories tend to involve and engage the emotions of the reader.

Clausen [2000] argues that stories using methods from fiction writing are better than technical descriptions when system developers shall communicate with users, and he found that they could be written and used by computer science students in software design projects. Stories similar to fiction writing can in particular be used to describe how people use a computer system [Clausen 2000]; they are human-centred, in contrast to scenarios, which primarily are driven by the interface [Strøm 2003a]. The following is an introduction from a conventional scenario:

Marissa was not satisfied with her class today on gravitation and planetary motion. She is not certain whether smaller planets always move faster, or how a larger or denser sun would alter the possibilities for solar systems. [Rosson & Carroll 2002]
The introduction describes clearly what the designers want to include in the interface. In contrast, there is nothing about the motivations of the main character: nothing that indicates why she wants to learn more about the topic, why she will use the interface described in the scenario to learn more, or how she will use it to get more information. In contrast, a human-centred story may describe a user of a similar interface like this:

When Marissa was small she used to watch television with her older brother and ask him "Why don't the moon fall down?" or "What would happen if we lived on the sun?" She is now in high school. She is interested in astrophysics, but she is afraid to be considered a brainy girl and to become unpopular. (This is inspired by Carl Sagan [1985, 1996].)

The quotes or pieces of dialogue show the emotions and thoughts of the main character, such that it is possible to become engaged in the conflict she is caught in, and to imagine why and how she will use a Web-based interface to get in contact with other students who share her interests.

It is necessary to distinguish between fiction and the use of methods from fiction writing. It is possible to describe a real, non-fiction situation of use by using dialogue and other methods from fiction writing, and to show how the emotions and motivations of the participants drive the events. In contrast, even though a requirement specification is written without any methods from fiction writing, and even when it is based on careful studies of users and their needs, it is normally a work of fiction: it describes something that does not exist, and that indeed may never come into existence.

In addition to scenarios or other types of stories it is necessary to use technical descriptions of the functions and interfaces of the software to be produced. Such descriptions can give more complete and compact information, and they can be organized more systematically, which makes them easier to use as references. I will therefore investigate how conventional scenarios and human-centred stories affect how software developers perceive situations of use and technical information when they are read together with technical descriptions, and I will try to identify some aspects that affect the perception.
2 Method

In 2003 and 2004 I taught courses in the use of different textual descriptions in software development to computer science students. The first study was done during these courses (participation was voluntary). It consisted of an analysis of 30 conventional scenarios and 28 human-centred stories written in the course. The participants appeared to have above-average technical writing skills, but none of them had any previous fiction writing training or skills. Even though they were not selected in a manner that favoured readers of fiction literature, 18 out of 26 participants replied that they had read fiction literature within the last two months.

During the course each participant first wrote a scenario and then a human-centred story describing a situation in which an application for processing digital images was used. The participants were given identical written instructions for the writing of the scenario and the human-centred story.
As part of the course I did an evaluation of the human-centred stories. It was based on what I had learned in a creative writing course, where I had participated in the evaluation of about fifty stories, and on what might be considered the goal of fiction writing: that the characters and dialogues in the story are believable, that the words and rhythms of the language are consistent through the story, that there is an apparently plausible plot that progresses through the story until it reaches a conclusion, and that all these elements contribute to a consistent reader experience. (This definition is based on the work by Knight [1985] and Sharples [1999].)

When the courses were completed, I counted the number of new ideas mentioned in the scenarios and human-centred stories. I defined a new idea as a function, or a usage problem with an obvious solution, that was not mentioned in the written instructions for the assignments.

The second study was conducted in 2003-2004. Eight software developers with a programming or computer science background and five with an engineering background participated in the study. Even though they were not selected in a manner that favoured readers of fiction literature, seven of the 13 participants reported that they had read fiction literature within the last two months. The participants reported that they had spent from half an hour to three hours reading the texts; on average about four minutes per page. There was no relation between the types of texts they had read and the time they had spent reading them.

The texts were assigned randomly to the participants; the scenarios and human-centred stories were not mixed, because the participants' opinions about one type of text might then have influenced their evaluation of another:

• Three of the participants read only technical descriptions of four different applications.

• Five read both technical descriptions and conventional scenarios that described the same four applications.

• Five read technical descriptions together with human-centred stories that described the same four applications.

The texts described applications of the following types: project management software, PDA-based time-registration for social assistants, a call-centre system, and a mobile phone with a built-in camera. I had written the technical descriptions myself and used them in an earlier study [Strøm 2003a]. They were based on applications I was familiar with, and they were similar to good technical descriptions I had seen when working in private companies.

I had also written the human-centred stories and used them in the earlier study [Strøm 2003a]. Some of them had been evaluated in the creative writing course in which I participated, and based on the feedback I received they can be described as of an almost publishable standard.

The scenarios were based on the human-centred stories. That was done in order to ensure that they described exactly the same events as the human-centred
stories. (Otherwise it would have been almost impossible to make a valid comparison.) They were similar to good scenarios from private companies and public sources that I had evaluated in an earlier study [Strøm 2003b]. In conclusion, the technical descriptions, stories and scenarios were of a uniform quality, with only minor defects, but not of an outstanding or superior quality.

The technical descriptions equalled on average 4 typewritten pages: similar to a technical summary, but smaller than a normal design specification. The scenarios each had a length of about 1 page, which is slightly more than most scenarios I have seen, whereas the human-centred stories had an average length of about 4 pages, which is close to the normal minimum length of fiction short stories [Knight 1985]. (It is difficult to engage the reader and resolve a conflict in a story that is much shorter.)

In the second study I used a combination of quantitative and qualitative methods with a concurrent triangulation strategy [Creswell 2003]. This means that I combined qualitative and quantitative methods in the same study, and that I collected the qualitative and quantitative information at the same time. I did that in order to capture as much information as possible within the time spent with each participant.

I conducted semi-structured interviews with standardized questions that made it possible to make quantitative comparisons between the replies of the participants. For each application the interviews included the following groups of questions:

1. Open questions where the participants were asked to describe how each described application supported one specific function, which benefits the application offered, and possible problems during its use.

2. Two questions about the usability or usefulness of the application. The participants were asked to select a value on a 1-5 scale that included a verbal description of each value, and were encouraged to argue for their choice.

3. One question about how good an impression the technical description gave of how the application would function during actual use, and one question about the credibility of the scenario or story (for those participants who had read one of those). The participants were asked to rate the texts on a 1-5 scale that included a verbal description of each value, and were encouraged to argue for their choices.

The interviews were recorded and transcribed. In order to identify the multitude of different aspects that might occur in the interpretation of a written text, it was necessary to extract qualitative information. Using principles based on Kvale [1997], I identified recurrent themes in the transcripts, collected and evaluated the parts of different interviews that might be related to each theme, interpreted the parts, and in some cases selected one or a few quotes to illustrate each theme.

In order to make a reliable comparison between scenarios and stories, it was necessary to extract additional quantitative information from the interviews. The descriptions of how each application supported one specific function (from the first group of questions) were ranked from the one that gave the most correct information
and most precisely expressed information about the function, to the one that gave the least correct and least precise information. Ranking was used instead of grading according to preset criteria, because such a grading could not reliably take into account any correct information that was not expected when the grading was made, or how precisely the information was described in the reply. The quantitative results from the questions in the second and third groups could be tabulated directly.

The number of different aspects in the participants' interpretation of the texts, for instance the number of misunderstandings made, was determined by first making a set of definitions of the different aspects, and then using these to identify their occurrences in the transcripts of the interviews. I calculated both normal averages and weighted averages in which the different backgrounds of the participants were taken into account (the difference was minimal), and I identified statistically significant differences by testing against the normal distribution.

The quantitative and qualitative results were finally triangulated. This means that they were compared, and that the conclusions were based on a combination of them.
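As a minimal illustration of the weighted averages mentioned above, the snippet below gives each background group equal total weight, so that the larger group does not dominate; the scores and group sizes are invented for the example.

```python
# Sketch: an average weighted by participant background (invented numbers).
scores = [(4, 'programming'), (3, 'programming'), (2, 'programming'),
          (5, 'engineering'), (4, 'engineering')]

plain = sum(s for s, _ in scores) / len(scores)

groups = {b for _, b in scores}
# Average within each background first, then across backgrounds,
# so each background contributes equally to the overall figure.
weighted = sum(
    sum(s for s, b in scores if b == g) / sum(1 for _, b in scores if b == g)
    for g in groups
) / len(groups)

print(f"plain: {plain:.2f}, background-weighted: {weighted:.2f}")  # 3.60 vs 3.75
```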
3 Computer Science Students are Capable of using Methods from Fiction Writing

It is almost impossible to find a fiction writer who can fit into a development team, who knows interaction design and who can quickly acquire the required domain knowledge. It is therefore only feasible to use human-centred stories in software development if they can be written by people who already participate in and know about such projects.

The first study shows that computer science students, after a single lesson in fiction writing, could write human-centred stories, in some cases of an almost publishable standard. They were capable of writing realistic dialogue, they described convincing characters, and they were capable of writing a consistent language that fitted the style and topics of their stories. The largest problem was that a significant proportion of the stories included dramatic plots with infidelity, serious crime or other events that moved the focus away from the interaction with the interface. However, when the students were made aware of that problem, it was fairly easy for them to solve.
4 Writing Human-centred Stories Creates New Ideas

When designing interfaces it is often difficult to identify new needs and to invent features that fulfil them. It is therefore valuable if the writing of human-centred stories can facilitate both.

The first study shows that the human-centred stories described substantially more new ideas for the interface than the scenarios did. The writing of a scenario resulted on average in 1.5 new ideas, whereas the writing of a story resulted on average in 2.1 new ideas (p = 0.05). In addition, when the lower number of stories is taken into account, the writing of stories resulted in 60% more different ideas. There were only a few cases where the same participant had included the same idea in both a scenario and a story. This indicates that the writing of the stories, and the ideas generated during it, were independent of the writing of the scenarios.
In contrast to the scenarios, the stories described the characters' emotional relations to the use of different functions, for instance that they could express anger through the use of a function or hesitate before deleting an item. These relations cannot be explored through conventional scenarios, and they are important when designing an emotionally satisfying interface. When a scenario is driven by the interface, it is easy to exclude anything that is difficult to implement or not already part of the interface. In contrast, when a story describes a situation where a user has a realistic need of accomplishing a specific result, the writer is almost forced to describe how it can be done. In order to progress with the story, he or she must invent something that overcomes the problems.
5 The Reader Creates His or Her Own Personal Understanding of the Topics of the Texts

Successful software development requires that the developers understand what they are supposed to develop. This requires that their understanding of the written descriptions their work is based on is consistent with the understanding of the people who have written and approved them. The second study revealed large differences in how the same text was understood; in some cases the understanding contradicted parts of the text. This was not because the participants did not understand the words of the texts. It was because each participant created his or her personal understanding by combining parts of the text with his or her personal experiences. In doing so, the participant often discarded parts of the text that did not fit his or her personal experiences.

One of the participants had earlier worked at a help-desk, and he had read a description of a general system for a call-centre (as with the other excerpts from interviews, this is translated by the author from Danish):

Q: What are the advantages of the call-centre system?

A: You get registered what comes in. You can see what each person is doing, if any tasks are hanging, if there is something that does not get solved.

The participant creates a situation of use based on his own experiences, even though it does not fit parts of the text. The interview continues:

Q: How is it possible for the same operator to handle calls to different companies?

A: He shall give a customer number or something like that. And the number will indicate whether it is company A, B and C.

This fits the help-desk experienced by the participant, but not the call-centre described in the text. The text describes how the company is identified automatically based on the number that is called, and how the system then provides the operator with the necessary information to handle the call. The text does not mention any customer number. Later in the interview the same participant is asked:

Q: What are the consequences of operators being warned before angry calls ... ? [Calls from a number where a previous caller has been abusive.]
A: It depends on the number of calls. A call-centre as TDC [Danish Telecom] with thousands of calls, there the customers are impersonal. The centre I thought about had maybe fifty customers that you knew and who came back.

The present situation, the question that is asked, changes the understanding that the participant has created. Instead of talking about a help-desk and a small call-centre, he is now talking about a large general call-centre. The new understanding is confirmed in the following:

Q: What are the consequences for the operator, when he or she shall play different roles, for instance ... handle calls to a travel agency or a complaint because of a missing newspaper?

A: I have all the time thought about a specific IT-system. But it is right, as you say, that they might as well handle travel agencies, car reservations or others. So it requires a large flexibility ... it is different worlds ... they will probably feel stressed.

The participant has now totally changed the understanding he has created of the topics of the text and the situation of use. The errors in the understanding are not consequences of an insufficient or superficial reading of the texts. They appeared in particular to occur when participants actively thought about the text and made an effort to create an understanding of it. This indicates that reading can be almost as active a process as writing, and that the participants create their personal understandings based on:

• Parts of the texts; their understanding may contradict other parts of it.

• The parts of their knowledge and personal experiences that best fit the text.

• Aspects of the situations in which they create their own understanding.

The process is similar to the blending of different objects that is described by Fauconnier & Turner [2002]. It seems that the reader places impressions or mental images generated by parts of the text on top of different personal experiences, accepting those that fit together and discarding those that do not. The process is similar to a design process as described by Boden [1990], and to Schachter's [1996] description of how memories are re-created when recalled, and it appears to be a common aspect of human thinking.

The results show only eight cases (out of more than 150 comments) where a participant expressed that he or she was conscious of the personal experiences used to create an understanding, and of how they influenced his or her interpretation of the text. One said (about the project management application):

I have been brainwashed with Microsoft Project, so it is possible it is the basis [of my evaluation].

There were no cases among the more than 150 comments where a participant mentioned alternative or multiple interpretations of the same piece of text. It was as if the participants at each moment felt compelled to choose only one interpretation.
6 Readers Create an Evaluation of the Usability

It is not uncommon that readers try to evaluate the usability or other aspects of the use of an interface based on a description of it. It is of course important that the evaluation is as reliable as possible. In addition, it is important that the basis of it is known: otherwise it is difficult to determine how reliable it is.

None of the participants in the second study evaluated the interface and possible problems by going through the description in a systematic manner. Some commented on general aspects, as this reader of a human-centred story:

... because it must be so flexible, it must also be complex ... when you have learned it, it is easy, but it takes time to learn and get used to

Some based their evaluation on the manner in which the interaction was described, as this reader who concluded that the software was difficult to use because the story gave an extended description of the operation:

The text indicates it is not that easy. There are many things, many menus to navigate. You shall skim different places, then go back in another menu and maybe enter something.

Other participants created an evaluation of the usability by blending the description with a known interface, as in this example:

... [Sending multimedia messages] appears to be very easy. Like writing an SMS or an e-mail. I know it, because I have a heavy Nokia, a 9210, which can transmit such stuff.

Some participants evaluated the usability by blending the described interface with a specific user, as this reader of a technical description:

It is possible that a social helper who lack routine need time to become familiar with it ... social helpers have different educations, some can use it directly, others are ... PC-imbeciles.

None of the participants evaluated the usability by blending the description with a specific situation of use; none of them considered how the usability was dependent on the situation of use. When the participants were asked about other issues than usability, they sometimes blended the description with a specific situation of use, as this reader of a scenario, who blended it with his personal experiences when asked how useful a project management tool was for the software developers:

... project management tools are for project managers, not developers. For developers it is not important that the figures are right. You do not ask the project manager to make changes in your editor. You are doing two different tasks.
Aspect | Read only technical description | Read also scenarios | Read also human-centred stories
Referred explicitly to the technical description | 0.7 | 1.6 | 1.6
Misunderstandings | 2.2 | 2.2 | 0.0
Referred to an imaginary user or situation of use | 2.3 | 2.2 | 0.7
Referred to an existing user | 2.7 | 1.0 | 1.7
Referred to own experiences | 0.6 | 0.8 | 1.6
Referred to known interface | 1.0 | 0.5 | 1.0

Table 1: Different aspects in the interpretation of the texts: Averages for each group of participants. Statistically significant differences (p = 0.05) are highlighted.
The blendings were often critical towards the texts, and the contents of the texts were questioned, as shown by this reader who compared a technical description and a human-centred story:

It is described in the story as if it is very difficult, but in the specification it appears that he uses only two or three menus, and it appears to be logical with the information he shall enter.

Another said:

This is a sunshine story. But in real life, people make mistakes.
7 Both Conventional Scenarios and Human-centred Stories Improve the Understanding of Technical Information

The participants in the second study were asked to describe how each application supported one specific function. The replies were ranked, with 1 as the best and 13 as the lowest ranking, and the results showed a substantially better understanding when the participants read a scenario or a story together with a technical description: average rankings of 7.0 and 5.6, vs. 10.8 (p = 0.05 between technical descriptions only and technical descriptions read together with scenarios).

The results also show that those who had read a scenario or a story referred substantially more often to the technical description during the interview (it appears they were more aware of its contents): on average 1.6 times (for scenarios and stories), vs. 0.7 times (p = 0.05 between technical descriptions only and technical descriptions with scenarios). See Table 1.

It was not possible to identify any relation between the time each participant reported having spent reading the texts and his or her understanding of them. An earlier study [Strøm 2003a] shows that stories read without technical descriptions do not give a better understanding than technical descriptions alone. It appears that it is the combination of technical descriptions and scenarios or stories that gives a better understanding.
8 Human-centred Stories Give a Better Understanding of a Situation of Use

When decision makers decide whether a feature shall be included in the requirements, and when the software developers decide how it shall be implemented, it is important that they understand its purpose and the expected situations of use. In addition, it is likely that the designers are more motivated and take more care if they are aware of when and how a feature will be used.

The second study shows that reading human-centred stories substantially reduced the number of misunderstandings of the situation of use, both compared to when only a technical text was read and when a technical text was read together with a conventional scenario. There were no misunderstandings among the readers of stories, whereas the participants in the two other groups on average had 2.2 misunderstandings (p = 0.05 between scenarios and stories). See Table 1. (Misunderstandings are here defined as personal understandings that contradict at least one part of the text.)

It is possible that readers of stories use the descriptions of the emotions and motivations of the participants as an additional reference that helps them to understand how the interface is used. They blend their understanding of the interaction with an interface with the descriptions of the users' emotions and motivations and their own knowledge about human motivations and emotions.

Some misunderstandings can be attributed to a lack of background information in the scenarios. One participant concluded that a young man in one of the scenarios was a paedophile. The scenario mentioned that he took a picture of two girls, but did not mention that they were approximately his own age. (The interpretation was made even though the following part of the scenario made it unlikely that he was a paedophile.)

Other misunderstandings can be attributed to the fact that readers of only technical texts or of scenarios substantially more often indicated that they made their own story with imaginary users or situations of use and used it to create their personal understanding (p = 0.05 between scenarios and stories). See Table 1. The following is from a participant who had read a technical description of how a social helper used a PDA:

    ... this means, if Mrs Jensen for some reason is not at home, what shall I do then? You may get the services recorded, I can imagine that you have a handful of standard services, and that you put a mark in the proper box.

If the developers do not have any personal experience with the domain, their personal stories may have little to do with the actual users or situations of use. In addition, when different members of a project group base their understanding of the users and context of use on personal stories that are not known by the other members of the group, it may easily lead to misunderstandings.

Participants who had only read technical descriptions referred substantially more often to how they expected that an existing user would use the interface. See Table 1. In some cases themselves; in other cases people they had met:
    Some of the call-centre workers I have met have been of such a type that this could not work.

There were no significant differences between how often readers of only technical descriptions, of scenarios and of human-centred stories referred to their personal experiences. See Table 1. It is possible that the personal experiences are so vivid and strong that their use when creating an understanding is not affected by the reading of a single scenario or story.

The second study shows that scenarios contribute to the understanding of the interaction with an interface and the situation of use, but that human-centred stories can contribute substantially more.
9 Software Developers Want Emotions and Dramatic Elements in the Stories

A number of participants in the second study made precise comments about how the texts were written. They clearly noticed the style and other characteristics of the texts they read. Both readers of scenarios and of human-centred stories most often expressed that they wanted stories with more dramatic elements (compared to how often they wanted stories with fewer): on average 1.9 times vs. 0.7 times (p = 0.05). They expressed that stories with emotions and dramatic elements seemed more real:

    When you make some drama when you tell, it makes you believe that this product exists and is in use.

An analysis of the comments shows that the participants wanted emotions and dramatic elements, but that they reacted when the dramatic elements were so strong that they moved focus away from the use of the interface. Scenarios and human-centred stories were rated as equally credible: 2.2 vs. 2.3 on a 1-5 scale. This indicates that stories with emotions, conflicts and dramatic elements, and even with humour, are not regarded as less serious than scenarios without such elements.
10 Discussion

Every evaluation of a story, and probably also of a non-fiction article, includes some subjective elements. However, even when that is taken into account, the results of the first study demonstrate that computer science students are capable of writing human-centred stories that may be used in software development.

In the first study there may have been a training effect from the beginning of the course to the writing of scenarios. However, the scenarios and stories were both written later in the course, reducing the training effect between them. The results also show that even though the stories were written after the scenarios, they were not based on them.

The results indicate that it is common for computer science students and software developers in Denmark to read fiction. However, that may not be the case in other countries. Computer science students and software developers in other countries may be less familiar with fiction literature and therefore find it more difficult to read and write stories with methods from fiction writing.

The technical descriptions, scenarios and stories used in the second study were of a similar, good quality: they were not outstanding, and they had only minor defects that did not affect the results. Comments made by the participants indicate that this is in agreement with their evaluations of the texts.

The human-centred stories were substantially longer than the scenarios used in the study. However, both scenarios and stories described the same events, it is normally easier to get an overview through a shorter text, and comments I have received on descriptions of interactions in other stories indicate that longer conventional scenarios may be tedious and difficult to read. This suggests that the benefits of human-centred stories shown in the second study appear in spite of their being longer than the scenarios; the benefits must be attributed to how the stories include conflicts and show the motivations, emotions and settings of the events.

The comparison was made between technical descriptions alone, scenarios read together with technical descriptions, and human-centred stories read together with technical descriptions. These are probably the most common situations in software development; it is unlikely that software developers will only be given a scenario or a story without some sort of structured technical description.

The qualitative and quantitative results were extracted from the transcripts in a consistent manner; the quantitative and qualitative results and the theoretical model lead to the same conclusion, which confirms the reliability of the results. However, the study probably underestimates the amount of misunderstandings and reading problems that may occur in actual software development. The technical descriptions were substantially shorter than many used in system development (on average 4 pages, whereas texts of more than 30 pages are common in industrial software development). It was therefore easier for the participants to get an overview of each technical description. It is also likely that most of those who volunteered to participate had above-average reading skills, and that they read the texts more carefully because they knew they would be asked questions about them.

It should be taken into account that the scenarios were based on the human-centred stories, so the actions described in them were plausible given the emotions and motivations of the characters. If scenarios are written without such a realistic background, it is more likely that their plots are perceived as implausible [Strom 2003b]. This means that the reported differences between scenarios and human-centred stories are probably smaller than what can be expected when scenarios without such a realistic background are used.

The participants in the second study had not contributed to the writing of the texts used in the study. If, as in the development project described by Nielsen [2004], the developers had spent time discussing what the contents of the scenarios or stories should be and had participated in the writing, it is possible that the advantages of human-centred stories as compared to conventional scenarios would be smaller.
As long as the participants remember the background and motivations of the characters from their discussions, they can imagine their emotions in specific situations and may therefore need only brief descriptions of the specific events to support their memory.
11 Conclusion

The two studies give a valid and reliable description of how scenarios and human-centred stories (stories that are driven by the emotions and motivations of the characters in them) may contribute to software developers' understanding of interfaces and the contexts they are used in.

The results of the first study demonstrate that computer science students can learn to write human-centred stories of a quality that is sufficient for use in industrial software development, in particular if they attend one of the numerous short courses in fiction writing (or creative writing). Compared to the writing of conventional scenarios, the writing of human-centred stories generated substantially more new ideas for the interface. They gave a more realistic description of the situation experienced by the user, and the built-in conflicts stimulated the identification of new needs and functions.

The second study demonstrates that reading is an active process; the reader does not absorb the contents of a text, but creates his or her own understanding based on his or her personal experiences together with parts of the text. This means that the reader's understanding may contradict other major parts of the text. This has nothing to do with a lack of comprehension or reading skills; on the contrary, it appears to be an essential part of reading, and it is a process that is similar to and almost as creative as the process of writing. One particular problem is that readers tend to select the first understanding that fits part of a text. They may avoid some mistakes by accepting that different understandings of a text are possible and by discussing them openly.

The reader uses his or her personal experiences and imagination to create what cannot be found in the text. The reader may imagine situations where an interface is used, and based on them create an understanding of the interface and how it is used. In addition, it appears that the reader often is unaware of how he or she uses personal experiences to create an understanding, and it is likely that other persons do not know the specific experiences that the reader uses to create his or her understanding. This may lead to misunderstandings in software development. In order to ensure that the members of a development group have a similar understanding of an interface and the possible situations of use, it is therefore advisable that they take time to discuss what they have read, and to share the experiences and stories they have imagined and used to create their understanding of what they have read.

The participants created understandings of the usability and of other aspects of an interface by blending the description of the interface with their own experiences, for instance with an interface of the same type. They assumed, for instance, that a described mobile phone in general was as easy or difficult to use as other mobile phones they had encountered. In daily life that is an effective and sensible method: it gives a good indication of whether an interface can be used by a particular user or in a particular situation of use. However, this method may be misleading if the goal is to evaluate the usability of an interface compared to other interfaces of the same type or to identify specific problems in it. In such cases it is important that the readers are aware of and discuss which examples of interfaces they use as comparison and which differences they notice.
Compared to a technical description alone, a combination of technical descriptions and scenarios or human-centred stories gave a better understanding of the structure of a system and the specific elements in the interface. This confirms the value of scenarios in software development.

The results indicate that the use of human-centred stories gave a better understanding of situations of use, both compared to technical descriptions alone and to technical descriptions combined with conventional scenarios. It appears that the readers use the emotions and motivations in the human-centred stories as an additional reference when they create their understanding of the situation of use. That will probably also be the case for users and other stakeholders, so it is likely that their understanding will also benefit from the emotions and background described in the stories.

Stories with emotions and dramatic elements are more readable, making it more likely that they actually are read (not only by software developers, but probably also by user representatives and other stakeholders). In addition, software developers in general prefer stories with emotions and dramatic elements. However, they complained if the events in a story were so dramatic that focus moved away from the interaction with the interface. This means that stories based on everyday problems or conflicts are the most useful and acceptable in software development.
Acknowledgements

Thanks to the participants for their time and effort, to Jesper Hermann, University of Copenhagen, for his comments, and to the reviewers for their comments.
References

Boden, M. A. [1990], The Creative Mind: Myths and Mechanisms, Weidenfeld and Nicolson.

Clausen, H. [2000], Informationsteknologiens menneskelige grundlag, Teknisk Forlag A/S.

Creswell, J. W. [2003], Research Design: Qualitative, Quantitative and Mixed Methods Approaches, Sage Publications.

Erickson, T. [1995], Notes on Design Practice: Stories and Prototypes as Catalysts for Communication, in J. M. Carroll (ed.), Scenario-Based Design: Envisioning Work and Technology in System Development, John Wiley & Sons, pp.37-57.

Fauconnier, G. & Turner, M. [2002], The Way We Think: Conceptual Blending and the Mind's Hidden Complexities, Basic Books.

Hertzum, M. [2003], Making Use of Scenarios: A Field Study of Conceptual Design, International Journal of Human-Computer Studies 58(2), 215-39.

Knight, D. [1985], Creating Short Fiction, Digest Books.

Kvale, S. [1997], Interview, Hans Reitzels Forlag.

Nielsen, L. [2004], Engaging Personas and Narrative Scenarios, PhD thesis, School of Informatics, Copenhagen Business School.
Rosson, M. B. & Carroll, J. M. [2002], Usability Engineering: Scenario-based Development of Human-Computer Interaction, Morgan Kaufmann.

Sagan, C. [1985], Contact, Simon and Schuster.

Sagan, C. [1996], The Demon-haunted World, Random House.

Schacter, D. L. [1996], Searching for Memory: The Brain, the Mind and the Past, Basic Books.

Sharples, M. [1999], How We Write, Routledge.

Strom, G. [2003a], Perception of Human-centered Stories and Technical Descriptions when Analyzing and Negotiating Requirements, in M. Rauterberg, M. Menozzi & J. Wesson (eds.), Human-Computer Interaction — INTERACT '03: Proceedings of the Ninth IFIP Conference on Human-Computer Interaction, IOS Press, pp.912-5.

Strom, G. [2003b], Using Creative Writing for Developing Realistic Scenarios, in C. Stephanidis & J. Jacko (eds.), Human-Computer Interaction: Theory and Practice. Proceedings of Human-Computer Interaction International 2003, Lawrence Erlbaum Associates, pp.15-6.

White, H. [1981], The Value of Narrativity in the Representation of Reality, in W. J. T. Mitchell (ed.), On Narrative, University of Chicago Press, pp.1-24.
What Difference Do Guidelines Make? An Observational Study of Online-questionnaire Design Guidelines Put to Practical Use

Jo Lumsden, Scott Flinn, Michelle Anderson & Wendy Morgan

National Research Council of Canada, IIT e-Business, 46 Dineen Drive, Fredericton, New Brunswick E3B 9W4, Canada

Tel: +1 506 444 0544
Fax: +1 506 444 6114

Email: {jo.lumsden, scott.flinn}@nrc-cnrc.gc.ca
As a new medium for questionnaire delivery, the Internet has the potential to revolutionize the survey process. Online-questionnaires can provide many capabilities not found in traditional paper-based questionnaires. Despite this, and the introduction of a plethora of tools to support online-questionnaire creation, current electronic survey design typically replicates the look-and-feel of paper-based questionnaires, thus failing to harness the full power of the electronic delivery medium. A recent environmental scan of online-questionnaire design tools found that little, if any, support is incorporated within these tools to guide questionnaire designers according to best practice [Lumsden & Morgan 2005]. This paper briefly introduces a comprehensive set of guidelines for the design of online-questionnaires. Drawn from relevant disparate sources, all the guidelines incorporated within the set are proven in their own right; as an initial assessment of the value of the set of guidelines as a practical reference guide, we undertook an informal study to observe the effect of introducing the guidelines into the design process for a complex online-questionnaire. The paper discusses the qualitative findings of this case study, which are encouraging for the role of the guidelines in the 'bigger picture' of online survey delivery across many domains such as e-government, e-business, and e-health.

Keywords: online-questionnaire, design guidelines, evaluative case study.
1 Introduction

As a new medium for questionnaire delivery, the Internet has the potential to revolutionize the survey process. Online (Web-based) questionnaires provide several advantages over traditional survey methods in terms of cost, speed, appearance, flexibility, functionality, and usability [Bandilla et al. 2003; Dillman 2000; Kwak & Radler 2002]. Online-questionnaires can provide many capabilities not found in traditional paper-based questionnaires: they can include pop-up instructions and error messages; they can incorporate links; and it is possible to encode difficult skip patterns, making such patterns virtually invisible to respondents. Despite this, and the emergence of numerous tools to support online-questionnaire creation, current electronic survey design typically replicates the look-and-feel of paper-based questionnaires, thus failing to harness the full power of the electronic survey medium. A recent environmental scan of online-questionnaire design tools found that little, if any, support is incorporated within these tools to guide questionnaire design according to best practice [Lumsden & Morgan 2005].

This paper briefly introduces a comprehensive set of guidelines for the design of online-questionnaires. It then focuses on an informal observational study that was conducted as an initial assessment of the value of the set of guidelines as a practical reference guide during online-questionnaire design.
2 Background

Online-questionnaires are often criticized in terms of their vulnerability to the four standard survey error types: namely, coverage, non-response, sampling, and measurement errors. Although, like all survey errors, coverage error ("the result of not allowing all members of the survey population to have an equal or nonzero chance of being sampled for participation in a survey" [Dillman 2000, p.9]) also affects traditional survey methods, it is currently exacerbated in online-questionnaires as a result of the digital divide. That said, many developed countries have reported substantial increases in computer and Internet access and/or are targeting this as part of their immediate infrastructure development [OECD 2001]. Indicating that familiarity with information technologies is increasing, these trends suggest that coverage error will rapidly diminish to an acceptable level (for the developed world at least) in the near future, and positively reinforce the advantages of online-questionnaires.

Non-response errors occur when individuals fail to respond to the invitation to participate in a survey or abandon a questionnaire before completing it. Given today's societal trend towards self-administration [Dillman 2000], the former is inevitable, irrespective of delivery mechanism. Conversely, non-response as a result of questionnaire abandonment can be relatively easily addressed [Dillman 2000]. For example, by incorporating a range of features into the design of an online-questionnaire, it is possible to support respondents' estimation of the length of a questionnaire — and to provide respondents with context-sensitive assistance during the response process — and thereby reduce abandonment while eliciting feelings of accomplishment [Crawford et al. 2001].
For online-questionnaires, sampling error ("the result of attempting to survey only some, and not all, of the units of the survey population" [Dillman 2000, p.9]) can arise when all but a small portion of the anticipated respondent set is alienated (and so fails to respond) as a result of, for example, disregard for varying connection speeds, bandwidth limitations, browser configurations, monitors, hardware, and user requirements during the questionnaire design process. Similarly, measurement errors ("the result of poor question wording or questions being presented in such a way that inaccurate or uninterpretable answers are obtained" [Dillman 2000, p.11]) will lead to respondents becoming confused and frustrated.

Sampling, measurement, and non-response errors are likely to occur when an online-questionnaire is poorly designed (note that coverage errors, on the other hand, are orthogonal to good questionnaire design; mixed-mode delivery is suggested as a means to combat these errors). Individuals will answer questions incorrectly, abandon questionnaires, and may ultimately refuse to participate in future surveys; thus, the benefit of online-questionnaire delivery will not be fully realized. To prevent errors of this kind, and their consequences, it is extremely important that practical, comprehensive guidelines exist for the design of online-questionnaires. Many design guidelines exist for paper-based questionnaire design [e.g. American Statistical Association 1999; Belson 1981; Brewer 2001; Fink 1995; Jackson 1988; Lindgaard 1994; Oppenheim 1992; Taylor-Powell 1998], but the same is not true for the design of online-questionnaires [Dillman 2000; Norman et al. 2003; Schonlau et al. 2001]. The guidelines introduced in this paper, and their subsequent study, help address this discrepancy.
3 Comprehensive Design Guidelines

In essence, an online-questionnaire combines questionnaire-based survey functionality with that of a webpage/site. As such, the design of an online-questionnaire should incorporate principles from both contributing fields. Hence, in order to derive a comprehensive set of guidelines for the design of online-questionnaires, we performed an environmental scan of existing guidelines for paper-based questionnaire design [e.g. American Statistical Association 1999; Belson 1981; CASRO 1998; Fink 1995; Jackson 1988; Lindgaard 1994; Oppenheim 1992; Taylor-Powell 1998] and website design, paying particular attention to issues of accessibility and usability [e.g. Badre 2002; Brewer 2001; Coyne & Nielsen 2001, 2002; Hinton 1998; Kothari & Basak 2002; Lynch & Horton 1997; National Cancer Institute 2002; National Institute on Aging & National Library of Medicine 2001; Stover et al. 2002; Stover & Nielsen 2002; W3C 1999]. Additionally, we reviewed the scarce existing provision of online-questionnaire design guidelines [Dillman 2000; Norman et al. 2003; Schonlau et al. 2001]. Principal amongst the latter is the work of Dillman [2000]: expanding on his successful Total Design Method for mail and telephone surveys [Dillman 1978], Dillman introduced, as part of his Tailored Design Method [Dillman 2000], fourteen additional guidelines specifically aimed at directing the design of online-questionnaires. Albeit seminal, Dillman's guidelines do not incorporate much of the relevant guidance uncovered as part of our environmental scan. We therefore propose — after collating, filtering, and integrating the disparate guidelines — a comprehensive set of guidelines for online-questionnaire design that is more encompassing than Dillman's.
[Figure 1 diagram: welcome → registration/log-in → introduction → screening test → questionnaire questions → additional info/links → thank you.]

Figure 1: Organizational structure of online-questionnaires (arrows show progression, a double-barred arrow indicating choice in the structure).
General
    Organization: Welcome Page; Registration/Login Page; Introduction Page; Screening Test Page; Questionnaire Questions; Additional Info/Links; Thank You
    Layout: Frames; Forms & Fields; Navigation Buttons; Links; Site Maps; Scrolling
Formatting
    Text; Colour; Graphics; Flash; Tables & Frames; Feedback; Miscellaneous
    Response Formats: Matrix Questions; Drop-Down Boxes; Radio Buttons; Check Boxes
Question Type & Phrasing
    General Guidance; Sensitive Questions; Attitude Statements; Phraseology
    Types of Question: Open-Ended; Closed-Ended; Rank-Order; Categorical/Nominal; Magnitude Estimate; Ordinal; Likert Scale; Skip
General Technical Issues

Table 1: Organization of the guidelines, showing topics covered.
Approximately 33% of the resulting set of guidelines stem directly from paper-based questionnaire design guidelines; the remainder (67%) are derived from webpage design guidelines as they apply to questionnaire design. This paper will only highlight the key elements of the guidelines; the full set of guidelines is available on request.
3.1 Overview of the Guidelines

Although the guidelines provide minimal support for other aspects of the design process for online-questionnaires, their main focus is on the design and implementation stages associated with online-questionnaire creation. They describe the general organizational structure that should be adopted by the majority of online-questionnaires (see Figure 1) and then progressively refine the guidance according to the issues shown in Table 1.
There are a number of issues of importance when designing the textual content of an online-questionnaire:

a. Fonts used should be readable and familiar, and text should be presented in mixed case or standard sentence formatting; upper case (or all capitals) should only be used for emphasis;

b. Sentences should not exceed 20 words, and should be presented with no more than 75 characters per line. If elderly respondents are anticipated, then this limit should be reduced to between 50 and 65 characters per line. Paragraphs should not exceed 5 sentences in length;

c. Technical instructions (those being instructions related to the basic technical operation of the website delivering the questionnaire) should be written in such a way that non-technical people can understand them;

d. Ensure that questions are easily distinguishable, in terms of formatting, from instructions and answers;

e. For each question type, be consistent in terms of the visual appearance of all instances of that type and the associated instructions concerning how they are to be answered. In particular, keep the relative position of the question and answer consistent throughout the questionnaire. Where different types of questions are to be included in the same questionnaire, each question type should have a unique visual appearance;

f. When designing for access by users with disabilities and the elderly, employ a minimum of size 12pt font and ensure that the font colour contrasts significantly with the background colouring. Text should be discernible even without the use of colour. It is advisable to test font colours and size with a screen magnifier to ensure usability prior to release;

g. If targeting an elderly audience, provide a text-sizing option on each page, use bold face but avoid italics, and left-justify text. It is also advisable to increase the spacing between lines of text for ease of reading by this respondent group;

h. Make sure that text is read (by screen readers) in a logical order. Specifically, set the tab order on the pages. This is especially true for actual questions in the questionnaire — think carefully about the order in which a visually impaired user will hear the elements of a question, including the instructions and response options.

Table 2: An excerpt from the online-questionnaire design guidelines.
Since it is not possible to include the comprehensive set of guidelines in this paper, excerpts are shown in Tables 2 & 3 to provide a 'flavour' of the guidelines as a whole; the guidance in Table 2 relates to the formatting of text in online-questionnaires, whilst that in Table 3 relates to the layout of form and field components commonly used to construct online-questionnaires. When reading the examples, it is important to note that none of the guidelines are particularly innovative in their own right; each has been drawn from the aforementioned sources covered by the environmental scan. What is novel, however, is the fact that applicable guidelines from these disparate sources have been collated into a unified set which is presented methodically as a comprehensive, practical guide to online-questionnaire design; webpage design concepts such as visual design principles have been integrated with the large body of knowledge on paper-based questionnaire design principles to provide practical support for designers of online-questionnaires.
Layout :: Forms and Fields: By their very nature, questionnaires include elements common to forms — that is, layout and the use of fields for data entry. Users with disabilities can find forms and fields problematic, and so it is important that the following guidelines — which are relevant across all respondent groups — be taken into consideration when laying out these elements of a questionnaire:

a. Locate field labels close to their associated fields so that respondents can easily make the association; this also prevents labels becoming lost when a screen is magnified by users with visual impairment.

b. A 'submit' (or similar) button should always be located adjacent to the last field on any given page so that it is easily identified by respondents at the point at which they have completed the question responses; this is again especially important for users of assistive technology since it goes some way to ensuring that such a button will not be overlooked when the screen is magnified.

c. The tab order for key-based navigation around the fields in a questionnaire should be logical and reflect the visual appearance as far as is possible.

d. Fields are most easily read if stacked in a vertical column, and any instructions pertaining to a given field should appear before, and not after, the field if it is to be understood by users of assistive technology.

Table 3: Another excerpt from the online-questionnaire design guidelines.
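To make the excerpted guidance concrete, the sketch below shows how a single question block might be generated so that it respects several points from Tables 2 & 3: instructions placed before the field, labels adjacent to their controls, vertically stacked options, an explicit tab order, and a minimum 12pt font. It is a minimal browser-side illustration of the guidelines, not code from the study; all names (renderQuestion, Question) are our own.

```typescript
// Sketch: render one radio-button question following Tables 2 & 3.
// Illustrative only; assumes a browser DOM environment.
interface Question {
  id: string;
  instructions: string; // shown before the field (Table 3, d)
  label: string;
  options: string[];    // radio-button response set
}

function renderQuestion(q: Question, tabStart: number): HTMLElement {
  const block = document.createElement('fieldset');
  block.style.fontSize = '12pt'; // minimum font size (Table 2, f)

  const legend = document.createElement('legend');
  legend.textContent = q.label;
  block.appendChild(legend);

  const instructions = document.createElement('p');
  instructions.textContent = q.instructions; // instructions precede the options
  block.appendChild(instructions);

  // Stack options vertically (Table 3, d), keep each label adjacent to its
  // control (Table 3, a), and set a logical tab order (Table 2, h; Table 3, c).
  q.options.forEach((opt, i) => {
    const row = document.createElement('div');
    const input = document.createElement('input');
    input.type = 'radio';
    input.name = q.id;
    input.id = `${q.id}-${i}`;
    input.tabIndex = tabStart + i;
    const label = document.createElement('label');
    label.htmlFor = input.id;
    label.textContent = opt;
    row.append(input, label);
    block.appendChild(row);
  });
  return block;
}
```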
4 An Observational Study

Having established the set of guidelines, we set up an informal study to observe the extent of the influence of the guidelines on the design of an online-questionnaire. This is the first in a series of planned evaluations to determine the effect of applying the guidelines to online-questionnaire design and, in turn, the effectiveness of the set of guidelines as a practical support measure during the design process.
4.1 The Observational Procedure

A contract software developer was hired to create an online-questionnaire for the purpose of surveying public awareness of security issues when using the Internet. We established the following process by which we could observe the use of the guidelines when applied to this real, substantial online-questionnaire development project.

The software developer (henceforth referred to as M) employed to develop the electronic survey had no previous experience of questionnaire development (paper-based or electronic). We specifically selected a developer without prior experience since we did not want previous exposure to online-questionnaire design to influence the design process under observation. M did, however, have extensive experience with website design. We felt that a developer with this profile may be representative of many of the users of the online-questionnaire design and delivery tools on the market and/or of the typical developers of online-surveys within business environs.
M was provided with a plain text list of the 29 questions (including response sets where applicable) that were to be incorporated in the survey; no indication was given as to the question type/style nor to layout. For our purposes, we did not want to assess M's ability to appropriately phrase the survey questions; instead, experts in questionnaire design and the domain being studied formulated the questions and response options independently of this observational study.

As a 'warm up' exercise, M was asked to develop a first prototype of the online-questionnaire. M was not provided with any advice on questionnaire design at this point. The aim of this exercise was simply to familiarize M with the technology and protocols of the organization as well as the survey questions; the resulting prototype was effectively a 'throw away' prototype and as such is not the focus of this discussion.

After the 'warm up' exercise, M was furnished with a set of guidelines for the development of paper-based questionnaires. These were drawn from the same sources for paper-based questionnaire design as were used to generate the comprehensive set of guidelines discussed in the previous section. M was then asked to design and develop an online-questionnaire with reference to the guidelines for paper-based questionnaire design. A screen shot of part of the resulting online-questionnaire (known hereafter as Q1) is shown in Figure 2.

Finally, upon completion of Q1, M was provided with the complete, comprehensive set of guidelines for the design of online-questionnaires (of which the paper-based design guidelines formed a subset). M was asked to design and develop another online-questionnaire with reference to the complete set of guidelines. A screen shot of part of the resulting online-questionnaire (known hereafter as Q2) is shown in Figure 2.

During each of the above design and development exercises, M was asked to maintain a log of design issues and their resolution relative to the guidelines available at the time. Although one might argue that learning had a significant effect on the final design of Q2, and to a certain extent this will be true, we had no option but to deliver the guidelines in this order. Since the paper-based guidelines are a subset of the comprehensive guidelines, we could not have isolated the influence of the paper-based guidelines had the comprehensive set been used by M prior to the paper-based set. Additionally, we essentially wanted to use the observed experience with the paper-based guidelines as a 'control' against which to compare the effect of the comprehensive set of design guidelines; we felt that it would have been an unfair comparison to have simply compared the 'warm up' version of the online-questionnaire to Q2 — the paper-based guidelines are readily available to questionnaire designers and so their use as a 'control' was considered more realistic for this study.

Once both versions of the online-questionnaire were fully tested, the survey was made publicly available online. All respondents were required to complete an identical click-through consent form; thereafter, in an alternating pattern, respondents were presented with either Q1 or Q2 — for example, if respondent x was allocated Q1, then respondent x + 1 was allocated Q2. Using this allocation, each questionnaire had an equal exposure rate; no respondent was aware of the existence of the alternative version of the questionnaire.
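The alternating allocation just described is straightforward to express in code. The sketch below is an assumed implementation for illustration only; the paper does not show how the allocation was actually programmed.

```typescript
// Sketch: alternate respondents between the two questionnaire versions.
// Assumed implementation; not the study's actual code.
let respondentCount = 0;

function allocateVersion(): 'Q1' | 'Q2' {
  // Even arrivals receive Q1, odd arrivals receive Q2,
  // so both versions get an equal exposure rate.
  return respondentCount++ % 2 === 0 ? 'Q1' : 'Q2';
}
```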
Figure 2: Screen shot of question 8 from both versions of the online-questionnaire, labelled accordingly. [Screen shot images not reproducible.]
Figure 3: Progress indicators from both versions of the online-questionnaire, labelled accordingly. [Screen shot images not reproducible.]
For the purpose of the actual survey being conducted, the wording and type of each question was identical across both Q1 and Q2; what differed between Q1 and Q2 were the aesthetics, structure, provision of help, and level of automation.
4.2 Comparing the Two Designs

Consider the difference in aesthetics and structure between Q1 and Q2. Q1 comprised 4 long scrolling pages; in Q2, the questionnaire was split up into a maximum of 15 pages (depending on skip question responses) with, as far as possible, minimal need for scrolling. Q2 made more use of block shading to enhance the readability of the questionnaire and to differentiate between questions, instructions, and response options. Questions within Q1 were typically quite condensed within each page — each page had a 'cluttered' feel; in Q2, more use was made of white space between questions to enhance readability. Conversely, as shown in Figure 2, response options in matrix questions were brought into closer proximity in Q2 to support easier visual association between radio buttons and response labels and to ensure that response labels were never widowed, as a result of scrolling, from the radio buttons.

It is in the scope of help and level of automation that the two versions of the questionnaire differ the most. Although Q1 does provide some indication of progression through the questionnaire, given the extent of questions per scrolling page, the scale provides for only a very high-level judgement of progress. In contrast, Q2 uses a progress indicator which, given the lesser extent of questions on each page, is better able to support a more accurate assessment of progress. Both indicators are shown in Figure 3.

The online-questionnaire included four skip questions. Q1 provided written instruction to the respondents to direct them to the next applicable question, as determined by their response to the skip question (see Figure 4). In contrast, all skip questions (and thereby patterns) were encoded within Q2 such that the cognitive load associated with skip patterns was removed entirely from the respondents, who were subsequently wholly unaware that they were following skip patterns (see Figure 4).
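The automated skip handling in Q2 can be viewed as a conditional 'next question' function over the question sequence. The following sketch is our own illustration of how such skip patterns might be encoded so that inapplicable questions are never shown, and how a per-respondent progress estimate (cf. Figure 3) can be derived from the respondent's own path; it is not the study's implementation, and all identifiers and the four-question sequence are hypothetical.

```typescript
// Sketch: skip patterns encoded as a conditional 'next question' function,
// so respondents never see questions that do not apply to them.
type Answers = Record<string, string>;

interface QuestionNode {
  id: string;
  // Returns the id of the next applicable question, or null at the end.
  next: (answers: Answers) => string | null;
}

const questions: QuestionNode[] = [
  { id: 'q1', next: a => (a['q1'] === 'no' ? 'q4' : 'q2') }, // skip q2-q3 on 'no'
  { id: 'q2', next: () => 'q3' },
  { id: 'q3', next: () => 'q4' },
  { id: 'q4', next: () => null },
];

const byId = new Map<string, QuestionNode>(
  questions.map((q): [string, QuestionNode] => [q.id, q])
);

// Walk the respondent's path; questions inside a skip pattern are never visited.
function path(answers: Answers): string[] {
  const visited: string[] = [];
  let id: string | null = 'q1';
  while (id !== null) {
    visited.push(id);
    id = byId.get(id)!.next(answers);
  }
  return visited;
}

// Progress relative to the respondent's own path, for a progress indicator.
function progress(answeredSoFar: number, answers: Answers): number {
  return Math.round((100 * answeredSoFar) / path(answers).length);
}

console.log(path({ q1: 'no' }));        // ['q1', 'q4']
console.log(progress(1, { q1: 'no' })); // 50
```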
[Figure 4: skip questions as presented in Q1 (written skip instructions) and Q2 (automated skip patterns); screen shot images not reproducible.]
Figure 5: Help facility in Q2 — clicking on the help links in a page brings up context-sensitive help to assist respondents in the mechanics of responding.
Finally, Q2 included a pop-up help facility which was entirely absent from Q1. At the top of each page, a link to a pop-up help screen was always available (see Figure 5) and provided help about the mechanics associated with answering all of the question types in the questionnaire; adjacent to each question there was also a 'help' link which provided context-sensitive help relative to the mechanics of that particular question type — Figure 5 shows an example of help for a matrix-style question.

As can be seen, there were a number of substantial differences between the two versions of the questionnaire in terms of general aesthetics, structure, automation, and help. While the guidelines for paper-based questionnaire design that were applied to Q1 prompted effective question formatting (this was the focus of most entries in M's log for Q1), it was the comprehensive set of guidelines for online-questionnaire design that appeared to prompt M to make substantial changes to the aforementioned aspects of Q2. On the basis of our observations, it would seem that without the comprehensive set of guidelines, M's design was restricted to following the traditional paper-based model — the comprehensive guidelines seemed to encourage M to 'think out of the box' — or more laterally — and embrace the functionality available in the electronic delivery mechanism (a process which did not seem to happen 'intuitively' without prompting from the guidelines). They also appeared to encourage M to give considerable thought to each functional design decision (e.g. tabular presentation and the accessibility issues associated with such presentation) since extensive discussion of rationale concerning such decisions formed the basis of most comments in M's log for Q2. Anecdotally, M seemed to appreciate the support of the guidelines — as indicated in the following quotation taken from a post-development interview:

    "By using Web-based guidelines, it encompasses the practicality of paper-based guidelines as well as guidance for a Web medium [...] the advantages of having such guidelines are countless. [...] The guidelines offered solutions for problems I hadn't even considered [...] The [comprehensive set of] guidelines improve[d] the questionnaire."
4.3 Comparing the Responses to the Two Designs

In terms of functionality and look-and-feel, the comprehensive set of guidelines appears to have had a substantial influence on the design of the questionnaire (based on a qualitative comparison of Q1 and Q2). We therefore decided to look for qualitative differences in the 'responses' returned for the two versions of the questionnaire. That is, we wanted to see what, if any, impact the above-noted differences in functionality and look-and-feel had on the manner in which the respondents completed the two versions of the questionnaire; we were not concerned with the semantics of their responses. It is important to reiterate that both Q1 and Q2 asked exactly the same questions and presented the response options to these questions using the same question style; thus, any differences in the quality and/or quantity of responses between Q1 and Q2 can essentially be attributed to the differences discussed in Section 4.2.

A total of 236 questionnaires were completed: Q1 achieved a completion rate of 64.6%, which was only slightly less than the 65.7% rate attained by Q2. There was no significant difference in the average time taken to complete each version of the questionnaire.
Based on the hypothesis that Q2, as a result of its aesthetic and functional enhancement, would be 'easier to use' and therefore less frustrating, we anticipated that respondents using Q2 would complete more of the open-ended questions than respondents using Q1; that is, Q2 respondents would be more inclined to invest the necessary additional effort required to complete this type of question. This was not found to be the case and, in fact, there was no real difference in the average length of such responses between the two versions of the questionnaire. That said, albeit not statistically significant, 14.5% of Q1 respondents left questions unanswered (not counting open-ended questions or those which should have been left unanswered by virtue of applicable skip patterns), whereas this figure was only 10.1% for Q2. This would suggest that the aesthetics and functionality of Q2 were more conducive to supplying a response to questions.

For all matrix-style questions, we assessed the extent to which respondents relied on neutral responses and/or exhibited response set behaviour. There was no significant difference between Q1 and Q2 in this regard.

Perhaps the two most interesting findings concern respondents' handling of skip questions/patterns and abandonment behaviour. Consider, first, skip questions. Obviously, skip questions and their associated patterns were completely automated in Q2, and thereby hidden from the respondents. As such, it was impossible for respondents to Q2 to waste effort answering questions that should have been skipped. In contrast, Q1 required respondents to comprehend written skip patterns and manually skip the applicable questions. Consequently, 11.6% of respondents who should have skipped at least one question answered questions that they should have skipped. On average, these respondents answered 85% of the questions that they should have skipped (ranging from a minimum of 25% to a maximum of 100% of such questions). Naturally, this represents a significant waste of respondents' time and effort and is likely to lead to irreparable levels of frustration. It highlights the benefit of automating this aspect of online-questionnaire delivery.

Approximately 35% of respondents who started the survey failed to complete it; this figure was essentially the same for both Q1 and Q2. We hypothesized that, as a result of its enhanced functionality and aesthetics, Q2 would hold the attention of such respondents for considerably longer than Q1; that is, respondents would complete more of Q2 before abandoning it than of Q1. To test this hypothesis, for each of the respondents who abandoned the survey part way through, we calculated (taking into account skip patterns) the last possible question that they could have seen on the basis of the last webpage requested. For ethical reasons, we could not record the precise questions which respondents actually answered since they had abandoned the survey, and so we too had to abandon their partial results; the only data we could ethically access was the series of webpage request patterns from the server log. From this, we calculated the extent of completion for each respondent as a percentage of the total possible given their path thus far. On average, respondents who abandoned Q1 completed 32.4% of the questionnaire before giving up; in contrast, respondents to Q2 went significantly further (on average 42.4%) before
What Difference Do Guidelines Make ? abandoning the survey (^122 = 1-82, p = 0.035). Had we been able to assess the precise question at which the respondents abandoned their versions of the survey, we feel that the difference in extent of completion would have been even more pronounced; each of the four pages in Ql contain considerably more questions than the fifteen pages in Q2 and so in essence our calculations were potentially giving the respondents to Ql a large benefit of doubt — it was highly unlikely that respondents were on the last question of a page when they abandoned the survey. Given the importance, yet associated difficulty, of achieving high response rates for questionnaire-based surveys, this finding is important; it would suggest that there is demonstrated potential for the set of comprehensive design guidelines to assist online-questionnaire designers to develop questionnaires which encourage respondents to persevere with a questionnaire.
4.4 Discussion

On the basis of the aesthetic and functional disparity between the two versions of our questionnaire, we had (somewhat naively perhaps) anticipated finding more significant differences in terms of quality and quantity of responses between Q1 and Q2. With hindsight, however, we feel this study has helped raise interesting questions concerning what constitutes success in this domain. How can two (or more) designs for the same online-questionnaire be fairly and effectively compared and evaluated? What are the dimensions of a successful and effective online-questionnaire? To what extent can these dimensions be addressed through the online medium as opposed to simply being a facet of questionnaire topic, question wording, scale choice, etc.?

That said, the results outlined in the previous section suggest that application of the comprehensive set of guidelines to the design process for an online-survey may be extremely beneficial in tackling two of the most complex issues associated with questionnaire-based surveys: respondent perseverance and handling of skip patterns. In this regard, we feel that our study has yielded positive results.
5 Conclusions

Although this was an initial observation of the merit of the guidelines for online-questionnaire design, we feel that some interesting and valuable findings have come to light. It would appear that the guidelines encourage more lateral thinking in terms of online-questionnaire design while, at the same time, promoting careful consideration of design issues that affect accessibility and thereby usability. This is reflected in the aesthetic and functional differences between Q2 and Q1.

It would have been advantageous to have been able to 'tag' a post-questionnaire questionnaire onto the online-survey in order to elicit information about respondents' subjective impressions of their allocated questionnaires. However, we felt this would have been too much on top of what was already a complex questionnaire and would ultimately have been detrimental to the survey itself. The extent of progress prior to abandoning the online-survey is therefore our only insight into respondents' subjective assessment of their allocated questionnaire: the results suggest that the comprehensive set of guidelines for online-questionnaire design has the potential to improve subjective reaction to surveys delivered online.
Finally, as mentioned in Section 2, online-questionnaires have the potential to reduce non-response errors as a result of questionnaire abandonment, but only when appropriate measures are incorporated within the design of online-questionnaires. The significant improvement in extent of completion prior to abandonment for Q2 indicates that the comprehensive set of guidelines has demonstrated potential for combating this error type for online-questionnaires.

We feel that the observed qualitative results of this study are encouraging for the further development of the guidelines, for the development of mechanisms for their inclusion in the design process for online-questionnaires, and ultimately for their role in the 'bigger picture' that is online survey delivery across many domains such as e-government, e-business, and e-health.
References

American Statistical Association [1999], American Statistical Association Series: What is a Survey?, http://www.amstat.org/sections/srms/brochures/designquest.pdf (retrieved 2003-06-07).

Badre, A. N. [2002], Shaping Web Usability: Interaction Design in Context, Pearson Education.

Bandilla, W., Bosnjak, M. & Altdorfer, P. [2003], Self Administration Effects? A Comparison of Web-Based and Traditional Written Self-Administered Surveys Using the ISSP Environment Module, Social Science Computing Review 21(2), 235-43.

Belson, W. A. [1981], The Design and Understanding of Survey Questions, Gower Publishing.

Brewer, J. [2001], How People with Disabilities Use the Web, W3C Working Draft, W3C. See http://www.w3.org/WAI/EO/Drafts/PWD-Use-Web/ for current and previous versions.

CASRO [1998], Guidelines for Survey Research Quality, http://www.casro.org/guidelines.cfm (retrieved 2003-06-07). Council of American Survey Research Organizations.

Coyne, K. P. & Nielsen, J. [2001], Beyond ALT Text: Making the Web Easy to Use for Users with Disabilities, Technical Report, Nielsen Norman Group.

Coyne, K. P. & Nielsen, J. [2002], Web Usability for Senior Citizens, Technical Report, Nielsen Norman Group.

Crawford, S. D., Couper, M. P. & Lamias, M. J. [2001], Web Surveys: Perceptions of Burden, Social Science Computing Review 19(2), 146-62.

Dillman, D. A. [1978], Mail and Telephone Surveys: The Total Design Method, John Wiley & Sons.

Dillman, D. A. [2000], Mail and Internet Surveys: The Tailored Design Method, second edition, John Wiley & Sons.

Fink, A. [1995], How to Ask Survey Questions, Sage Publications.
Hinton, S. M. [1998], From Home Page to Home Site: Effective Web Resource Discovery at the ANU, in H. Ashman & P. Thistlewaite (eds.), Proceedings of the Seventh International World Wide Web Conference (WWW7), Vol. 30(1-7) of Computer Networks and ISDN Systems, Elsevier Science, pp.309-16. See also http://www7.scu.edu.au/.

Jackson, W. [1988], Research Methods: Rules for Survey Design and Analysis, Prentice-Hall.

Kothari, R. & Basak, J. [2002], Perceptually Automated Evaluation of Web Page Layouts, in Paper Presented in an Alternative Track of the Eleventh International World Wide Web Conference (WWW2002). http://www2002.org/CDROM/alternate/688/index.html.

Kwak, N. & Radler, B. [2002], A Comparison Between Mail and Web Surveys: Response Pattern, Respondent Profile and Data Quality, Journal of Official Statistics 18(2), 257-74.

Lindgaard, G. [1994], Usability Testing and System Evaluation: A Guide for Designing Useful Computer Systems, Chapman & Hall.

Lumsden, J. & Morgan, W. [2005], Online Questionnaire Design: Establishing Guidelines and Evaluating Existing Support, in M. Khosrow-Pour (ed.), Managing Modern Organizations with Information Technology: Proceedings of the 16th Information Resources Management Association International Conference (IRMA 2005), IRM Press, pp.407-10.

Lynch, P. J. & Horton, S. [1997], Web Style Guide, Yale University Press. See also http://www.webstyleguide.com/.
National Cancer Institute [2002], National Cancer Institute's Research Based Web Design & Usability Guidelines, http://usability.gov/guidelines/index.html (retrieved 2003-06-10).

National Institute on Aging & National Library of Medicine [2001], Making Your Web Site Senior Friendly, http://www.nlm.nih.gov/pubs/checklist.pdf (retrieved 2003-06-19).

Norman, K. L., Lee, S., Moore, P., Murry, G. C., Rivadeneira, W., Smith, B. K. & Verdines, P. [2003], Online Survey Design Guide, http://lap.umd.edu/survey_design/tools.html (retrieved 2003-06-17).

OECD [2001], Bridging the "Digital Divide": Issues and Policies in OECD Countries, http://www.oecd.org/dataoecd/10/0/27128723.pdf (retrieved 2003-06-03).

Oppenheim, A. N. [1992], Questionnaire Design, Interviewing and Attitude Measurement, Pinter Publishers.

Schonlau, M., Fricker, R. D. & Elliott, M. N. [2001], Conducting Research via E-mail and the Web, http://www.rand.org/publications/MR/MR1480/ (retrieved 2003-06-16).

Stover, A., Coyne, K. P. & Nielsen, J. [2002], Designing Usable Site Maps for Websites, Technical Report, Nielsen Norman Group.

Stover, A. & Nielsen, J. [2002], Accessibility and Usability of Flash for Users with Disabilities, Technical Report, Nielsen Norman Group.

Taylor-Powell, E. [1998], Questionnaire Design: Asking Questions with a Purpose, Technical Report G3658-2, University of Wisconsin.

W3C [1999], Web Content Accessibility Guidelines 1.0, http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505 (retrieved 2003-06-08).
Designing Interactive Systems in Context: From Prototype to Deployment

Tim Clerckx, Kris Luyten & Karin Coninx

Limburgs Universitair Centrum — Expertise Centre for Digital Media, Universitaire Campus, B-3590 Diepenbeek, Belgium

Email: {tim.clerckx, kris.luyten, karin.coninx}@luc.ac.be

URL: http://www.edm.luc.ac.be

The possibility of communicating with the (in)direct environment using other devices and observing that same environment allows us to develop ambient intelligent applications which have knowledge of the environment and of the use of these applications. Despite the support for software development for this kind of application, some gaps still exist, making the creation of consistent, usable user interfaces more difficult. This paper discusses a technique that can be integrated into existing models and architectures and that supports the interface designer in making consistent context-sensitive user interfaces. We present an architecture and methodology that allows context information to be used at two different levels — dialogue and inter-dialogue levels — and ensures that the consistency of the interface is always maintained in the event of context changes during use of the software.

Keywords: user interface design, context-aware user interface, model-based user interface development.
1 Introduction
The spread of the ISTAG^ Ambient Intelligence scenarios has clarified future technological needs and developments. These scenarios indicate what the industry is looking for and how new technologies are applied, with the user as the focal point. Although various components of these scenarios are still pipe dreams, the technology for other components is now ready to be applied to the situations discussed.

^ Information Societies Technologies Advisory Group, http://www.cordis.lu/ist/istag.htm.
In order to familiarize the reader with our objectives, we introduce the following brief scenario, based on the test case envisaged in Schmidt et al. [1999]. A mobile telephone can react appropriately to various situations: in the example, use is made of light and heat sensors in combination with the buttons on the device. Thus, the interface will show the information differently in strong light (the telephone is 'visible') than in low-intensity light (the telephone is hidden in the bag). The telephone can thus assume, if there is no light and a higher temperature, that it is buried in the inside pocket. By integrating context information with the design of the user interface, as proposed in this paper, the designer can, for example, easily ensure that if the light is weak and the temperature is low, the dialogues contain less information, which can be shown larger because this may be desirable in that situation. Unlike the experiment described in Schmidt et al. [1999], the emphasis here is not on context gathering by abstraction of the sensors, but on making context information applicable in the user interface.

In this paper, context in the software engineering process and, more specifically, in user interface engineering, is integrated into model-based user interface development. Applications interrogate their environment using sensors and by communicating with other applications in their environment; from all the data gathered, an application builds up the context. We concentrate not on the way in which context is gathered and combined, but on how the context information can be used during both design and use of the user interface.

Context can be defined in various ways [Dey et al. 2001; Coutaz & Rey 2002; Schmidt et al. 1999]. According to Dey, context is only relevant if it influences the user's task: a system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task. This is why attention should be devoted, at the design stage, to the link between the tasks of the user on the one hand and the type of context influencing these user tasks on the other. Here, we use the following definition, based on Dey's definition of a context-sensitive system [Dey et al. 2001] and with the focus on sensing the environment, in order to pursue uniformity and clarity in this work: context is the information gathered from the environment which can influence the tasks the user wants to, can or may perform.

Despite extensive research concerning the acquisition, integration and interpretation of context information in software applications, as far as we know no other initiatives exist which integrate context at all stages of the design and use of the interface. In this context, we introduce DynaMo-AID, part of the Dygimes [Coninx et al. 2003] framework. DynaMo-AID is both a user interface design process and a runtime architecture which makes use of extensions of conventional models such as task, dialogue, presentation and application models, in order to support the design of context-aware applications. The Dygimes framework provides support for combining and further processing these different models.
Thus, the framework contains a renderer that converts a high-level device-independent XML description into a concrete user interface, a module for combining these XML descriptions with a task model and an algorithm that can calculate a dialogue model from the temporal relationships in the task model [Luyten et al. 2003].
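To make this flow of models concrete, the following sketch illustrates how enabled task sets could be turned into dialogue states whose attached XML descriptions are handed to a rendering backend. All names here (DygimesSketch, Dialogue, deriveDialogues, render) are illustrative assumptions and not the actual Dygimes API.

import java.util.*;

public final class DygimesSketch {

    /** One dialogue state, derived from one Enabled Task Set (ETS). */
    record Dialogue(String name, Set<String> tasks, String uiXml) {}

    /** Derive dialogue states: one per ETS (transitions omitted for brevity). */
    static List<Dialogue> deriveDialogues(Map<String, Set<String>> etss,
                                          Map<String, String> uiDescriptions) {
        List<Dialogue> dialogues = new ArrayList<>();
        for (var e : etss.entrySet()) {
            dialogues.add(new Dialogue(e.getKey(), e.getValue(),
                    uiDescriptions.getOrDefault(e.getKey(), "<ui/>")));
        }
        return dialogues;
    }

    /** Stand-in renderer: a real backend would map the XML to platform widgets. */
    static void render(Dialogue d) {
        System.out.println("Dialogue " + d.name() + " -> " + d.uiXml());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> etss = Map.of(
                "main-menu", Set.of("SelectContact", "ReadMessage"),
                "compose",   Set.of("EnterText", "Send"));
        Map<String, String> ui = Map.of(
                "compose", "<group><input task='EnterText'/><button task='Send'/></group>");
        deriveDialogues(etss, ui).forEach(DygimesSketch::render);
    }
}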
DynaMo-AID was introduced in Clerckx et al. [2004a], where the models used to design the user interface were discussed. Clerckx et al. [2004b] elaborates on the context-sensitive task model which is used in the design process. This paper describes how context can be applied in the design and creation of the user interface, with the emphasis on the runtime architecture and the support for prototyping. The implementation is entirely written in Java so as to make it available on as many platforms as possible.

The rest of this paper discusses this in more detail. The following section gives a summary of related work, so that this work can be positioned within existing research; we then show that our approach can be regarded as complementary to various other initiatives. In the subsequent sections, we show how context can be taken into account in the design phase of the user interface. One way this is supported is to automatically generate prototype user interfaces, enabling the genuine look and feel of the context-aware interface to be experienced at an early stage of interface design. Following the discussion of current achievements, future work is discussed and appropriate conclusions drawn.
2 Related Work

Writing software that can adapt to the context of use is an important component of present-day applications. Support for these types of application is still a major topic for research. The Context Toolkit [Salber et al. 1999; Dey & Abowd 2004, 2000] is probably the best known way of abstracting context and making it 'generally' usable. A clear division of components that embed, aggregate and interpret context information is envisaged. A method is proposed in Dey et al. [2001] to give end users the opportunity to develop context-aware applications. However, less emphasis is placed on the effects and use of context on and in the user interface; instead, software engineers developing context-aware applications are targeted.

An abstraction of context is useful for processing the data from sensors in a generic way in an application. The use of sensors is the ideal way to obtain information from the immediate physical environment; procuring and interpreting the right information from the sensors can be dependent on the application that is built [Schmidt et al. 1999]. A thorough survey of context sensing systems and their use in an interactive system is given in Schmidt [2002]. In contrast to what Schmidt proposes in his work, we concentrate less on the hardware component and more on the software component: conventional user interfaces that are influenced by context changes. Despite the increasing penetration of computer systems into physical objects, it is also still important to provide current devices (PDA, mobile phone, Tablet PC, ...) with context-sensitive user interfaces, see Hinckley et al. [2000] and Schmidt et al. [1999].

Contextors, by Coutaz & Rey [2002], are geared more towards the user. In addition to an encapsulation of context information, similar to the encapsulation envisaged in the Context Toolkit, the user and interaction with the user are given more consideration. By defining a software model (a variation of the Arch model [Coutaz 1994]), consideration is explicitly given to the influence that context changes can have on the dialogue with the user. Henricksen & Indulska [2004] approach context less from the point of view of the sensors, but their work shows an integration of context into the software engineering methodology. This is a necessary evolution in order to maintain the consistency and relevancy of the software models when building context-sensitive applications. However, so far this is not related to software development environments that abstract context, such as the Context Toolkit or contextors.

Winograd [2001] distinguishes between three models for context management:

Widget-based: Widgets encapsulate the device drivers at a more abstract level. Interaction with these entities takes place by sending messages to the widget and interpreting messages from the widget.

Networked services: In this case, a service to which applications can be linked presents context information.

Blackboard approach: Here, context information is held at a central point and applications can sign in to be notified if they want to make use of this information.

A modern approach is to combine these three models into a hybrid model. One can argue that a widget-based approach implies a very close connection between devices and the direct dialogue with the user. An established infrastructure approach of networked services is the Context Fabric [Hong & Landay 2004]. In this infrastructure, devices can query an infrastructure and register for events in order to make use of context information. Communication between the devices and the infrastructure is supported with an XML-based Context Specification Language.

The approach we present here involves a stricter division, and context will only indirectly influence dialogue with the user. As a result of the strictly layered structure in the model presented below, at the lowest levels it is possible to work both with networked services (Web services, for example) and via the blackboard approach (establishing a socket connection with a context server). In the following section, we will examine more closely how our model complements the existing context management models rather than trying to replace them.

Figure 1: Comparison of three architectures for context-aware applications.
3 Context Architectures

Figure 1 shows the comparison between two existing architectures and our own, all of which are suitable for gathering and processing context information for context-sensitive applications.
The first architecture (Figure 1a), described in Henricksen & Indulska [2004], fits into a software engineering framework for context-sensitive pervasive computing. Here, the abstraction for the context manager is already placed at the level of context gathering. The context manager interprets information from the context-gathering layer, which is translated by the context reception layer, and acts as a distributed database for the application, which can relay queries to this context manager. This architecture takes no account of the effects on the user interface: a software engineering process that uses it will not provide explicit support for the influence of context changes on the user interface.

The Context Toolkit [Dey & Abowd 2004; Salber et al. 1999; Dey & Abowd 2000] (Figure 1b) is an object-oriented architecture that allows the application to approach the context information at different abstraction levels. This is therefore not a strictly layered structure, as is the case in the pervasive computing architecture. The application layer can directly access both objects that only encapsulate sensors and objects that abstract context information, such as objects that aggregate or even interpret context information. The Context Toolkit contains context widgets for sensor encapsulation, interpreters to interpret context data, aggregators to merge several types of context widget and a discoverer to search out services in the environment.

Figure 1c shows our approach. Here, the raw context information is dealt with separately with respect to the application and the user interface. The user interface can only be influenced by the context by communicating events and associated data (which may or may not be abstract information about the detected context change) to a superior layer. If a layer receives an event from a lower layer, it will interpret the associated information and decide whether the context change is significant enough to pass the interpreted information on to the next layer up. Significant means that a defined threshold has been exceeded or that a value change has taken place which invokes a change. The user interface can thus only be updated by the dialogue controller, which will first have been notified by the Context Control Unit, which has in turn picked up an event from an abstract context object, and so forth.

In this way, the user interface designer can confine her/himself to modelling the user interface at an abstract level, together with how it is liable to context. This implies that no account needs to be taken of how context information is available in the application. By contrast, the application can make use of the different levels of context information; the programmer thus still has the freedom to include concrete context information in her/his application. In this way, we attempt to combine the approaches of the first and second architectures for the application of context information to the user interface and the application respectively. As a result of the difference in approach to the use of context information between application and user interface, and the distinction between application and user interface, this architecture lends itself to integration in a human-computer interaction (HCI) framework.

Note that in our approach the user interface can only be changed via the dialogue controller, which makes use of models (task, dialogue and environment models).
This protects consistency with the presentation defined by the designer, with the result that the usability of the user interface generated from the models is ensured.
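As a rough illustration of this strictly layered event flow, the sketch below shows how a layer might decide whether a context change is significant enough to propagate upwards. The class names and the threshold logic are assumptions made for illustration only; they do not reproduce the actual DynaMo-AID implementation.

import java.util.Optional;

/** One layer in the layered context chain (sensor -> ACO -> CCU -> dialogue controller). */
abstract class ContextLayer {
    private final ContextLayer upper;          // next layer up, or null at the top
    protected ContextLayer(ContextLayer upper) { this.upper = upper; }

    /** Interpret an event; return an interpreted value only if it is significant. */
    protected abstract Optional<String> interpret(String event, double value);

    /** Receive an event from the layer below and propagate it only when significant. */
    final void onEvent(String event, double value) {
        interpret(event, value).ifPresent(interpreted -> {
            if (upper != null) upper.onEvent(interpreted, value);
            else handleAtTop(interpreted);
        });
    }

    protected void handleAtTop(String interpreted) {
        System.out.println("Dialogue controller notified: " + interpreted);
    }
}

/** Example layer: only light-level changes crossing a threshold are significant. */
class LightAco extends ContextLayer {
    private boolean dark = false;
    LightAco(ContextLayer upper) { super(upper); }

    @Override protected Optional<String> interpret(String event, double lux) {
        boolean nowDark = lux < 10.0;                 // assumed threshold
        if (nowDark == dark) return Optional.empty(); // not significant: swallowed
        dark = nowDark;
        return Optional.of(dark ? "phone-hidden" : "phone-visible");
    }
}

public final class LayeredContextDemo {
    public static void main(String[] args) {
        ContextLayer ccu = new ContextLayer(null) {
            @Override protected Optional<String> interpret(String e, double v) {
                return Optional.of(e); // CCU relays significant events to the controller
            }
        };
        LightAco aco = new LightAco(ccu);
        aco.onEvent("light", 200.0); // no state change: swallowed at the ACO
        aco.onEvent("light", 3.0);   // crosses the threshold: propagates upwards
    }
}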
4 Designing Context-Sensitive Applications

Before we explain the design phase, it is necessary to say how we structure context information. Several approaches are already available for modelling and using context in applications. Coutaz & Rey [2002] define the contextor as a software abstraction of context data which interprets sensor-based or aggregated information. In this way, several contextors can be merged to form a logical component. The Context Toolkit [Dey & Abowd 2004] includes abstract widgets for:

• Encapsulating raw context details and abstracting implementation details.

• The re-use of context widgets for various applications.

In our approach, we choose to deal with these two goals at two abstraction levels. On the one hand, low-level widgets separate raw context/sensor data from further interpretation. On the other hand, high-level widgets are easy to use and re-use for user interface designers and applications. We define:

A Concrete Context Object (CCO) is an object that encapsulates a type of context (such as low-level sensors).

An Abstract Context Object (ACO) is an object that can be queried about the status of the context information. An ACO uses at least one CCO; at runtime the number of CCOs can change depending on the services available at that time.

The function of the ACO can be compared to the interpreter in the Context Toolkit. However, the ACO entirely encapsulates all sensor data and is the only link between the user interface designer and context information. As a result, the user interface designer is only confronted with abstract context information.

Our architecture lends itself to splitting the implementation of an application into three independent parts: the user interface, the context objects and the application core. Contracts exist between these parts that make communication between these entities possible (a sketch of the two context-object contracts follows below).

User Interface: The user interface designer concentrates on modelling the user interface using a model-based approach. Communication between user interface and application is provided by linking application tasks in the task model with methods the application core makes available to the user interface. In order to integrate context information, the user interface designer can select from abstract context objects.

Context Objects: The context objects can be implemented separately, for example by engineers for the encapsulation of hardware components (CCOs) and by AI specialists for the interpretation of raw context information (ACOs).

Application Core: The application core can be implemented by software engineers and can make use of all artefacts supplied by the context objects. This contrasts with the user interface, which can only be influenced by context changes via the dialogue controller.
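The following minimal Java sketch shows one plausible shape for these two contracts. The interface names match the terms above, but the method signatures are our own assumptions, not the published DynaMo-AID API.

import java.util.List;

/** Concrete Context Object: encapsulates one raw context source, e.g. a sensor. */
interface ConcreteContextObject {
    String category();      // e.g. "temperature"; used by the CCU for ACO-CCO mapping
    double currentValue();  // raw reading
}

/** Abstract Context Object: interprets one or more CCOs into abstract context. */
interface AbstractContextObject {
    /** Categories of CCOs this ACO can draw context information from. */
    List<String> acceptedCategories();

    /** Queried by the layer above; hides all sensor details behind one status. */
    String currentStatus(List<ConcreteContextObject> boundCcos);
}

Keeping the ACO as the only link to the user interface means a designer can bind dialogue transitions to currentStatus() values without ever seeing sensor details.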
User interface designers do not always have a programming background. For this reason, we decided to support designing the user interface in a more abstract way than purely writing code. Several approaches already exist which deal with user interface design using the model-based approach [Mori et al. 2003; Puerta 1997; Coninx et al. 2003] in order to design user interfaces for various types of context.

The first step in the user interface design process is to draw up the task model. Since a traditional task model can only model static user interfaces, we expanded the ConcurTaskTrees (CTT) notation [Clerckx et al. 2004b]. The CTT notation [Paternò 1999] provides a graphical syntax, a hierarchic structure and a way of establishing temporal relationships between various (sub)tasks. A very important advantage of the CTT formalism is the generation of Enabled Task Sets (ETSs) from the specification. Paternò [1999] defines an ETS as a set of tasks that are logically enabled to start their performance during the same period of time. We use the set of ETSs that can be identified in the task specification as a basis for the different dialogues the user interface will need to complete its tasks. We adapted this notation such that context information can be explicitly taken into account in the task specification. In order to achieve a concrete user interface, the designer is assumed to add abstract user interface components to the task model. This information is platform-independent, so that the rendering backend can ultimately use it to construct a concrete user interface for various platforms.

The next step consists of completing the dialogue model. In order to support the designer, the statuses and transitions between the various individual dialogues are automatically generated, so as to simplify the designer's work. The tool includes an algorithm to calculate the different dialogues, and the transitions between them, from the task specification [Luyten et al. 2003]. Afterwards, the designer can always adjust these transitions: she/he can add or remove transitions according to the results of a previous testing stage or the designer's experience, for example. In this way the user interface designer can manipulate transitions that would be triggered by context changes, and thus has control over the influence of context on the usability of the user interface. More information about the design of the specific models can be found in Clerckx et al. [2004a] and Clerckx et al. [2004b].
5 Prototyping the User Interface

Figure 2 shows the DynaMo-AID Design tool; a complete set-up with custom sensors is shown in Figure 3. The set-up in Figure 3 consists of a Printed Circuit Board (PCB) with a temperature and light sensor and some push-buttons; a laptop connected to this PCB, on which are located the CCOs that encapsulate the sensors on the PCB; and a PC with our prototyping tool, where the ACOs are directly connected to the laptop. Note that context information, user interface models and prototypes are integrated in the same environment.

If we go back to our case study from the introduction, we can use the sensors to derive information from the environment of the mobile telephone [Schmidt et al. 1999]. Suppose we want to distinguish between three possible statuses for the mobile telephone: in the hand (temperature and pressure increase), lying on the table (standard light intensity, temperature and pressure) and hidden (fall in light
intensity). If the mobile telephone is in the user's hand, the user can perform normal tasks, such as composing and reading text messages, telephoning, etc. If the mobile telephone is placed on the table, a clock will appear and interaction with the device is possible via a microphone and a loudspeaker.

In order to design the user interface, the designer first has to draw up a task model, using a notation which allows tasks to be described for various types of context. Thus, the designer must specify which tasks are possible if the mobile telephone is in the user's hand, if it is lying on the table or if it is hidden. A separate, complete dialogue model will then be calculated automatically for these three types of context and presented to the designer. The designer her/himself can then indicate between which statuses transitions are possible under the influence of context changes (we will call these inter-dialogue transitions, see next section). The designer can thus decide, for example, only to make a transition from telephone in the hand to telephone on the table or telephone hidden if the user interface is in the main menu status. This avoids the user interface adjusting when this is not desirable: if the user is entering a text message and puts the mobile telephone down on the table, she/he would not always be pleased if the user interface suddenly adjusted itself. In this way, the designer keeps usability under control.

ACOs are linked to these transitions to make it clear what has to be taken into account in order to make the transition. An example of an ACO is the mobile-in-hand object. This object can indicate whether the telephone is in the user's hand, using the CCO for the temperature sensor and the CCO for the push-button.

Figure 4: An overview of the different types of context input, and how they communicate with the user interface.

Figure 4 shows that the use of CCOs means that several types of sensor are possible. This is shown in the left part of the figure and will be further explained in this paragraph. The middle part shows the context aggregation and interpretation components. The right part shows that the dialogue controller changes the state of the user interface prototype in response to a context change. Due to the separation of these parts, context acquisition can happen in several ways. Firstly, there are typical hardware sensors, as in our hardware set-up. Furthermore, the tool also provides support for linking the CCOs not only to hardware sensors but also to a software simulation panel, where the designer can simulate context situations by adjusting manipulators on the panel so that the CCOs are controlled by them.
In this way, the designer can rapidly test the prototype without the hardware implementation (encapsulation) having been achieved, and the work can easily be linked to genuine sensors by replacing the CCOs. Another possibility is to operate a software agent via the CCO, which autonomously generates data and, in this way, enables the user interface to be tested by simulation. This method is also known as the 'Wizard of Oz' method: hardware can be simulated in our design tool without the need for the actual hardware, similar to what is provided in the Papier-Mâché toolkit [Klemmer et al. 2004].

One of the major possibilities of the DynaMo-AID Design tool is the prediction of possible changes in the user interface following termination of a task, the performance of a user action or a context change. Figure 5 shows the Dialogue Predictor panel: it shows the tasks from the task model that are currently valid (left), which transitions are possible to other tasks (centre) and how these tasks will then be visualized (right). The combination of these different factors (task model, dialogue model, context information and user actions) takes place in real time.

Figure 5: The Dialogue Predictor panel.

The design tool generates a prototype user interface which it derives from the tasks in the task specification. The specific presentation of a gathering of tasks is generated from a device-independent XML-based user interface description which the designer can attach to the tasks. These user interface descriptions are described in more detail in Clerckx et al. [2004a] and are beyond the scope of this paper.
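Building on the interfaces sketched in Section 4, the fragment below illustrates how the mobile-in-hand ACO described above could combine a temperature CCO and a push-button CCO, and how a simulated CCO can stand in for real hardware during prototyping. The thresholds, names and signatures are illustrative assumptions only.

import java.util.List;

public final class MobileInHandDemo {

    /** Mirrors the CCO contract from the earlier sketch. */
    interface Cco { String category(); double currentValue(); }

    /** Hypothetical mobile-in-hand ACO: interprets temperature and button CCOs. */
    static final class MobileInHandAco {
        String currentStatus(List<Cco> ccos) {
            double temp = 0, pressed = 0;
            for (Cco c : ccos) {
                if (c.category().equals("temperature")) temp = c.currentValue();
                if (c.category().equals("button")) pressed = c.currentValue();
            }
            // Assumed rule: raised temperature plus button pressure means 'in hand'.
            return (temp > 30.0 && pressed > 0.5) ? "in-hand" : "not-in-hand";
        }
    }

    /** A simulated CCO: the design tool's panel can drive this instead of hardware. */
    static Cco simulated(String category, double value) {
        return new Cco() {
            public String category()     { return category; }
            public double currentValue() { return value; }
        };
    }

    public static void main(String[] args) {
        MobileInHandAco aco = new MobileInHandAco();
        // Swapping these for hardware-backed CCOs requires no change to the ACO.
        System.out.println(aco.currentStatus(
                List.of(simulated("temperature", 33.0), simulated("button", 1.0))));
    }
}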
6 The DynaMo-AID Runtime Architecture

6.1 Smooth Migration from Design to Deployment

The runtime architecture is intended to provide a smooth transition between the design and deployment of the context-sensitive user interface. This is ensured by abstracting the context information, which means that both genuine information
from sensors and simulators can be used in the design phase. Simulators can be either agent-based (Concrete Context Objects which autonomously simulate data) or directly dictated by the designer. The latter is very valuable during design of the user interface. These user interfaces therefore have a strong link with the information provided from the immediate environment in all phases of their creation (design, testing and deployment). Figure 6 gives a summary of the DynaMo-AID runtime architecture.

Figure 6: The DynaMo-AID runtime architecture.
6.2 Context-driven Dialogues

Three major components can be identified in Figure 6:

The application core: the heart of the application, consisting of the application running on the device, together with any services made available during the application's runtime.

The dialogue controller: controls communication between the user interface, the abstract context information and the application core. The dialogue controller possesses information about the user's tasks and how these can be influenced by the context.

The CCU: encapsulates context information at such an abstract level that it only tells the dialogue controller that a context change which has taken place is significant enough to adjust the status of the user interface.

The dialogue controller has a dynamic dialogue model and a dynamic task model at its disposal in order to decide when the user interface has to be updated. These dynamic models are extended versions of traditional models, adjusted so that account can be taken of the current context if this influences the tasks the user can, wants to or may perform. For the task model, we use the decision nodes
notation from earlier work [Clerckx et al. 2004b]. This dynamic task model produces information during the design process about the dialogue model, and a tool ensures that, during design of the dialogue model, consistency with the previously defined tasks, which may or may not be liable to context changes, is guaranteed.

The dynamic dialogue model consists of possible statuses of the user interface, as in a traditional State Transition Network [Parnas 1969]. The difference is in the transitions that can occur. Here, we make a distinction between intra-dialogue and inter-dialogue transitions. An intra-dialogue transition is a transition between two statuses which is performed if the task described for the transition is performed by the user or the application. An inter-dialogue transition, by contrast, is a transition between two possible statuses of the user interface which can only be performed if a context change has taken place which fulfils the conditions defined by the designer for the transition. Both types of transition are made explicit to the designer in the tool by means of the Dialogue Predictor panel (Figure 5).

From the time the application is launched, the status of the user interface and the application can be changed by three actors: the user, the application and the CCU. Firstly, the CCU will detect the current context, supplied by the abstract context objects, and the dialogue controller will be notified of the status in which the user interface has to be launched. From then on, the user interface can be changed by the three actors. The user interacts with the device and can thus manipulate the user interface: the presentation renderer on the target device communicates events to the dialogue controller, which can then decide whether the status of the user interface needs to be changed or whether information about the user's action should be relayed to the application core. The application can also influence the user interface, for example by showing the results of a query once it has finished processing the query. On the other hand, a service which becomes available to the application, for example upon entry to a WiFi hotspot, can influence the user's tasks, and the dialogue controller will have to take this into account. The last actor on the list is the CCU. The following section explains how this actor reacts to context changes detected by abstract context objects, and how context information reaches the CCU.
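As an illustration of the distinction just made, the following sketch models a dialogue as a state machine whose transitions are labelled either with a task (intra-dialogue) or with a context condition (inter-dialogue). The representation is our own simplification, not the DynaMo-AID data structures.

import java.util.*;

public final class DialogueModelSketch {

    enum Kind { INTRA, INTER }

    /** A transition fires on a task name (INTRA) or on a context status (INTER). */
    record Transition(String from, String to, Kind kind, String trigger) {}

    static final class DialogueModel {
        private final List<Transition> transitions = new ArrayList<>();
        private String current;

        DialogueModel(String initial) { current = initial; }
        void add(Transition t) { transitions.add(t); }

        /** Called by the dialogue controller for completed tasks or CCU events. */
        void fire(Kind kind, String trigger) {
            for (Transition t : transitions) {
                if (t.from().equals(current) && t.kind() == kind
                        && t.trigger().equals(trigger)) {
                    current = t.to();
                    System.out.println("-> " + current);
                    return;
                }
            }
            // No matching transition: the context change is ignored here, which is
            // how the designer keeps unwanted adaptations out of, e.g., text entry.
        }
    }

    public static void main(String[] args) {
        DialogueModel m = new DialogueModel("main-menu");
        m.add(new Transition("main-menu", "compose", Kind.INTRA, "ComposeMessage"));
        m.add(new Transition("main-menu", "clock",   Kind.INTER, "on-table"));
        m.fire(Kind.INTRA, "ComposeMessage"); // user task: -> compose
        m.fire(Kind.INTER, "on-table");       // context change ignored while composing
    }
}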
6.3 Context Sensing and Processing

As previously mentioned, we distinguish between abstract and concrete context objects (ACOs and CCOs). The aim of this is to make a number of high-level widgets available to the application and the user, without the latter having to know how all this has been implemented, and so that the underlying implementation and the sensors used can be changed. This additional distinction also offers the possibility of changing the sensors used during the runtime of the application, depending on the services available now and in the future. The CCU does after all ensure that a mapping from ACOs to CCOs takes place each time new services become available or services disappear. In the first instance, this can easily be implemented by subdividing the CCOs into various categories; the ACOs then indicate to the CCU the categories of CCOs from which they can use context information. Making a distinction between ACOs and CCOs also means that these can be programmed separately, and therefore by different people with different backgrounds.
CCOs can, for example, be made by hardware specialists who can program close to the device driver, while ACOs can be programmed by specialists in Artificial Intelligence so as to have the context interpreted in an intelligent way.

The task of the CCU can be divided into three sub-tasks:

• Recalculating the mapping of ACOs onto CCOs: a service can be a supplier of context information. If this is the case, the CCU can make use of this, treat the service as a CCO and link it to ACOs which can make use of this context information.

• Detecting context changes: if a context change takes place in an ACO, the CCU will look at the dialogue model in order to decide whether the context change has a direct influence, so that an inter-dialogue transition has to be performed.

• Invoking an inter-dialogue transition: the CCU sends an event to the dialogue controller and tells it that a context change has taken place and that the inter-dialogue transition has to be performed (as can be seen on the right side of Figure 4).

A sketch of the first of these sub-tasks follows below. The section discussing context architectures has already stated that, from the point of view of the user interface, use is made of a strictly layered structure. By building in these abstraction levels, changing the user interface under the influence of context changes remains fully controlled. After all, context changes can only have an effect on the user interface in the statuses, and for the context, indicated by the user interface designer in the design phase.

This layered structure does however imply some additional complexity. For instance, when passing information on to an adjacent layer, an interpretable artefact always has to be given as well. In order to transmit information between hardware sensors and CCOs, little has to be established by convention, since implementation of the CCOs usually has to take place ad hoc because this will be highly specific code. ACOs combine CCOs and contain code for interpreting groups of CCOs. If it is evident from this interpreted information that a context change has taken place, this is notified to the CCU. If a transition then exists in the dialogue model to follow up this context change, the dialogue controller will be notified to invoke the appropriate transition.
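The category-based ACO-to-CCO mapping could look roughly like the following; again, the names and signatures are assumptions used for illustration, not the actual CCU code.

import java.util.*;

public final class CcuMappingSketch {

    interface Cco { String category(); }
    interface Aco { List<String> acceptedCategories(); }

    /** Recomputes which CCOs feed each ACO whenever services appear or disappear. */
    static Map<Aco, List<Cco>> remap(List<Aco> acos, List<Cco> availableCcos) {
        Map<Aco, List<Cco>> mapping = new HashMap<>();
        for (Aco aco : acos) {
            List<Cco> bound = new ArrayList<>();
            for (Cco cco : availableCcos) {
                if (aco.acceptedCategories().contains(cco.category())) bound.add(cco);
            }
            mapping.put(aco, bound); // may be empty if no matching service is available
        }
        return mapping;
    }

    public static void main(String[] args) {
        Aco lightAco = () -> List.of("light");
        List<Cco> services = List.of(() -> "light", () -> "temperature");
        System.out.println(remap(List.of(lightAco), services).get(lightAco).size()); // 1
    }
}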
7 Future Work

Despite the improved harmonization of the activities of the hardware engineer (building sensors), the application programmer and the interface designer, a reliable link with a traditional software engineering process is missing, which could lead to complete integration. One possibility in this respect is to integrate the Model-Driven Architecture^ more fully with the various interface models already used in this context.

Another important area of application, to which we will devote more attention in the future, is interfaces for disabled people. Here, it is of the utmost importance

^ http://www.omg.org/mda/
for the interface always to react appropriately and predictably to changes in the environment. Predictability is guaranteed by the model-based approach, which clearly indicates which tasks the interface has to fulfil. The possibilities offered by the interface are thus clearly demarcated and the influence of context on the interface is determined at the design stage. Nevertheless, the solution we propose here also offers the possibility of dealing with new situations (for example, a remote software service that fails) without this having to be explicitly modelled.
8 Conclusion

We believe the work presented here is only one step towards a fully integrated development environment for context-sensitive user interface design. Various models from model-based user interface development (task, dialogue and presentation models) are combined with potential dynamic context changes in the design phase of the interface. As far as we know, this is currently the only approach to involve context information in this way in the design process of an interactive system. However, more work is required in order to achieve an environment that can produce highly complex interfaces.

We have demonstrated in this paper how existing context management systems can be expanded to take the interaction side into account as well. By abstracting the context up to the dialogue layer of the interface, we can explicitly take context changes into account during design of the interfaces. Moreover, the abstraction layer ensures that context can be simulated, so that a working prototype can easily be created in a laboratory setting. We expect this to increase the usability of the final interface, since it makes it possible to perform user testing early in the design process.
Acknowledgements

Part of the research at EDM is funded by EFRO (European Fund for Regional Development), the Flemish Government and the Flemish Interdisciplinary Institute for Broadband Technology (IBBT). The CoDAMoS (Context-Driven Adaptation of Mobile Services) project IWT 030320 is directly funded by the IWT (Flemish subsidy organization).
References

Clerckx, T., Luyten, K. & Coninx, K. [2004a], DynaMo-AID: A Design Process and a Runtime Architecture for Dynamic Model-based User Interface Development, in R. Bastide, P. Palanque & J. Roth (eds.), Proceedings of the 9th IFIP Working Conference on Engineering for Human-Computer Interaction Jointly with the 11th International Workshop on Design, Specification and Verification of Interactive Systems (EHCI-DSVIS 2004), Vol. 3425 of Lecture Notes in Computer Science, Springer. In press.

Clerckx, T., Luyten, K. & Coninx, K. [2004b], Generating Context-sensitive Multiple Device Interfaces from Design, in L. Jacob & J. Vanderdonckt (eds.), Proceedings of the 9th ACM International Conference on Intelligent User Interfaces jointly with the 5th International Conference on Computer-Aided Design of User Interfaces (IUI-CADUI 2004), ACM Press, pp.288-301.
Coninx, K., Luyten, K., Vandervelpen, C., van den Bergh, J. & Creemers, B. [2003], Dygimes: Dynamically Generating Interfaces for Mobile Computing Devices and Embedded Systems, in L. Chittaro (ed.), Human-Computer Interaction with Mobile Devices and Services: Proceedings of the 5th International Symposium on Mobile Human-Computer Interaction (Mobile HCI 2003), Vol. 2795 of Lecture Notes in Computer Science, Springer-Verlag, pp.256-70.

Coutaz, J. [1994], Software Architecture Modeling for User Interfaces, in J. J. Marciniak (ed.), Encyclopedia of Software Engineering, John Wiley & Sons, pp.38-49.

Coutaz, J. & Rey, G. [2002], Foundations for a Theory of Contextors, in C. Kolski & J. Vanderdonckt (eds.), Proceedings of the 4th International Workshop on Computer-aided Design of User Interfaces (CADUI 2002), Vol. 3, Kluwer Academic Publishers, pp.13-33. Invited talk.

Dey, A. K. & Abowd, G. D. [2000], The Context Toolkit: Aiding the Development of Context-aware Applications, Paper presented at the Workshop on Software Engineering for Wearable and Pervasive Computing, http://www.cc.gatech.edu/fce/contexttoolkit/pubs/SEWPC00.pdf (last accessed 2005-06-07).

Dey, A. K. & Abowd, G. D. [2004], Support for the Adaptation and Interfaces to Context, in A. Seffah & H. Javahery (eds.), Multiple User Interfaces: Cross-Platform Applications and Context-Aware Interfaces, John Wiley & Sons, pp.261-96.

Dey, A. K., Salber, D. & Abowd, G. D. [2001], A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-aware Applications, Human-Computer Interaction 16(2-4), 97-166.

Henricksen, K. & Indulska, J. [2004], A Software Engineering Framework for Context-aware Pervasive Computing, in A. Tripathi et al. (eds.), Proceedings of the Second IEEE International Conference on Pervasive Computing and Communications (PerCom '04), IEEE Computer Society Press, pp.77-86.

Hinckley, K., Pierce, J., Sinclair, M. & Horvitz, E. [2000], Sensing Techniques for Mobile Interaction, in M. Ackerman & K. Edwards (eds.), Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, UIST'00, CHI Letters 2(2), ACM Press, pp.91-100.

Hong, J. & Landay, J. A. [2004], Context Fabric: Infrastructure Support for Context-Awareness, http://guir.cs.berkeley.edu/projects/confab/. GUIR: Berkeley Group for User Interface Research.

Klemmer, S. R., Li, J., Lin, J. & Landay, J. A. [2004], Papier-Mâché: Toolkit Support for Tangible Input, in E. Dykstra-Erickson & M. Tscheligi (eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'04), ACM Press, pp.399-406.

Luyten, K., Clerckx, T., Coninx, K. & Vanderdonckt, J. [2003], Derivation of a Dialog Model from a Task Model by Activity Chain Extraction, in J. Jorge, N. Jardim Nunes & J. Falcão e Cunha (eds.), Interactive Systems. Design, Specification, and Verification: Proceedings of the 10th International Workshop, DSV-IS 2003, Vol. 2844 of Lecture Notes in Computer Science, Springer-Verlag, pp.191-205.
Mori, G., Paternò, F. & Santoro, C. [2003], Tool Support for Designing Nomadic Applications, in D. Leake, L. Johnson & E. André (eds.), Proceedings of the 8th ACM International Conference on Intelligent User Interfaces (IUI 2003), ACM Press, pp.141-8.

Parnas, D. L. [1969], On the Use of Transition Diagrams in the Design of a User Interface for an Interactive Computer System, in Proceedings of the 1969 24th National Conference, ACM Press, pp.379-85.

Paternò, F. [1999], Model-Based Design and Evaluation of Interactive Applications, Springer-Verlag.

Puerta, A. [1997], A Model-Based Interface Development Environment, IEEE Software 14(4), 40-7.

Salber, D., Dey, A. K. & Abowd, G. D. [1999], The Context Toolkit: Aiding the Development of Context-enabled Applications, in M. G. Williams & M. W. Altom (eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: The CHI is the Limit (CHI'99), ACM Press, pp.434-41.

Schmidt, A. [2002], Ubiquitous Computing — Computing in Context, PhD thesis, Lancaster University.

Schmidt, A., Aidoo, K. A., Takaluoma, A., Tuomela, U., van Laerhoven, K. & van de Velde, W. [1999], Advanced Interaction in Context, in H.-W. Gellersen (ed.), Handheld and Ubiquitous Computing: Proceedings of the First International Symposium on Handheld and Ubiquitous Computing (HUC 1999), Vol. 1707 of Lecture Notes in Computer Science, Springer-Verlag, pp.89-101.

Winograd, T. [2001], Architectures for Context, Human-Computer Interaction 16(2-4), 401-19.
Using Context Awareness to Enhance Visitor Engagement in a Gallery Space

Peter Lonsdale^, Russell Beale* & Will Byrne^

^ School of Engineering, * School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

Email: {pxl, rxb, w.fbyrne}@cs.bham.ac.uk
Context-awareness can greatly enhance the usability of mobile devices by making it possible for users to continue with other activities without having to pay too much attention to the device. At the same time context-aware applications can provide timely support for user activities by responding to changes in the user's state and acting accordingly. We describe our work on developing a generic context awareness architecture that is being deployed in a gallery space to enhance learner engagement with the gallery exhibits. Our system makes use of contextual information to determine what content should be displayed on the device. Users can also navigate this content by explicitly changing their context in the dimensions of physical location and dwell time. Visitors have the opportunity to physically interact with the abstract information layer that is overlaid on the gallery space. The system also actively encourages movement in the gallery by identifying links between paintings. We describe our architecture, implementation, and the design challenges faced in deploying this system within a gallery. Keywords: context awareness, mobile learning, museum, mobile usability, PDAs, location awareness.
1 Introduction

Context awareness is a relatively nascent field of research that centres on the use of information pertaining to the user and their environment to drive the behaviour of a device or system (for reviews of context-awareness applications and research perspectives, see Chen & Kotz [2000] and Dourish [2004]). By taking account of contextual information, systems can be made easier to use, and can provide more appropriate responses, than if they respond passively to user requests. There is currently considerable interest in the use of context awareness to provide enhanced
usability in this way, and in particular developers of mobile applications are looking to context awareness to provide solutions to the specific problems posed by the design limitations of mobile computing devices.

Context awareness is especially important for mobile devices because it allows us to overcome the usability problems associated with small, handheld devices [Sharples & Beale 2003], and also to make effective use of the user's physical and social surroundings to provide timely support to their activities. We can provide enhanced user support by using contextual information to drive mobile applications, and simultaneously we can exploit physical context as a means to interact with the application. For example, a user's location can drive delivery of content on a device, and at the same time the user can utilize the system's sensitivity to their location as a navigation tool, selecting different items of content by altering their physical location.
1.1 Context Awareness for Mobile Learning

Mobile learning is too often conceived of as simply the mobile equivalent of e-learning. The assumption is that learning can be delivered through content displayed on mobile devices in the same way as it is displayed on other systems such as desktop PCs. However, the use of mobile computing devices is qualitatively different to the use of other computing devices, and we must take account of this when developing m-learning applications. We must also consider that a user with a mobile device is often much more influenced by their surroundings than a user of a desktop PC might be. Mobile devices such as phones and PDAs are used in a huge variety of settings and environments, and we cannot rely on having the user's full attention.

Mobile learning is not something that can be delivered; it is something that might happen, given the right combination of learner, surroundings, content, and activity. This serendipitous nature of mobile learning is further reinforced by the very informal way in which people use things like mobile phones and PDAs. They are not setting out to learn something, they are often engaged in something else entirely, and we must make the best use we can of the devices they have to hand to support their activities [Rogers 2002]. We can do this using context awareness to ensure that the device is always ready with relevant information, but does not need to distract the user in order to achieve this readiness.

The EU IST Project MOBIlearn [Bormida et al. 2002] has focused on the development of a large-scale platform for delivering learning content to learners with mobile devices, including mobile phones, Tablet PCs, and PDAs. As part of this work, the University of Birmingham has developed an architecture for context awareness that is currently being deployed at the Nottingham Castle Museum gallery to provide an enhanced and more engaging experience for visitors to the gallery. Our aims have been:

• To provide timely support to the user.

• To allow the user to maintain their attention on the world.
• To allow the user to inspect, understand, and alter the current context model for their own purposes.
1.2 Context Awareness for Museums and Galleries

Museums, galleries, and heritage sites seek to engage visitors in the artefacts they exhibit, as well as encourage participation in the learning space provided. The use of computer technologies in museums is not a new concept, and kiosk-based content presentation and interactive exhibits are a common sight. Mobile devices offer opportunities to provide technological means to engage visitors whilst they are situated within the gallery space itself. Moreover, the small size and portability of these devices means that we can seek to engage the user without distracting their attention from the exhibits they are trying to enjoy.

Context-aware mobile applications have been used effectively to deliver supporting information to tourists in the form of location-aware tourist guides, for example the CyberGuide project [Abowd et al. 1997]. Similarly, location-aware applications have been used to deliver content that is appropriate to a visitor's particular location within a gallery space — notable examples include Tate Modern's multimedia pilot study [Proctor & Burton 2003] and the CAERUS system [Naismith & Smith 2004]. Wearable and mobile computers have also been used to provide augmented experiences that go beyond basic location awareness. Baber et al. [2001] describe a system that combines location-dependent content delivery with profiling of visitor needs, to provide a visit tailored to individual requirements. Oppermann & Specht [1999] describe a system that uses contextual information as the basis for supporting the user but not distracting them, whilst MacColl et al. [2002] describe their experiences of combining context and virtual presence.

Location remains the primary feature of context that is exploited in most context-aware applications [Bristow et al. 2002; Chen & Kotz 2000; Dix et al. 2000; Selker & Burleson 2000]. To provide effective support for visitors to museums and galleries it is crucial to know where they are. Knowing what area a visitor is in means we can offer appropriate content and suggest possible activities. Knowing exactly which artefact a visitor is currently looking at means we can offer content and activities specifically for that artefact.
1.3 Uses of Context

A visit to a museum is not just a series of stops in front of artefacts. The experience has a beginning, a middle, and an end. It is a process. We have sought to address this by considering visitor movement within the gallery. We began by designing our system to use location and timing information to provide appropriate content to users with mobile devices in a gallery space. What became clear was that the delivery of content in this way allowed us to encourage visitors to interact with the artefacts in a different way. A review of the content supplied to us by the gallery indicated that many of the paintings on display shared interesting histories or were linked in some way that was never made visible to visitors. By flagging these connections to the visitors we are able to encourage greater movement between the paintings on
display, beyond the basic linear path that most people follow in the gallery. This physical engagement with the learning space is an often neglected facet of learning.

In our preliminary trials of our prototype, we discovered that because the system relies on context to deliver content, changes in that context can be exploited by the users themselves to deliberately trigger content changes and hence move to another item of content when desired. In other words, context becomes not just a mechanism for the system to select content, but also a tool with which users can navigate the information space. These two aspects of context-aware gallery exploration, context as content selector and context as navigation tool, have driven our subsequent development of a system to support visitors to the gallery. The concept is that an information space that is overlaid on the existing physical space of the gallery can be navigated through physical means, engendered by the implementation of a context-sensitive application.

Mobile devices are hard to use, because they have small screens, and the user is usually trying to do something else at the same time as navigate the onscreen menus. By using physical movement as an interaction method, we can give people a new way to interact with the information space that we have overlaid onto the physical gallery space.
2 Context Awareness Architecture

Context-aware applications typically involve the use of rulesets or some other kind of matching system to generate appropriate system responses (e.g. content display or option selection) from appropriate stimuli (e.g. changes in location, orientation, lighting levels, user input etc.). This approach requires the definition of fairly rigid rules (or their equivalent) and an exhaustive set of possible responses that the system can make. Our approach has been driven by the need to support the process of learners moving through a learning space, and hence our model and architecture for context awareness is much more process-centred. Another motivating factor in moving away from programmatically defined rules is the desire to support content developers and experience designers who wish to make use of context-aware applications without wanting to engage in software development.

We have devised an architecture for context awareness, described in detail in Beale & Lonsdale [2004], that involves the definition (in textual form) of a set of software objects called context feature objects. Each of these context feature objects responds to a specific stimulus from actual context data by searching for matching metadata tags on the content that the system can currently deliver. Any match results in the current score for that item of content or action being increased. When all available content and actions have been scored, those with higher scores are deemed to be of more relevance to the current context than those with lower scores. This scoring or ranking process occurs every time the context changes, e.g. whenever the user moves to another location.

The context awareness system is configured by specifying a set of context feature objects and link objects using a structured syntax that the system parses at runtime to generate actual software objects that perform the context awareness processing. In specifying context feature objects, it is necessary only to know the
Using Context Awareness to Enhance Visitor Engagement in a Gallery Space name of the metadata tag that is appropriate and the range of values that a context feature object should respond to. Links between context feature objects are similarly defined. In this way we have provided a non-progranmiatic interface to the context awareness system, and one that could easily be translated into even more usable tools such as a graphical user interface for the configuring of context aware applications. Contextual data itself is assumed to be gathered by separate systems, and is input into the context awareness architecture in a generic fashion by specifying simply a name for the data and then supplying its value. Context feature objects that are able to respond to the type of information passed in will do so. In this way the context awareness architecture is only loosely coupled to the technical infrastructure which provides the actual context data, and different sources of context data may easily be substituted at any time. From a technical perspective, this functionality is further enhanced by the use of a Web services architecture to deploy the system. This means that communication with the context awareness architecture is easily achieved through standard protocols and data formats.
2.1 Conceptual Context Model

The easiest way to understand the contextual approach we have taken is through the metaphor of a movie. The movie itself has a main theme, and a variety of subplots and threads running through it. This is equivalent to the overall context: it is dynamic, changing over time and with the interactions of the participants, and history is important. A scene in the movie corresponds to a context state: a specific set of themes and characters are to the fore and have primary importance. A scene from the movie has these key characters in it, plus some props — this corresponds to what we call the context substate. Thus, as in a movie, the whole movie is needed for a full understanding, but a lot of information does exist in a single frame.
2.2 Context Feature Objects

Our software architecture comprises a set of software objects called context feature objects (CFOs) that correspond to real-world context features relating to the learner's setting, activity, device capabilities and so on, and which are used to derive a context substate, as described above. Data can be acquired either through automated means (for example sensors or other software subsystems) or by direct input from the user. This context substate is used to perform first exclusion of any unsuitable content (for example high-resolution webpages that cannot be displayed on a PDA) and then ranking of the remaining content to determine the best n options. This ranked set of options is then output to the content delivery subsystem.

2.2.1 Types of Context Features

Context feature objects are either excluders or rankers. Items of content that are deemed entirely inappropriate for the current context are excluded. That is to say, they are removed from the list of recommended content and not subject to any further consideration — items that match a single exclusion criterion will not receive any further rankings and will not be recommended no matter how high a score they receive, so exclusion is qualitatively different from simply receiving a low or zero ranking. Content remaining in the list after the exclusion process is then ranked according to how well it matches the current context. The ranking process
simply increments the score of each item of content that has metadata matching the stimulus values of any particular context feature. The size of the increment depends on the salience value of the context feature doing the ranking. Individual CFOs can have their salience values changed so that they exert more influence on the ranking process.

A CFO has a set of possible values, and an indicator of which value is currently selected. It is also possible for CFOs to have multiple sets of possible values, with the currently active set being determined by the current value of another, linked context feature. Whilst this has no bearing on the recommendation process, it is important in terms of providing an inspectable model of the context state to the user, who can observe the influence of one context feature on another. For example, options relating to current activity can change depending on the user's current location.

2.2.2 Linked Context Features

Each context feature object responds to only one metadata tag and performs either an exclusion or a ranking function. To achieve more complex filtering of content, CFOs can be linked together so that their function can depend on the state of other context feature objects. Link objects are used to send either the values of context features, or the time they have held that value, to other context features. Criteria on the link determine whether action should be taken. For example, we might have a context feature that responds directly to input from a sensor network specifying the location of the user. Another context feature infers the level of interest of the user by taking input from a link that acts on the time the location feature has had its current value: a user dwelling in one place for a longer period implies a higher level of interest in that location. A third context feature may respond to user input that can override the inferred level of interest — this uses a link object that acts on the value input by the user. Conflicts between links and context features are resolved using salience values which specify the relative importance of each. These salience values are at present specified by the designer(s) of the context-aware experience, but more automated methods of conflict resolution could be employed in future iterations.
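The exclusion-then-ranking pass could be implemented along the following lines. This is our own sketch of the behaviour described above, with assumed names and a simplified metadata model (a set of tags per item):

import java.util.*;

public final class RankingSketch {

    record Item(String name, Set<String> metadataTags) {}

    /** An excluder removes items outright; a ranker adds its salience to matches. */
    record Cfo(String stimulusTag, boolean excluder, double salience) {}

    static List<Item> recommend(List<Item> items, List<Cfo> cfos) {
        // 1. Exclusion: one matching exclusion criterion removes the item for good.
        List<Item> candidates = items.stream()
            .filter(item -> cfos.stream().noneMatch(
                c -> c.excluder() && item.metadataTags().contains(c.stimulusTag())))
            .toList();

        // 2. Ranking: each matching ranker increments the score by its salience.
        Map<Item, Double> scores = new HashMap<>();
        for (Item item : candidates) {
            double score = cfos.stream()
                .filter(c -> !c.excluder() && item.metadataTags().contains(c.stimulusTag()))
                .mapToDouble(Cfo::salience).sum();
            scores.put(item, score);
        }
        return candidates.stream()
            .sorted(Comparator.comparingDouble(i -> -scores.get(i)))
            .toList();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
            new Item("hi-res page", Set.of("needs-large-screen", "painting-7")),
            new Item("audio clip",  Set.of("painting-7")),
            new Item("text intro",  Set.of("painting-3")));
        List<Cfo> cfos = List.of(
            new Cfo("needs-large-screen", true, 0),   // excluder: device is a PDA
            new Cfo("painting-7", false, 2.0));       // ranker: user is at painting 7
        recommend(items, cfos).forEach(i -> System.out.println(i.name()));
        // prints: audio clip, then text intro (the hi-res page is excluded)
    }
}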
2.3 Output

The ordered list of ranked items of content is passed to delivery subsystems for use in determining exactly what content should be made available to the user. In this way, the context-awareness subsystem has no way of specifying exactly what is made available — it is intended only to make recommendations to the system and to the user. This method of recommendation is preferred so that, should the system make a mistake and make inappropriate recommendations, its output does not override selections made elsewhere in the system (for example, the user might specify a particular page of content and then not want that item to be replaced by another). It should be clearly understood that the recommendations made are not restricted to content — recommendations can also determine new navigational strategies through the virtual or real space. We are concerned not only with filtering content, but with the more general question of providing appropriate support, which may be
reordering information, offering it in a different form, or directing the user to another part of the physical space — which will in turn affect the context system.
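On the delivery side, the advisory character of this output might look like the small fragment below; choose_content is a hypothetical helper rather than anything published for the system.

    def choose_content(recommendations, user_selection=None):
        # Recommendations never override a selection made elsewhere in the
        # system: an explicit user choice always wins.
        if user_selection is not None:
            return user_selection
        return recommendations[0] if recommendations else None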
3 In the Gallery

We have deployed our context awareness architecture in the gallery space at Nottingham Castle. Our intention is to provide visitors to the gallery with an enhanced experience by using contextual information to drive the behaviour of their mobile device. Content and options displayed on the screen are tailored to the user's current context, and users are also able to make explicit use of the context sensitivity to drive the behaviour of the device themselves.
3.1 Designing the Experience

We have consulted with the curators at Nottingham Castle Museum to ensure that our system will deliver appropriate support to visitors in the gallery space. Several issues arose during our consultation, of which two are immediately relevant to the design of the context-aware visitor experience:

Lack of focused attention: visitors usually enter the gallery space via one door, move through the space in a linear way, and then exit without paying much attention to what they see on the way.

Deadspots: certain artefacts within the gallery are often overlooked by visitors for a variety of reasons: positioning, lighting, or other factors.

We wanted initially to use our context awareness system to attempt to overcome these issues, and thus provide a more engaging experience for visitors whilst at the same time addressing these areas of concern to the gallery staff. A crucial part of the design centres on the fact that visitors move through a physical space. This movement was determined to be the primary context feature for our system to use. Moreover, movement is not constrained to two or even three dimensions — visitors' movements can also be described in terms of a fourth dimension, time. The particular path a visitor takes through the gallery, the time they spend at individual paintings, and whether or not they retrace their steps can all be used to drive a context-aware application. Our system has been set up to deliver appropriate content using the following principles (sketched in code after this list):

• Which painting is the user currently closest to? This is determined from our positioning system as described below. The system is able to provide accurate data about which painting the user is currently closest to.

• How long has the user been in their current position? An increased dwell time at a specific painting is assumed to indicate a higher level of interest in that painting.

• Has the user been in this position before? If the user has been to a painting before, the content they viewed on their previous visit can be used to determine the appropriate level of content to display this time. Previous content can also be offered for review.
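The sketch below shows how these three principles could be realised in code. It is our own illustration: the deployed system expresses this logic through CFOs and links rather than ad hoc classes, and the name VisitModel and the DWELL_STEP value are assumptions.

    import time

    DWELL_STEP = 20.0  # assumed: seconds of dwell per additional interest level

    class VisitModel:
        def __init__(self):
            self.current = None        # painting the visitor is closest to
            self.arrived_at = None     # when they arrived at it
            self.interest = {}         # painting -> highest interest level reached

        def on_position(self, nearest_painting):
            """Update the model with the painting reported as closest."""
            now = time.time()
            if nearest_painting != self.current:
                self.current = nearest_painting
                self.arrived_at = now
            # Longer dwell implies greater interest; a returning visitor resumes
            # from the level they reached on their previous visit.
            dwell = now - self.arrived_at
            level = max(self.interest.get(nearest_painting, 0),
                        int(dwell // DWELL_STEP))
            self.interest[nearest_painting] = level
            return level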
As well as using context awareness to determine what the device does, we are also exploring the use of context awareness as a means to physically engage the learner in the learning space, and to encourage movement within that space.

3.1.1 Encouraging Movement Within the Space

Mobile devices are often deployed in museums and similar locations as a means to deliver content or provide some other element of interactivity to the exhibits. But delivering content means that we are in danger of replacing hands-on interaction with 'heads down one-way transmission of information' [Hsi 2003]. Instead, we can use the device and the content it displays to cause the visitor to see the artefacts in a different way, and to expose links between paintings that were not visible without the technology to point them out. This functionality has been implemented by structuring the audio content provided by the device to highlight links with other paintings in the gallery space. Users are expected to navigate to the other paintings without additional assistance, which is in part why we have seen the context sensitivity used as a navigation tool.

3.1.2 Enabling Navigation through Physical Movement

We have observed that users wanted to navigate the information space by physically changing their context, so that they were effectively driving the system through physical actions. Our application already supported this through being sensitive to context changes, but to further enhance usability in this area we have explored presenting salient contextual information on the user interface, so that users can monitor the state of the context system and determine whether they have achieved the state they are aiming for. In this case, it became necessary to indicate to the user the exact location the device was currently registering for them, whether it thought they were moving or stationary, and how long it thought they had been in that location.
3.2 Deploying the Experience

To provide the functionality described above within our context awareness architecture, it is necessary to define two context feature objects, Painting and Interest. The Painting CFO responds directly to which painting is closest to the user, and scores all items of content that are relevant to that painting. The Interest CFO responds indirectly to location: a Link is defined between Painting and Interest which specifies a number of possible values for Interest, depending on the time that Painting has held its current value. The longer that Painting holds its value, the higher the value of Interest. If a visitor retraces their steps, the context architecture is able to determine the last known value of Interest by consulting an internal database that stores sets of CFO values. Using Painting as the search key for this database, the context system can determine what level of interest was reached last time the visitor was at this painting. The functionality described here could be achieved using a far less involved set of rules. However, our implementation offers a high degree of flexibility, and also the chance for non-programmers to create context-aware experiences without having to worry about the specifics of the code behind the system. The
gallery experience is just one example of a relatively simple application that can be deployed using our architecture. The architecture itself is designed to be flexible and extensible, allowing for much greater complexity than was used for these initial gallery trials. We are using a bespoke ultrasound tracking system to determine the location of users as they move around the gallery space. This system was developed at the University of Birmingham as part of another project [Cross et al. 2002], and has been successfully adapted to provide input to our context awareness system. The ultrasound system comprises a set of transmitters placed at known points on the walls of the gallery, and a receiver which connects to a PocketPC device. The receiver is able to triangulate its position from the signals received from the fixed transmitters.
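As an indication of what this positioning step involves, the sketch below recovers a 2D position from known transmitter locations and measured ranges using a standard linearised least-squares trilateration. The paper does not describe the algorithm the Birmingham system actually uses, so this is purely illustrative.

    import numpy as np

    def trilaterate(anchors, distances):
        """Estimate a 2D position from anchor points and measured ranges.

        Subtracts the first anchor's range equation from the others to
        linearise the problem, then solves it by least squares.
        """
        anchors = np.asarray(anchors, dtype=float)
        d = np.asarray(distances, dtype=float)
        x0, d0 = anchors[0], d[0]
        A = 2.0 * (anchors[1:] - x0)
        b = (d0**2 - d[1:]**2
             + np.sum(anchors[1:]**2, axis=1) - np.sum(x0**2))
        pos, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pos

    # Three wall-mounted transmitters and ranges derived from time-of-flight;
    # the estimate comes out near (2.0, 1.5).
    print(trilaterate([(0, 0), (5, 0), (0, 4)], [2.5, 3.35, 3.20]))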
4 Results of User Trials

From December 2004 to April 2005, we conducted user trials of our context-aware system at the Nottingham Castle Museum gallery. At the time of writing, our results are at a preliminary stage, and we have not yet analysed the data gathered from our questionnaires or audio/video recordings. All participants were visitors to the Nottingham Castle gallery who were approached and asked if they wished to take part in our study. All were given a brief introduction to the system and its aims. All participants (except those in the control condition) were asked to complete pre- and post-task questionnaires so that we could assess what they had learned from their visit. We gathered data from several sources for our trials:

• Pre- and post-task questionnaire data, to determine what visitors have learned from their visit.

• Video recordings of visitors' movements in the gallery.

• Audio recordings of visitors' conversations whilst using the system.

• System logs of content delivered, movement between paintings, and options selected on the PDA.

We used an independent-measures experimental design to determine the impact of our handheld guide (experimental condition) in comparison with traditional guide materials (baseline condition: a printed booklet) and no guide materials at all (control condition). Preliminary results are drawn from informal observations taken by the experimenters during the trials. We found that visitors using the paper guide tended to follow a more 'rigid' pattern of movement around the gallery, visiting paintings in a specific order, then stopping to consult the guide book. In contrast, visitors with the PDA were more likely to move around the gallery according to what interested them, after scanning the room for paintings that caught their eye. It seemed that because the handheld guide had no inherent structure, no structure was imposed on the visitors' behaviour.
A number of specific problems were observed when people were using the system. People quickly developed high expectations of the system based on previous experience, often remarking on paintings that did not offer the same depth of content as others. Content availability was apparent from the screen display, but this seemed non-intuitive to many users. Even the basic system was perceived as overly complex by many users, emphasising the need for content delivery systems such as this to remain as simple as possible. Despite perceiving the system as complex, most users seemed to find it useful once they had discovered what it could provide. However, few users made use of the content navigation options on the device, and most were happy simply to have content delivered in the order the system dictated.
5 Conclusions and Next Steps

The system described here has been deployed in the Nottingham Castle Museum gallery and is currently undergoing user trials. Preliminary testing of our prototypes has indicated that there are important research issues surrounding the use of context-sensitive architectures both to drive applications and to provide alternative means of content navigation for users. The main challenges in this area are those of determining appropriate ways to represent these new metaphors for navigation to users, and of creating usable interfaces within the constraints imposed by the design of mobile devices. In particular, it seems that context-aware applications must be simultaneously invisible — in the sense that the user can use the system without being concerned with the details of how it is performing its task — and optionally highly visible, so that the user can inspect the state of the system, correct mistakes, and use the contextual information for their own purposes such as content navigation.
References

Abowd, G. D., Atkeson, C., Hong, J., Long, S., Kooper, R. & Pinkerton, M. [1997], Cyberguide: A Mobile Context-aware Tour Guide, Wireless Networks 3(5), 421-33.

Baber, C., Bristow, H., Cheng, S., Hedley, A., Kuriyama, Y., Lien, M., Pollard, J. & Sorrell, P. [2001], Augmenting Museums and Art Galleries, in M. Hirose (ed.), Human-Computer Interaction — INTERACT '01: Proceedings of the Eighth IFIP Conference on Human-Computer Interaction, Vol. 1, IOS Press, pp. 439-47.

Beale, R. & Lonsdale, P. [2004], Mobile Context Aware Systems: The Intelligence to Support Tasks and Effectively Utilise Resources, in S. Brewster & M. Dunlop (eds.), Human-Computer Interaction — Mobile HCI 2004: Proceedings of the 5th International Symposium on Mobile Human-Computer Interaction, Vol. 3160 of Lecture Notes in Computer Science, Springer-Verlag, pp. 240-51.

Bormida, G. D., Lefrere, P., Vaccaro, R. & Sharples, M. [2002], The MOBIlearn Project: Exploring New Ways to Use Mobile Environments and Devices to Meet the Needs of Learners, Working by Themselves and With Others, in S. Anastopoulou, M. Sharples & G. Vavoula (eds.), Proceedings of the European Workshop on Mobile and Contextual Learning, University of Birmingham, pp. 51-2.
Bristow, H. W., Baber, C., Cross, J. & Wooley, S. [2002], Evaluating Contextual Information for Wearable Computing, in Proceedings of the 6th International Symposium on Wearable Computers (ISWC 2002), IEEE Computer Society Press, pp. 175-86.

Chen, G. & Kotz, D. [2000], A Survey of Context-aware Mobile Computing Research, Technical Report TR2000-381, Dartmouth College, USA.

Cross, J., Wooley, S., Baber, C. & Gaffney, V. [2002], Wearable Computing for Field Archeology, in Proceedings of the 6th International Symposium on Wearable Computers (ISWC 2002), IEEE Computer Society Press, p. 169.

Dix, A., Rodden, T., Davies, N., Trevor, J., Friday, A. & Palfreyman, K. [2000], Exploiting Space and Location as a Design Framework for Interactive Mobile Systems, ACM Transactions on Computer-Human Interaction 7(3), 285-321.

Dourish, P. [2004], What We Talk About When We Talk About Context, Personal and Ubiquitous Computing 8(1), 19-30.

Hsi, S. [2003], A Study of User Experiences Mediated by Nomadic Web Content in a Museum, Journal of Computer-assisted Learning 19(3), 308-19.

MacColl, I., Millard, D., Randell, C., Steed, A., Brown, B., Benford, S., Chalmers, M., Conroy, R., Dalton, N., Galani, A., Greenhalgh, C., Michaelides, D., Rodden, T., Taylor, I. & Weal, M. [2002], Shared Visiting in EQUATOR City, in C. Greenhalgh, E. Churchill & W. Broll (eds.), Proceedings of the Fourth International Conference on Collaborative Virtual Environments (CVE2002), ACM Press, pp. 88-94.

Naismith, L. & Smith, P. [2004], Context-sensitive Information Delivery to Visitors in a Botanic Garden, in Proceedings of ED-MEDIA: World Conference on Educational Multimedia, Hypermedia and Telecommunications, Association for the Advancement of Computing in Education (AACE), pp. 5525-30.

Oppermann, R. & Specht, M. [1999], Adaptive Mobile Museum Guide for Information and Learning on Demand, in H.-J. Bullinger & J. Ziegler (eds.), Proceedings of the 8th International Conference on Human-Computer Interaction (HCI International '99), Lawrence Erlbaum Associates, pp. 642-6.

Proctor, N. & Burton, J. [2003], Tate Modern Multimedia Tour Pilots 2002-2003, in J. Attewell & C. Savill-Smith (eds.), Proceedings of the Second European Conference on Learning with Mobile Devices — MLEARN 2003, Learning and Skills Development Agency (LSDA), p. 545.

Rogers, T. [2002], Mobile Technologies for Informal Learning — a Theoretical Review of the Literature, in S. Anastopoulou, M. Sharples & G. Vavoula (eds.), Proceedings of the European Workshop on Mobile and Contextual Learning, University of Birmingham, pp. 19-20.

Selker, T. & Burleson, W. [2000], Context-aware Design and Interaction in Computer Systems, IBM Systems Journal 39(3-4), 880-91.

Sharples, M. & Beale, R. [2003], A Technical Review of Mobile Computational Devices, Journal of Computer-assisted Learning 19(3), 392-5.
Engagement with an Interactive Museum Exhibit

Naomi Haywood & Paul Cairns

UCL Interaction Centre, 31-32 Alfred Place, London WC1E 7DP, UK

Tel: +44 20 7679 5208
Fax: +44 20 7679 5295
Email: [email protected], [email protected]
URL: http://www.uclic.ucl.ac.uk/paul

Learning and engagement have been recognised as very important in defining the effectiveness of interactive museum exhibits. However, the relationship between these two notions is not fully understood. In particular, little is known about engagement with interactive exhibits and how it relates to learning. This paper describes a hypothesis-seeking approach to find out how children engage with an interactive exhibit at the Science Museum. Engagement is found to be described in terms of three categories: participation, narration and co-presence of others. These aspects of engagement can be seen to arise from specific aspects of the interaction design of the exhibit. Moreover, they also overlap with features required for a positive learning experience. These findings suggest many fruitful directions for future research in this area.

Keywords: immersion, interactive exhibit, narrative, learning, co-presence.
1 Introduction
Museums are a major source of public education outside of the formal schooling system in the UK [Teachernet 2004]. However, rather than competing with formal education, they provide a complementary resource for both formal and informal learning. For example, many museum visitors are groups of school pupils who visit the museum as part of their formal education. Further, many museum visitors are families, with parents aiming to allow their children to encounter areas of informal
education that they may not otherwise encounter [Jensen 1994]. Museums also function as a source of leisure and entertainment. Indeed, museums are one of the central provisions for entertainment which are widely accessible to the general public [Falk & Dierking 2000]. Thus, museums must aim to provide entertainment that is simultaneously informative and educational. Increasingly, museums look to interactive exhibits to fulfil this aim. For the purposes of the current discussion, we take interactive exhibits to be exhibits that allow for interaction in some form other than mere visual perception. Frequently this interaction involves physical manipulation, such as visitors clicking buttons or flicking switches in response to specific questions or demands presented on screens. Interactivity therefore allows visitors to determine what the exhibit presents. For example, many interactive exhibits allow visitors to determine the order of presented information and whether they want to obtain more information concerning a specific area of interest [vom Lehn et al. 1999]. It must be noted, though, that not all exhibits that claim to be interactive would actually meet this criterion. Indeed, the recent 'interactive exhibit' at the British Museum [2004] was a purely visual experience, albeit some of it in 3D computer animation. The general aim of these interactive exhibits is to allow for learning and entertainment. For the consideration of interactive exhibits, Falk & Dierking [2000] define learning broadly in terms of how users are able to comprehend the presented information. For example, a visitor may interact with an exhibit presenting images of the human heart, its functions and individual parts. If this visitor is subsequently able to note that the heart is a muscular organ which pumps blood around the blood vessels, then learning can be said to have occurred. Falk & Dierking also broadly define entertainment in terms of the exhibit being engaging. For example, if visitors spend time interacting with an exhibit without taking part in other activities, then this exhibit can be said to be engaging. Recently, museums have made frequent use of interactive exhibits and generally consider their use to be successful in terms of learning and engagement [Gammon 2003]. However, the precise nature of how learning and engagement occur, and how they may relate to each other, remains uncertain. For example, it is possible that visitors spend long durations of time interacting with exhibits without reading the presented information. Therefore, while an exhibit may be engaging, it may not encourage visitors to learn. Further, it is possible that visitors may learn from an interactive exhibit despite spending only a short duration of time interacting with it while simultaneously being involved in other activities. The goal of museums is to produce successful exhibits and therefore to be able to reliably design exhibits for learning and engagement. Much research has been and is being done on investigating the educational effectiveness of museums. Indeed, this is the sole focus of the Journal of Education in Museums. However, though engagement has been identified as significant, it is not known how to design exhibits for engagement. In particular, we were unable to find a clear discussion of the role of interaction in making an exhibit engaging. In part at least, this seems to be because it is not really understood what engagement actually is [Brown & Cairns 2004].
The purpose, then, of this study is to develop hypotheses of what it means for an interactive exhibit to be engaging, how engagement as understood from the study
may relate to learning and, where possible, what elements of the interaction could lead to engagement. The hypotheses found suggest avenues for future research. This hypothesis-seeking approach is necessarily qualitative, and we have developed a grounded theory [Strauss & Corbin 1998] in order to elicit and organise a conception of engagement based on first-hand accounts of using an interactive exhibit. Though learning is important, it was probed for rather than measured, as it was felt that an explicit measure of learning would interfere with the participants' experience or reporting of engagement. Instead, the relationship of the theory of engagement to learning is developed and explored in the discussion. As is well known in HCI, the context of use can strongly influence specific interactions. Museums present quite specific contexts. As noted by Gammon [2003], individuals in museums frequently behave in a considerably different manner from when they are in other contexts. Moreover, vom Lehn et al. [1999] found that the learning experience of an individual was also determined by collaboration with others. For example, adults may point out key features to children, and visitors may observe each other interacting with exhibits. This suggests that any learning occurring by means of interactive exhibits is embedded in the social context. For this reason, the study was conducted with a specific exhibit, the Energy Everywhere exhibit, in the Science Museum, London. Ten children were recruited to interact with the exhibit and then interviewed about their experiences. The grounded theory developed centred around the three concepts of participation, narration and the co-presence of others. A key finding, which contrasts with vom Lehn et al.'s [1999] studies, is that co-presence is an important factor in the theory of engagement, rather than collaboration, which was considered important for learning. These concepts will be explained and demonstrated in the results section. The succeeding section discusses these concepts in terms of how interaction with the exhibit relates to engagement and learning and, therefore, possible lines of future research. The discussion will also be used to re-contextualise the theory within the existing literature.
2 Energy Everywhere

Before describing the methodology of the study, it is useful to briefly describe the actual exhibit studied. The exhibit is part of a permanent exhibition, Energy — fuelling the future, at the Science Museum in London. This exhibition was developed by Science Museum staff in collaboration with educators, scientists and consultants experienced in exhibition design. It opened in July 2004 and includes a total of six interactive exhibits, various information terminals and works of art relating to energy. The present research focuses on one specific exhibit, named Energy Everywhere. This exhibit is positioned at the entrance of the exhibition and is aimed at pupils of key stages two and three of the National Curriculum, and at families with children between seven and fourteen years old. The exhibit is an animated film with a linear structure that starts when it detects the presence of a person in the vicinity of the exhibit. The person is invited to stand on a flashing yellow square in front of the screen and to clap their hands to start. This sets off a sequence of animated scenes with sounds and a voice-over describing
how energy is present in the scenes and how it is being transformed from one form to another. The graphics for the scenes are quite abstract: iconized forms such as trees and landscape are depicted, but made up from the words for the objects themselves. For example, the sun appears at the beginning and is drawn from many instances of the word 'energy'. At three specific points in the sequence, the visitor is invited to interact with the exhibit by making gestures. The three gestures are: digging for coal; spinning your arms around to generate wind; and clapping hands to make lightning strike. The exhibit also prompts the visitor if they do not perform the appropriate actions; for instance, it may display and say "Clap louder" if the visitor does not clap loudly enough for it to detect. Successfully completed actions are also acknowledged with "Well done!" both appearing on the screen and being spoken. In all, the exhibit takes around five minutes to complete the full sequence.
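Viewed as an interaction protocol, the exhibit amounts to a linear scene sequence with a prompt-until-detected loop at each gesture point. The sketch below is a speculative reconstruction for illustration only; detect_gesture and play are hypothetical stand-ins for the exhibit's sensing and playback layers, and nothing here comes from the actual implementation.

    SEQUENCE = [
        ("opening scenes", None),
        ("coal scene", "dig"),        # visitor mimes digging for coal
        ("wind scene", "spin arms"),  # spinning arms generates wind
        ("storm scene", "clap"),      # clapping makes lightning strike
    ]

    def run_exhibit(detect_gesture, play):
        play("Stand on the yellow square and clap to start")
        while not detect_gesture("clap"):
            play("Clap louder")              # re-prompt until the start is heard
        for scene, gesture in SEQUENCE:
            play(scene)
            if gesture is None:
                continue
            while not detect_gesture(gesture):
                play("Try the action: " + gesture)  # prompt until detected
            play("Well done!")               # acknowledge the completed gesture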
3 Method
In order to formulate hypotheses of engagement in an interactive exhibit we used a grounded theory approach [Strauss & Corbin 1998]. The basis for data gathering was interviews with museum visitors. Grounded theory allows for quite flexible interviewing that could be open to examining the specific concept of engagement, but also to exploring other concepts should they appear related to engagement in the minds of the interviewees. Also, grounded theory allows for an acceptable and rigorous working up of the interview data into a robust framework that could be used as the starting point for further studies. The basic approach of the study was to have visitors use the Energy Everywhere exhibit and then be interviewed afterwards about their experience. Due to the timing of the project and the Science Museum's development of the exhibit, the earlier interviews were performed with a prototype in a special evaluation room, out of the context of the full exhibition. The later interviews were done with the actual exhibit in the exhibition gallery once it had been installed. Potentially, the visitors using the prototype could have had an unrealistic experience, but the theoretical sampling approach of grounded theory allowed the later interviews to fully explore the effect of the exhibition context on the overall experience. Additionally, there is the risk that, by knowing they were participants, the children might have engaged differently with the exhibit. With the prototype this was unavoidable, but with the final exhibit children were only approached once they had finished using it. As their experiences were integrated in the results with those of the earlier participants, it is hoped that any artificiality has been ameliorated. The interested reader is invited to contact the authors for full details of the method, ethical clearance, consent and transcripts of the interviews.
3.1 Participants
Since research on learning suggests that there are age and sex differences in terms of how learning occurs [Richardson & Sheldon 1988], the present research aimed to recruit a balance of girls and boys. Further, recruitment was based on ensuring that a wide range of ages within the target age group was considered.
The children were recruited from the visitors to the Science Museum. Both the children and their guardians were approached. The general purpose of the interview was explained to them, and consent was obtained from both rather than just the guardian. In total ten children participated: six interacted with the prototype and four interacted with the final exhibit. Of the six children who interacted with the prototype, three were girls and three were boys. Their ages ranged from ten years to thirteen years. Of the four children who interacted with the final exhibit, three were girls and one was a boy. Their ages ranged from nine years to twelve years. The age range does present a risk that engagement could be a significantly different experience, particularly if individual differences are also taken into account. However, the grounded theory should bring out both the commonality and divergence of experience that could be attributed to age. As it happened, there was no evident simple relationship between age and the sense of engagement. All children were native English speakers and went to schools in the UK. Further, all children took part in the research individually, though under supervision from their accompanying guardian. That is, guardians were explicitly discouraged from using the exhibit themselves. Ten children is a somewhat small sample, but recruiting children in the main exhibit was problematic. It was felt that the children who took part should have completed using the exhibit as a sign of at least some degree of engagement. Unfortunately, not many children who used the exhibit did actually complete the full cycle of use. Nonetheless, the grounded approach provides assurances that the description of engagement developed is at least faithful to the experience of the ten children who did take part. This is sufficient for the goals of the study: to develop some notion of engagement that can be developed in future research. It should also be noted that the experiences of those children who did not complete the exhibit would make an equally fascinating study, but it would be orthogonal to the goals of the current work.
3.2 Interviews

The grounded theory was constructed on the data gathered from semi-structured interviews focused around three key areas. Engagement clearly was a key area that the interviews tried to address. Initially, the questions on engagement were very exploratory. For example, children were asked to compare the experience with watching television or reading. Learning was also included as a focus for the interviews because it clearly is intended to be an important aspect of the exhibit. However, no effort was made to rigorously measure learning, as this could easily result in changing the experience of engagement. For example, if a visitor was pre-tested before using the exhibit, they might suppose that they would be post-tested and so alter their natural behaviour with the exhibit. Alternatively, it would seem unethical to spring a test on a child after using the exhibit, but prior warning of the test could either put children off from participating in the study or again alter their approach to the exhibit. Thus, learning was probed but not measured. Even so, we found it was still possible to find quite concrete examples of learning.
Collaboration was also considered a key area, since it had been identified by vom Lehn et al. [1999] as important for the success of museum exhibits. For example, questions specifically asked about how the children talked with others around them whilst using the exhibit. Naturally, as the interviews progressed, it became clear that these key areas were different from what had been expected. Grounded theory recommends that interview schedules should change to adapt to and fully expand the dimensions emerging from the data. Thus, final interviews changed the emphasis towards ideas that had emerged in earlier interviews. For instance, children were no longer asked to compare their experience with television or reading, but instead asked to relate their experience to playing. Also, the notion of collaboration mutated into that of co-presence, and children were asked more about what the presence of others meant rather than how they specifically interacted with others. The interviews lasted between fifteen and twenty minutes. They were recorded with consent from the children and their guardians. Video recording was not used, as it was felt that the interview data was the primary source. Indeed, the interviewer did attempt to note particular attitudes and facial expressions of the children as they used the exhibit, but it was not possible to meaningfully interpret them for the aims of the study.
3.3 Analysis

The analysis of the data followed the usual grounded theory practice of analysing as interviews were done. Thus it was possible to adapt the interviews over the course of the study. Microanalysis and open coding were used extensively at the start of the interviews in order to begin to define concepts, dimensions and categories in the data. Axial coding was also done as the data accumulated, in order to bring out the relationships between the emerging concepts and to gain a holistic sense of the data. As expected, once interviews were underway, common themes began to emerge. The later interviews, where they reiterated already identified concerns, were not fully coded. Instead, the focus of the coding was on the more novel areas, in particular on the differences between the prototype and exhibit contexts. This approach to the analysis of interviews stands in accordance with suggestions by Glaser [1992] and Dick [2002], who propose that it is advantageous to consider key parts of interviews rather than coding entire interviews.
4 Results

The process of gathering, analysing and interpreting the results is inherently integrated in the grounded theory approach. This means that it is not easy to present how the central categories of the theory emerged. Instead, we present a (necessarily linear) account of the three categories, namely participation, narration and co-presence of others. These arose from the data as being the main distinct concepts that underpin the engagement of the children with the exhibit. The categories are derived from the transcripts of the interviews, but again it would be neither possible nor appropriate to present these in full. Therefore, important quotes from these transcripts are presented in order to provide examples of the obtained results.
4.1 Participation

For the present research, participation is defined as a playful process during which information is made personal by children becoming part of the presented scenes. It emerged that children had a sense of participation while interacting with Energy Everywhere, and that this sense is determined by the concepts of simple graphics and power.

Simple Graphics

Participation in Energy Everywhere seems possible based on the simplicity of the presented graphics. The children seemed able to feel part of the presented scenes, and they indicated specifically that it was the graphics that encouraged this. Further, it emerged that children enjoyed this sensation. For example, one child noted:

"Everything was painted in words, that's so unreal [...] it made me think of different kinds of things I know [...] when moving around I felt like I could be part of these things [...] I liked it."

Further, when talking about the simplicity of the graphics, children frequently noted that this allowed them to play. Therefore it seems that children conceptualise their interaction with Energy Everywhere in terms of play. For example, one child noted:

"[The exhibit] was like a game, you play with it and because it's so simple you have to develop it further in your head."

Another child noted:

"The small words were like a puzzle to play with [...] I liked playing with it."

However, some children perceived the simple, iconic graphics as confusing and therefore felt detached from Energy Everywhere. Specifically, some children noted that the use of small words to form graphics made it difficult to simultaneously read the words and perceive the picture. It seems that this made it difficult for these children to participate in the learning experience presented by the exhibit. For example, one child noted:

"I didn't know whether to read the words or look at the whole picture first [...] That was confusing [...] and made it difficult to learn."

Power

An important aspect of children's interaction with Energy Everywhere is their experience of power. In many instances children related their enjoyment of the exhibit and their participation in it to the power that it let them possess. The following dialogue expresses this point:

Child: "It was cool [...] I made energy [...] I forgot that other children can do that too [...] That's cool."

Researcher: "Did you also have power when you made wind?"

Child: "I had power because I made the wind [...] It's not real power because it's only a simulation [...]. That's cool."
When questioned whether there were specific features of the exhibit, or specific times during their interaction, that made them feel powerful, children noted times when they were able to directly interact with the exhibit. Specifically, many children noted that they felt powerful while pretending to dig up coal, moving their arms to make wind or clapping their hands to make lightning hit a tree. For example, one child noted:

"It was when I made the lightning finally hit the tree and it exploded [...] That was when I felt like I had lots of power."

Importantly, children frequently related their experience of power to there being nothing between them and the screen. It seems that this allowed children to pretend that they were carrying out the activities in real life. For example, one child noted:

"There was no mouse or anything [...] so it didn't feel like it was a computer. It's much more like really pretending you're digging."
4.2 Narration

Narration can be defined as the formation of stories and accounts of events. The present research indicates that for interactive exhibits narration is conceptualised in terms of linear structure and fantasy.

Linear Structure

Children frequently referred to Energy Everywhere as a story, in terms of it possessing a beginning, a middle and an end. It emerged that this perception of Energy Everywhere as a story possessing a linear structure shapes children's interaction with the exhibit. For example, one child noted:

"[The exhibit] is like a story of how energy moves [...] in the beginning it shows how energy comes from under the ground, then it moves [...] in the end it shows how energy can become lightning [...] that shows you what you have to do."

The linear structure of Energy Everywhere also seems to have allowed children to learn the connectivity of the presented information by creating stories around this structure. The following dialogue is indicative of this suggestion:

Child: "At first the energy is stored in the sun. This allows for coal to be created under the ground. Miners must then dig it up so that it can be used [...] Then coal can be burned and used by people, for example to heat houses in the old days [...] Energy moves around differently, depending on what kind it is."

Researcher: "So the things you saw were connected?"

Child: "Yes, they were connected by energy moving and the things that can happen to energy, like lightning and fire."

Researcher: "Can you tell me how you know this?"

Child: "It showed it on the screen [...] I connected things by looking at [...]"
However, in some cases children made incorrect causal inferences. These incorrect inferences mainly relate to perceived causal relationships between features of the presented information. In particular, some children's narratives expressed that the energy of some features presented on the screen leads to the movement of other features, which is not always correct. For example, one child's narrative includes the statement:

"The clouds in the air make energy for the waves to move."

When asked if there was anything about the exhibit that confused her, this child stated that there was not. Therefore it seems that children may make incorrect inferences without perceiving Energy Everywhere as confusing.

Fantasy

It emerged that children's narratives are not based merely on following the linear structure provided by the exhibit, but rather that children's narratives frequently include fantasy. For example, one child created a story in which she imagined herself flying over the presented landscape. Specifically, it emerged that in creating these narratives children frequently extend the presented information to include their own fantasies. For example, one child noted:

"The waves looked silly, like in a cartoon [...] not the real thing [...] That was funny [...] and it made me feel like I was part of a cartoon [...] I like that."

Another child noted:

"The trees made up of words made me think of children's books [...] here trees move because of the wind [...] I make the wind."

Another fantasy seems to be triggered by the exhibit demanding that children pretend to dig up coal. Pretending to dig up coal necessitates the ability to fantasise that the action of moving one's arms resembles digging up coal. Children frequently noted that moving their arms seemed to make sense only when imagining what it is like to dig up coal in reality. Further, children frequently noted that after having imagined what it is like to dig up coal, they imagined the impact of other information presented. For example, one child noted:

"It [moving his arms in pretence of digging up coal] made sense only if I imagined what it is really like [...] it must be hard for miners to dig for so much coal [...] When I was swinging my arms to make wind I thought of how strong wind can make trees fall [...] I imagined what it is like for firemen to clear them off roads."

However, it must be noted that the information that made up these fantasies was not always correct. For example, one child stated:

"I was flapping my arms like a bird. I guess birds make wind in the air by flapping their wings."
4.3 Co-presence of Others

The present research suggests that the co-presence of others, but not collaboration, is an important feature of children's interaction with Energy Everywhere. This is surprising, since questions concerning collaboration were an important feature of the initial interview guidelines. However, children did not mention collaboration on their own initiative, and did not consider collaboration to be an important aspect of their experience when prompted by the researcher. This suggests that collaboration is not an important feature of children's conception of their learning experience in this exhibit. Therefore collaboration does not seem to be important in connecting learning and engagement. Instead, it emerged that in order to adequately conceptualise children's experience with interactive exhibits, it is essential to consider the co-presence of others. It seems that while there are no specific features within Energy Everywhere that allow for this co-presence of others, the exhibition as a whole does. This is expressed clearly by one child:

"There was space for others to stand around [...] and I could see them when I looked."

It emerged that this category of co-presence of others is based on the concepts of reassurance and feedback, distractions, attracting attention and communication.

Reassurance and Feedback

Children frequently noted that other visitors provided them with reassurance and feedback concerning their actions:

"I wanted to know if I was doing it right, so I turned to my mum [...] She nodded and smiled so I knew I was doing it right."

Further, children frequently noted that the mere presence of others reassured them and provided them with feedback. It emerged that in many cases this reassurance and feedback is more important than the reassurance and feedback provided by the exhibit. This is expressed in the following dialogue:

Child: "Since there were so many people watching me, it must be interesting and I must be doing a good job."

Researcher: "And the words 'Well done!' [presented on the screen], did they tell you that you were doing a good job?"

Child: "Yes, but I wasn't so sure, it might always say that."

Distractions

It emerged that the possibility of distractions caused by the co-presence of others allows children to increase their engagement with the learning experience. For example, one child noted:

"There was so much noise and stuff happening [around the exhibit]. I had to just look at the screen and not look away so that I would not miss bits of what is being taught [...] That was like in the cinema when you can't see around you."
However, it also emerged that actual distractions seem to reduce the experience of engagement. For example, after another child walked between him and the screen, one child noted:

"I turned to look at who was watching me and then didn't know what I had to do any more [...] It felt like it would be best to start again because I forgot what I had learnt."

These negative effects of distractions seem to relate not only to children's physical actions, but also to their creation of narratives and their experience of enjoyment. For example, after another child repeatedly clapped his hands, one child noted:

"The whole thing about what was happening to the energy seemed less real [...] and was not so much fun."

Attracting Attention

Children frequently noted that their interaction with Energy Everywhere attracted the attention of other visitors. It emerged that some children enjoy this attention:

"Clapping my hands was really cool. It was noisy and many people turned to look at me."

Also, attracting the attention of others frequently motivated children to spend time with Energy Everywhere and examine it in more detail. For example, one child noted:

"I liked the sound and the pictures [...] another child watched me clap to start [...] that made me want to take a closer look."

Additionally, attracting the attention of others motivated children to perform actions correctly. For example:

"My friends were watching me so I didn't want to make any mistakes."

It emerged that the time at which the attention of other visitors is attracted is important. For example, children frequently noted that attracting the attention of others by clapping their hands to initiate their interaction with Energy Everywhere encouraged them to continue this interaction. Further, it seems that attracting the attention of others early during their interaction allows children to gain reassurance and feedback concerning whether their actions are correct. This is expressed in the following dialogue:

Child: "I clapped my hands to start. This made my friend turn to look."

Researcher: "And how did it make you feel that your friend turned to look?"

Child: "Good [...] She must like the exhibit so I wanted to continue."
In contrast, during later stages of interaction, attracting the attention of co-present individuals made children feel embarrassed. This could be due to the length of time spent interacting by that stage, or possibly the gestures made. For example, one child noted:

"When I was spinning my arms my mother looked at me funnily [...] I felt stupid and would have preferred to stop."

Another child noted:

"It was a bit strange waving my arms in front of everyone [...] People were staring [...] I felt a bit silly and wanted to stop."

Communication

It emerged that the co-presence of others is associated with children's desire to talk to others about their experience with Energy Everywhere. Further, it emerged that this desire to talk to others is related to a desire to learn. For example, one child noted:

"Seeing my friends [who were interacting with another exhibit] made me want to tell them what I learnt [...] I wanted to learn a lot so that I could tell them lots."

Moreover, children seemed to consider learning in terms of what they can later communicate to others. For example, one child noted:

"I like how I learn about energy moving [...] so that I can tell my friends how it changes."

For some children this learning seems to be important only if they are able to communicate it to others. This was expressed clearly by one child:

Child: "There were so many things to learn and do."

Researcher: "Can you give me an example of something you learnt and did?"

Child: "I learnt about the wind moving the sea, and clouds forming, and many other things."

Researcher: "Would you be able to explain what you have learnt to someone who doesn't know about energy?"

Child: "Yes, I think most of the things I saw and what I then did [...] I must be able to explain to others what I saw, otherwise there is no use in learning things."
5 Implications for Engagement

In order to understand how interactive exhibits may lead to engagement, we discuss how the categories underpinning engagement arose from the interactive structure of the Energy Everywhere exhibit. Of course, these relationships are based only on the experience of the children who participated with this exhibit. The discussion
is therefore couched in terms of areas for further exploration rather than definitive design guidelines for interactive exhibits. Though no effort was made to formally measure learning in this study, it is worth drawing out the relationship between the theory developed here and existing theories of learning in children. In particular, engagement as described here is commensurate with supporting learning, though whether it supports learning the right thing is another matter. The following two subsections make the links from interaction to engagement and from engagement to learning. The discussions will also be used to contextualise the results in the existing literature related to this area.
5.1 From Interaction to Engagement

The basic interaction of the children with the Energy Everywhere exhibit is that they perform physical actions in order both to take part in the scenes presented and to allow the sequence of scenes to progress. The present results suggest that these initial physical activities make sense to children only if they use fantasy to imagine how these activities are carried out in real life. This indicates that while performing initial actions children use fantasy to make sense of their actions. Fantasy seems to be an important feature of engagement, since it is associated with enjoyment and allows individuals to step into their own imaginary world [Jones 1997]. Since children continued to make frequent use of fantasy, it is possible that the initial necessity to fantasise may encourage the use of fantasy throughout their interaction with Energy Everywhere. This suggests that this early physical interaction could be a useful way to encourage engagement. In addition to the association between fantasy and sense-making, fantasy also seems important in allowing children to become part of the presented scenes. For example, when moving their arms in the pretence of digging up coal, some children perceive this in terms of "really" pretending to dig up coal rather than as part of their interaction with the exhibit. The children clearly make the distinction between really pretending and somehow 'humouring' the exhibit. Thus, to some extent, it is not just that the children have power through the immediacy of their interaction, but that immediacy relates directly to their sense of fantasy. The two concepts work together to reinforce the feeling of engagement. One of the more surprising concepts to emerge was the use of a narrative to help make sense of the exhibit. The linear sequence of the exhibit contrasts with other sorts of interactive exhibits where children are free to select the information presented. This could be considered a constraint, and so reduce the possibility of engagement. Instead, it seems that the continuous use of fantasy is related to the linear structure of the exhibit. Specifically, it seems that children create narratives which allow for the use of fantasy while still following the linear structure. Interestingly, the narratives that the children create do not necessarily match the narrative intended by the exhibit. This may be because the exhibit's narrative is not always clear, and the children have to fill in the gaps to continue making sense of the exhibit. This suggests that a more clearly defined narrative could actually reduce engagement by removing the need for the children to fantasise. In any case, this result has theoretical implications, since it suggests that the common notion
that fantasy is largely free from external constraints (e.g. Piaget [1951] and Singer [1994]) may not hold true for fantasy occurring in interactive exhibits. The simple graphics also seemed to have the drawback of disorienting some children. This disorientation seemed somewhat akin to the Stroop Effect [Stroop 1935], in that children could not choose whether to attend to the words or to the pictures made from the words. The resulting confusion is likely to reduce engagement [Douglas & Hargadon 2000], and so perhaps these simple graphics may actually not be simple enough. Though not related directly to the interactive element of the exhibit, the co-presence of others is a feature of the construction of the exhibit. The unmediated interaction requires space around which others can stand, and this space is a clearly defined area which should be for the child using the exhibit. The co-presence then allowed for other possibilities that would support engagement with the exhibit. Falk & Dierking [2000] discuss the importance of providing cues and encouragement for developing engagement. Though the exhibit does provide these things, the children seem wise to the possibly superficial nature of the encouragement. Fortunately, they are able to seek it from the people they do trust who are around them and watching them. The encouragement may be provided explicitly, or implicitly, inferred from the interest and attentiveness of those watching. The presence of others, though, was not always positive. As the exhibit progressed, the children were required to make some quite large movements that would possibly draw unnecessary attention to themselves and perhaps make them look "silly". It could be that this was due to the length of time for which the children had been the centre of attention. Initially, being attended to may have been motivating, but over a longer period it may be too much attention, and the children become self-conscious. Alternatively, it could simply be that the children do not like making large and unusual movements. In either case, it seems exhibits need to balance the opportunity for being "in the spotlight" with the over-exposure that this might entail. It is worth noting that both the positive and negative aspects of co-presence correspond with the findings of Brown & Cairns [2004] on engagement in games. There, engagement occurred when players were motivated to learn to play the game, but full immersion would not occur unless the players were able to reduce self-awareness. Co-presence seems to be both motivating and heightening of self-awareness, and so is equivocal in its effect on engagement.
5.2 From Engagement to Learning

Narration is known to be an important element in learning. Plowman et al. [1999] studied multimedia learning environments such as CD-ROMs and proposed that narration is linked to learning by making the presented information personal. Similarly, Falk & Dierking [2000] proposed that the establishment of personal context leads to deeper learning by allowing individuals to attach meaning to the presented information. Further, it seems that by means of narration children are able to consider events and actions from various perspectives, a process known as decentring. For example, decentring is evident when children consider the presented information from the
perspective of a coal miner or a fireman clearing trees off roads. As noted by Piaget [1951], fantasy is important for decentring in terms of its relationship to the process of assimilation. Vygotsky [1978] also considered fantasy to be important for general learning since it allows for the creation of novel cognitive structures. Vygotsky notes that fantasy is thus essential for the separation of meaning from origins and is based on changes occurring within the Zone of Proximal Development, that is, the difference between children's actual level of achievement and their potential level of achievement. Vygotsky argues that, while fantasising, children are no longer constrained by their surroundings and are instead able to explore the limits of their own understanding. Thus the features of the interaction that lead to narration support personalisation of the information and hence could lead to a good learning experience.

Co-presence can also be understood to be important for learning. The presence of others clearly motivated children, at least initially, and motivation has been identified as key to learning [Piaget & Inhelder 1969]. Moreover, the children also reported that doing well at the exhibit meant that they would be able to tell others about it. This is not only motivating: Gammon [2003] argues that an increased willingness to discuss information after interacting with an exhibit is an indicator of personal learning. Geier [2004] also notes that in many instances narration allows first-person experiences to be communicated to others. Thus the co-presence of others not only motivates children but also gives them the opportunity to consider and actually communicate their experiences to others.

However, the mere fact of co-presence contrasts with the importance of collaboration [Falk & Dierking 2000]. This research confirms that learning from exhibits is a social experience, though the social element need not be as explicit as collaboration for learning to occur. Jackson & Fagan's [2000] notions concerning the importance of collaboration for enhancing the educational value of engagement may need to be extended to include the importance of the co-presence of others. Of course, it should also be noted that the narratives that children created did not always correspond with what was being taught, and that others around them could be a source of distraction and inhibition. This suggests that engagement can lead to positive learning experiences but that the focus of engagement needs to be considered carefully when designing the exhibit.
6 Conclusion

The grounded theory described here suggests that children's engagement with interactive exhibits can be understood in terms of three key categories: participation, narration and co-presence of others. These categories can be clearly related to some aspects of the exhibit design and so suggest fruitful areas for future research into the design of interactive exhibits and the nature of engagement with them. In particular, the theory suggests that it may be sufficient to design only for co-presence of others, rather than collaboration, in order to provide an engaging experience. Moreover, engagement with the exhibit does have parallels with what is needed for successful
learning, and this was not previously known. Thus, this research provides many new questions whose answers could lead to the improved design of museum exhibits for engagement and learning.
Acknowledgements Many thanks to the Science Museum for the extensive support provided and to all of the participants and their guardians who took time from their visits to talk to us. Thanks also to Sarah Faisal and Lidia Oshlyansky for their helpful comments on this paper and the anonymous referees for their substantial feedback.
References

British Museum [2004], Mummy: The Inside Story, http://www.thebritishmuseum.ac.uk/mummy/ (last accessed 2005-02-07).

Brown, E. & Cairns, P. [2004], A Grounded Investigation of Game Immersion, in E. Dykstra-Erickson & M. Tscheligi (eds.), CHI'04 Extended Abstracts of the Conference on Human Factors in Computing Systems, ACM Press, pp. 1297-1300.

Dick, B. [2002], Grounded Theory: A Thumbnail Sketch, http://www.scu.edu.au/schools/gcm/ar/arp/grounded.html (retrieved 2004-03-01).

Douglas, Y. & Hargadon, A. B. [2000], The Pleasure Principle: Immersion, Engagement, and Flow, in F. M. Shipman, P. J. Nürnberg & D. L. Hicks (eds.), Proceedings of the Eleventh ACM Conference on Hypertext and Hypermedia — Hypertext'00, ACM Press, pp. 153-60.

Falk, J. H. & Dierking, L. D. [2000], Learning from Museums: Visitor Experiences and the Making of Meaning, Altamira Press.

Gammon, B. [2003], Assessing Learning in Museum Environments: A Practical Guide for Museum Evaluators, http://www.ecsite-uk.net/about/reports/indicators_learning_1103_gammon.pdf (retrieved 2004-03-10).

Geier, M. [2004], Role-playing in Educational Environments, http://www.cc.gatech.edu/~megak/7001/Roleplaying.html (retrieved 2004-07-10).

Glaser, B. [1992], Basics of Grounded Theory Analysis: Emergence vs. Forcing, Sociology Press.

Jackson, R. L. & Fagan, E. [2000], Collaboration and Learning within Immersive Virtual Reality, in E. Churchill & M. Reddy (eds.), Proceedings of the Third International Conference on Collaborative Virtual Environments (CVE 2000), ACM Press, pp. 83-92.

Jensen, N. [1994], Children's Perceptions of their Museum Experiences: A Contextual Perspective, Children's Environments 11(4), 300-24.

Jones, M. G. [1997], Learning to Play; Playing to Learn: Lessons Learned from Computer Games, http://www.gsu.edu/~wwwitr/docs/mjgames/ (retrieved 2004-07-22).

Piaget, J. [1951], Play, Dreams and Imitation in Childhood, Routledge and Kegan Paul.

Piaget, J. & Inhelder, B. (eds.) [1969], The Psychology of the Child, Basic Books.
Plowman, L., Luckin, R., Laurillard, D., Stratfold, M. & Taylor, J. [1999], Designing Multimedia for Learning: Narrative Guidance and Narrative Construction, in M. G. Williams & M. W. Altom (eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: The CHI is the Limit (CHI'99), ACM Press, pp. 310-17.

Richardson, K. & Sheldon, S. [1988], Cognitive Development to Adolescence, Lawrence Erlbaum Associates.

Singer, J. L. [1994], Imaginative Play and Adaptive Development, in J. H. Goldstein (ed.), Toys, Play and Child: How do these Aspects Inform Engagement and Learning?, Cambridge University Press, pp. 6-26.

Strauss, A. & Corbin, J. [1998], Basics of Qualitative Research — Techniques and Procedures for Developing Grounded Theory, second edition, Sage Publications.

Stroop, J. [1935], Studies of Interference in Serial Verbal Reactions, Journal of Experimental Psychology: General 18, 643-62.

Teachernet [2004], Museums Moving with the Times, http://www.teachernet.gov.uk/teachingandlearning/resourcematerials/museums/ (last accessed 2005-02-07).

vom Lehn, D., Heath, C. & Hindmarsh, J. [1999], Discovering Science: Action and Interaction at the Exhibit-face, http://www.kcl.ac.uk/depsta/pse/mancen/witrg/pdf/vlehnDiscover.pdf (retrieved 2004-03-03).

Vygotsky, L. S. [1978], Mind In Society: The Development of Higher Psychological Processes, Harvard University Press. Edited by Michael Cole, Vera John-Steiner, Sylvia Scribner & Ellen Souberman.
User Needs in e-Government: Conducting Policy Analysis with Models-on-the-Web

Barbara Mirel, Mary Maher† & Jina Huh

University of Michigan, 1075 Beal, Ann Arbor, Michigan 48109, USA
Tel: +1 734 332 8969, +1 734 645 3664
Fax: +1 734 302 2408
Email: bmirel@umich.edu, jinah@umich.edu

† Economic Research Service, 1800 M Street NW, Washington DC 20036-5831, USA
Tel: +1 202 694 5126
Fax: +1 202 694 5638
Email:
Design conventions are emerging in e-government models-on-the-Web, but they are not based on evidence of analysts' actual what-if analyses for purposes like policymaking. From field studies, we developed representations of policy analysts' actual work and compared them to the assumed goals and tasks built into existing online models, inferred through goal-based requirements methods. We found a large gap exists, and argue that current online models are impoverished because they ignore the expertise users bring to bear on their work.

Keywords: e-government, public policy, models, Web development, decision support systems, expertise, user models, usability, transparency.
1 Introduction

Increasingly, e-government simulations are available on the Web to support users in analysing and making decisions with federal data. For example, online economic models simulate commodity production, prices, and adoption of new technologies to help users forecast and assess economic and environmental impacts under various conditions, entered as what-if scenarios. These applications provide graphical user interfaces (GUIs) to powerful simulation models that analysts previously could only run in proprietary languages.

In these online simulations a common de facto design standard is emerging. Design conventions, usually beneficial for human-computer interactions, are troubling in this developing genre. Like all design standards, these embody assumptions about users' core activities, problem solving actions, and scope of model-based analysis. But these standards do not derive from systematic user experience analysis or usability testing. In fact, little if any evaluative evidence exists to show whether these quickly emerging design conventions actually fit people's demonstrated approaches to complex, what-if analyses. Therefore, the broad and open question that our study addresses is: are current models-on-the-Web, as they are now designed, truly the right product for analysts' needs?

To answer this question we analyse how well the notion of users' work built into the designs of current models-on-the-Web fits the work that analysts actually do in context. We study users who are policy analysts, specifically specialists in agricultural economics who construct policy arguments for such questions as: "What incentives may prompt farmers to adopt effective strategies for managing risks from drought?" We find a large gap between policy advisers' actual work and the work envisioned by current applications. Basically, current applications represent and support an impoverished view of what-if analysis. Our findings suggest that for more useful, fit-to-purpose online models, design thinking needs to change about the scope of users' work and the identities they bring to bear on it. Specifically, designers must recognize and design for the pragmatic and domain expertise that shapes policy analysts' goals, scope, and problem solving processes, including critical influences missing from models-on-the-Web today: the frameworks analysts apply and share with stakeholders about what constitutes legitimate and effective policy arguments.

To pursue this line of research, we face several challenges, for example how to capture and represent users' actual work in context when it is nonlinear and dynamic, and how to extract and represent models of work embodied in existing applications in ways that map to descriptions of users' contextual and goal-driven work. To address these challenges, we combine diverse methods that are similarly oriented to designing for goals and fitness to purpose.
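To make the notion of a what-if scenario concrete, consider the following minimal, purely illustrative sketch (in Python). It reduces a market to a single log-linearised equation with invented elasticities; it stands in for, and is in no way equivalent to, the far richer multi-market simulations discussed in this paper.

# Toy what-if run: one log-linearised market. All numbers are
# hypothetical; real models-on-the-Web embed many linked equations.
def price_response(supply_shift_pct, demand_elasticity=-0.5, supply_elasticity=0.3):
    # A supply shift s clears when demand_elasticity * p = s + supply_elasticity * p,
    # so the percentage price change is s / (demand_elasticity - supply_elasticity).
    return supply_shift_pct / (demand_elasticity - supply_elasticity)

print(price_response(-10.0))  # what if drought cuts supply 10%? -> price rises 12.5%

An analyst's question ("what if drought cuts supply by 10%?") becomes a parameter change, and the model returns the implied outcome; the analyses studied here chain many such runs together.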
2 Relevant Research

A top research priority in many digital government fields is to better support people in what-if analysis and decision-making [Cushing et al. 2003]. Yet few studies exist about the usability of models-on-the-Web, and we have found none that addresses the use of online models for policy analysis.
Given the scant human-computer interaction (HCI) and digital government research about models-on-the-Web, we draw on relevant studies from other fields to establish what is known so far about policy analysts' processes of what-if analysis and to uncover effective strategies for extracting built-in notions of users' goals and complex work from already existing applications.

As cognitive psychology and complex problem solving research suggests, policy analysts are experienced problem solvers who have subject matter expertise and domain-specific strategic knowledge [Zachary 1988]. They work within exacting time pressures and construct pragmatic arguments that have to convince stakeholders who have diverse interests, priorities, and perspectives. This research also shows that this group of users brings a distinct expertise to analysis that defines their inquiry approaches [Feltovich et al. 2004; Pannell 2004; Zeitz 1997; Feltovich et al. 1997; Johnson 1988; Zachary 1988]. The following expertise is relevant. Experts play with system constraints and determine the effects of possible actions and choices before ever making them. They focus more on strategies than procedures and choose moves based on their mental images of, and visible cues about, a model, its assumptions, and the interactions and dependencies among its variables. They call up domain- and policy discourse-specific frameworks to define their problems and appropriate lines of argument: frameworks, for example, to argue that a model is credible for the situation at hand. In addition, expert analysts are distinctively skilled in discovering patterns in data displays and matching them to their own array of well-developed mental patterns for the outcomes that could occur in a situation in question.

In terms of fitting software support to these analysts' expert approaches to work, research in decision support systems emphasizes that for complex tasks, decisions, and fuzziness, analysts need support and guidance in information processing, process structuring, and communication/reporting [Zigurs & Buckland 1998]. In addition, for the integrated human judgement and quantitative, statistical forecasts that this work involves, analysts require autonomy and support for the following judgements: choosing an appropriate model for one's purposes; determining relevant and acceptable inputs for scenarios; arranging output for meaning; and adjusting statistical forecasts to account for special factors [Fildes et al. in press]. Control over these judgements gives analysts a sense of ownership of their work, and ownership has proven to be a prime determinant of effective solutions [Fildes et al. in press]. Yet user control has its limits. Certain aspects of a statistical model-on-the-Web must remain opaque to protect analysts from misapprehending the underlying complexity and making inaccurate inferences.

This review of relevant research about users is incomplete without an examination of how to turn these analysts' contextually driven practices and needs into requirements that do not let situation, purpose, and stakeholders' influences fall through the cracks. Studies in requirements engineering shed light on this issue [Cockton 2000; Chung et al. 1999]. For example, 'problem frames' [Jackson 2000] re-orient requirements analysis to users' problems in the world as the driving force behind solutions. In this
perspective, shared phenomena between actors in the problem space give rise to requirements. For our study, this means that stakeholders' influences — for instance their expectations for arguments — must be built into the application.

Insights into how to generate such problem-sensitive requirements are found in studies on goal modelling and non-functional requirements. The reverse engineering approaches of goal-based requirements analysis methods (GBRAM) are most relevant to our project [Antón 1996; Hsi & Potts 2000]. These methods provide a means for inferring and describing goal-based requirements in terms of users' problems and purposes, thereby corresponding to user models drawn from analysts' actual goal-driven work. With this 'family resemblance', it becomes possible to compare the two sets of user models. We do so, and to assess the fit we apply evaluation categories found in model-misfit research by Blandford & Green [2002] and Sutcliffe et al. [2000].
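The judgemental adjustment that Fildes et al. describe can be pictured with a small sketch. The Python below is hypothetical (it is not an interface from any system cited here); it simply shows a forecast object that lets an analyst adjust a statistical value while recording the rationale, preserving both the analyst's ownership and the adjustment's transparency.

# Hedged sketch: a statistical forecast open to judgemental adjustment,
# with an audit trail of the analyst's stated special factors.
from dataclasses import dataclass, field

@dataclass
class Forecast:
    statistical_value: float
    adjustments: list = field(default_factory=list)  # (delta, rationale) pairs

    def adjust(self, delta: float, rationale: str):
        self.adjustments.append((delta, rationale))

    @property
    def final_value(self) -> float:
        return self.statistical_value + sum(d for d, _ in self.adjustments)

f = Forecast(statistical_value=102.5)
f.adjust(-3.0, "drought already reported in two producing regions")
print(f.final_value)  # 99.5, with the reasoning retained alongside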
3 Our Methodology

To develop user models of what-if analysis in context, we conducted field studies of 40 analysts from 10 organizations who are experienced in using models in agricultural and ag-ecological decision-making. They included 17 policy analysts, 23 non-academic and academic research analysts, one business strategist, and three model programmers. Some policy analysts are also research analysts and wear different hats at different times. Interviews were semi-structured and asked about drawbacks and benefits of online models and about one or two sample modelling analyses that interviewees had done. Interviewees walked us through their processes, knowledge, challenges, critical incidents, and time constraints. We also observed three of these analysts as they used a model to analyse a policy issue of their choice.

From interview and observation data, we constructed policy analysts' workflows and analysed the functional roles of and motivations behind analysts' moves and strategies. We composed scenarios and developed visual 'mountainscape' representations of users' work, a suitable metaphor for complex exploratory inquiry [Mirel & Allmendinger 2004]. We abstracted policy analysts' patterns of inquiry and the phenomena they share with stakeholders that drive this work.

To identify the notion of users' work embodied in existing online models, we analysed five online simulations:

FAIR — see http://fairmodel.econ.yale.edu/main.htm
POLYSYS — see http://agpolicy.org/polysys.html
EPIC — see http://www.public.iastate.edu/~elvis/i_epic_description.html
DREAM — see http://www.ifpri.org/dream.htm
Crystal Ball — see http://www.decisioneering.com

These models are all similar in scope to our project and were cited by interviewees as well known and widely respected. From these online models, we identified common design traits and extracted the notions of work built into them. We adaptively applied GBRAM to one of them and examined the application's structure and the functional relationships among fields and features. From the results, we identified goals and concepts that are present and absent, central and peripheral in the application.
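The comparison at the heart of this method can be thought of as a gap analysis over goal labels. The sketch below is illustrative only (Python, with goal names that are abridged paraphrases of our findings, not our actual coding scheme): it contrasts goals evident in the field studies with goals an application's structure supports, surfacing what is missing.

# Illustrative gap analysis; goal labels are abridged paraphrases.
field_goals = {
    "get into a comfort zone with the model",
    "vet a relevant set of scenarios",
    "construct each scenario with appropriate input",
    "specify output displays",
    "arrange output for interpretation",
    "validate, debug, modify and re-run",
}
app_goals = {  # goals a set-up-centric application supports
    "construct each scenario with appropriate input",
    "specify output displays",
}
print("unsupported:", sorted(field_goals - app_goals))
print("supported:", sorted(app_goals))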
1. Start.
2. Read documentation; get comfortable with the model.
3. Plan the analysis to fit the model; define the scope of model-based analysis; play with the model.
4. Prepare data; transform data as needed; set up the scenario with multiple module entries.
5. Check entries' compatibilities and reasonableness; check entries' dependencies and adverse reactions.
6. Run the model.
7. Validate output, debug and modify.
8. Re-run the model.
9. Validate output, debug and modify.
10. Re-run the model.
11. Validate output, interpret results.
12. Cumulatively create convincing stories.
Figure 1: Fabio's what-if policy analysis.
We compared this extracted user model to our user representations from field study findings and assessed the fit between the two.
4 Results and Discussion

4.1 Field Study: Users' Model-based Analysis in Context

4.1.1 Scenario of Use
The following case is drawn from composite field study findings and informed by the modelling literature [Costantini et al. 2002]. In it, the analyst interacts with a model-on-the-Web that is better aligned to his needs than current applications really are. We number the analyst's inquiry processes and depict these numbers on the 'analytical mountainscape' storyboarded in Figure 1. In the mountainscape, the analyst moves toward the goal of creating convincing arguments for policymakers (the mountaintop), and his paths take him in and out of an intricate cave, which is the underlying simulation model. He excavates the cave (model) to set up and run scenarios, as well as to assure throughout the analysis that his new knowledge and evolving arguments are valid and complete.
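Read as a process, the numbered steps in Figure 1 amount to an iterate-until-convinced loop. The sketch below is a deliberately toy rendering of that loop (Python; the stub model and all names are ours, not the interface of any model cited in this paper):

# Toy rendering of Figure 1's loop; run_model is a fake stand-in.
def run_model(scenario):
    return {"price_change": -0.5 * scenario["supply_shift"]}

def plausible(output):
    return abs(output["price_change"]) < 50.0  # steps 7, 9, 11: validate

def what_if_analysis(shifts):
    stories = []                                # step 12: cumulative case
    for shift in shifts:                        # one scenario per line of reasoning
        scenario = {"supply_shift": shift}      # steps 4-5: set up and check entries
        output = run_model(scenario)            # step 6: run the model
        while not plausible(output):            # steps 7-10: debug, modify, re-run
            scenario["supply_shift"] /= 2.0
            output = run_model(scenario)
        stories.append((scenario, output))
    return stories

print(what_if_analysis([5.0, 200.0]))

The point of the sketch is the control flow, not the arithmetic: validation and re-running are inside the loop, and the stopping condition is the analyst's accumulated case, not a single run.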
Below, we represent users' work as application-level patterns of inquiry. Patterns include problems, goals, and questions — the very reasons why models-on-the-Web are created — and they include contextual conditions and constraints that affect the achievement of these goals. In Fabio's case, these latter forces are fairly similar across patterns, so we describe them only once below. Following them, we describe the distinct parts of each pattern of inquiry.
4.1.3 Contextual Conditions and Constraints Affecting Analysis in All Three Patterns of Inquiry — Planning, Set-up, and Interpretation

• Technical and modelling constraints: pre-defined formats for input and output; pre-defined dependencies, logic, and allowed inputs; pre-defined interactivity to access information and analyse output.
• Inputs that may be allowed by modelling rules but that are practically unreasonable.

• A tension between model opacity and transparency: the need for some parts of a model to be opaque for accuracy but others to be transparent for user judgements.

• Socio-political constraints: influences exerted by shared and expected conventions for convincing policy arguments, e.g. addressing diverse interests, perspectives, and priorities; discussing long- and short-term implications of recommended policies; presenting a convincing case against rejected alternatives; including qualifications due to model constraints; and addressing who gains and who loses.

• Cognitive constraints: limits in cognitive capacity for processes of expert analysis.

• Time constraints.

Despite these similar constraints, the patterns of inquiry for planning, set-up, and interpretation each have distinct problems, goals, and questions. In these patterns, we present goals at a high level instead of more finely grained ones (e.g. accessing a schematic of an underlying model) because we aim to represent users' work in ways that do not foster a premature shift to mapping low-level needs or actions to discrete, context-free features or feature fixes. (See Figure 2.)

4.1.4 Implications for Requirements

These goal- and problem-oriented patterns suggest that requirements for models-on-the-Web must account for the fact that policy analyses are never complete or valid unless they dynamically involve all these interlocked patterns. Moreover, goals relate to assuring the model's fit with argument purposes, not simply working the model. Toward these ends, requirements go beyond the obvious of assuring the ability to enter inputs and generate and save outputs. They must target users' needs to access the modelling information relevant to public debate, conventions of policy arguments, and analytical integrity, and to manage evolving knowledge and cumulative scenarios without overloading their cognitive capacity.
Planning and Conceptualizing

Problem: Is this the right model for my purposes?

Goals and Sub-goals: Get into a comfort zone with the model: recognize its limitations for my purposes and circumstances. Prove the model is valid and apt for the integrity of my analysis. Consult with model developers for adjustments as needed. Set a scope and questions for analysis that can be addressed by the model. Prepare data and other preconditions to fit model and analysis requirements. Validate my understanding by doing dry runs — mentally, actually, or both.

Questions: Is the model appropriate for my context and focus? How will model constraints qualify my arguments? Are data, elasticities and equations up to date?

Setting up and Running Scenarios

Problem: What sets of scenarios will build a case for proposed policy choices?

Goals and Sub-goals: Conceptualize and vet a relevant set of scenarios for various lines of reasoning. Construct each scenario by choosing appropriate and necessary input: assure internal consistency and reasonable figures in entries; guard against adverse interactions between parameters and other entries; remember what has been entered already and its effects on next entries; make sure inputs reflect the desired scenario and extend cumulative stories. Specify output displays that fit analysis needs. Validate, debug, modify, and re-run scenarios to assure analytical integrity.

Questions: What baselines and other parameters do I use? What interactions occur between the changes/input I make? What input have I entered already, and does that affect what I enter next?

Interpreting and Making Meaning

Problem: What arguments do scenario outcomes alone and cumulatively imply?

Goals and Sub-goals: Validate outcomes. Match patterns in outputs to mental model patterns for a given situation. Manipulate layout and arrangements to facilitate interpretation. Compare results. Progressively find and create a convincing story. Build this case by determining the next scenarios to run. At opportune times, export output to other software for in-depth analysis.

Questions: Are outcomes valid? What equations underlie output numbers or graphs? How much impact did the conditions I entered have, and on what? How can I sort output to find patterns and relationships of interest?

Figure 2: Patterns of inquiry from a user's point of view.
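One way to keep such requirements tied to context is to treat the whole pattern of inquiry in Figure 2 (problem, goals, questions, constraints) as the unit that features trace back to, rather than mapping features to isolated tasks. A hedged sketch of such a record follows (Python; the class and the abridged labels are our illustration, not an artefact of the study):

# Illustrative record for one pattern of inquiry, per Figure 2.
from dataclasses import dataclass, field

@dataclass
class InquiryPattern:
    name: str
    problem: str
    goals: list = field(default_factory=list)
    questions: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

planning = InquiryPattern(
    name="Planning and Conceptualizing",
    problem="Is this the right model for my purposes?",
    goals=["get into a comfort zone with the model",
           "set a scope the model can address"],
    questions=["How will model constraints qualify my arguments?"],
    constraints=["model opacity vs. transparency", "time pressure"],
)
# A candidate requirement is then justified by pointing at a pattern:
print(planning.name, "->", planning.goals[0])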
4.2 Notions of Users' Work Built into Online Simulations

4.2.1 Common Designs in Models-on-the-Web

How well do the designs of current models-on-the-Web represent users' actual work in context? Our findings show that it is appropriate to generalize an answer to this question because designs across applications share similar traits. Models-on-the-Web commonly have tabbed modules that include fields for defining inputs, and they have pre-defined output screens. Designs also include a 'run' button, output screens — graphs, tables or both — and functionality for reading in data, defining formats for output, saving output, browsing datasets, and editing through toolbars and menus. They rarely allow users to directly manipulate results other than saving and exporting them. Designs offer information about input fields and model constraints through generic help and context-sensitive explanations.

4.2.2 Goal-based Requirements Analysis of One Model-on-the-Web

We analysed the Dynamic Research Evaluation for Management (DREAM) model, an application that lets users run scenarios with a multi-market, partial equilibrium model to project how the adoption of various new technologies may affect commodity prices, production, consumption, trade, and household income in one or many regions. From this application, we abstracted its structure to identify where support for analysis is 'bloated' and where it is impoverished. Figure 3a depicts the nine modules DREAM presents in tabbed screens, and Figure 3b shows the inputs that screens within modules allow users to enter.

As Figure 3 suggests, structurally the application is 'set up-centric': 'bloated' for set-up goals and tasks. The predominant goals it supports coincide with what we defined earlier in the set-up pattern of inquiry as 'Construct each scenario by choosing appropriate and necessary input' and 'Specify output displays that fit analysis needs'. Support for other goals, e.g. vetting a relevant set of scenarios or arranging data displays in output, is absent. Getting into a comfort zone with a model is peripheral.

4.2.3 Misuse of Information Objects

In order to achieve these other goals, DREAM expects users either to do the work in other software or to infer modelling logic and assumptions from visible modules that reflect input requirements and from generic information in documentation, online help and field-sensitive help. Clearly, as Fabio's scenario shows, planning and interpretation in analysis are neither separate activities nor can users complete them simply by amassing separate pieces of data or information objects. Rather they involve complex processes
[Figure 3 appears here. Its recoverable labels include module tabs such as Setup, Study and Scenario; options such as 'R&D each year', 'Extension' and 'Omit costs/benefits from results'; a Results tab; and input fields including transmission costs, initial prices and real discount rate.]

Figure 3: Structure of DREAM. (a) The sequence of modules displayed through tabbed screens. (b) Input fields for user entries, with links showing dependencies between entries.
the order users should follow to set up and run scenarios. But this order is not beneficial for getting into a comfort zone and understanding the 'guts' of the model's methodology. To explore the model, users need to start with the fourth and fifth tabs, related to technology inputs. Unfortunately, nothing in the visible structure of DREAM signals this importance. Moreover, even when users go to these screens, context-sensitive help for fields gives only generic definitions of discrete modelling factors, and users can only get cues about the roles of various variables and associated equations through error messages. Structurally missing from the application is any layering of guidance information for problem solving purposes.

It is not just that 'set up-centricity' and dictionary-type help diminish ease of use. In terms of fitness-to-purpose, they actually work against the expert problem solving exploration and mentally projected moves and strategies that characterize analysts' work. In terms of problems and goals, it appears that DREAM assumes the pattern of inquiry shown in Figure 4 which, as we detail in the next section, is impoverished compared to users' real-world work. DREAM also assumes a pattern for interpreting output but envisions it largely as a read-only activity, as discussed in the next section. The workflows built into DREAM's structures and functionality are shown in Figure 5.
Setting up and Running Scenarios

Problem: How should I specify the inputs to set up and run this single scenario?

Goals and Sub-goals: Conceptualize scenarios for various lines of reasoning. Construct the scenario by choosing appropriate and necessary input: assure dependencies and conditional relations across entries are met; make sure inputs are complete and in the right form. Specify output displays that fit analysis needs.

Questions: What baselines do I use? What elasticities make sense for this set-up? What values are feasible based on the modelling rules and assumptions? What input have I entered already, and does that affect what I enter next?

Conditions and Constraints (obstacles addressed): Technical constraints: pre-defined formats for input and output; pre-defined dependencies, logic and allowed inputs; pre-defined interactivity. Tension between model opacity and transparency. Time pressures. Lack of ready access to information about parts of the model.

Figure 4: Pattern of inquiry from DREAM's point of view.