Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6773
Randall Shumaker (Ed.)
Virtual and Mixed Reality – New Trends International Conference, Virtual and Mixed Reality 2011 Held as Part of HCI International 2011 Orlando, FL, USA, July 9-14, 2011 Proceedings, Part I
Volume Editor Randall Shumaker University of Central Florida Institute for Simulation and Training 3100 Technology Parkway and 3280 Progress Drive Orlando, FL 32826, USA E-mail:
[email protected]
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-22020-3 e-ISBN 978-3-642-22021-0 DOI 10.1007/978-3-642-22021-0 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: Applied for CR Subject Classification (1998): H.5, H.4, I.3, I.2, C.3, I.4, I.6 LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
The 14th International Conference on Human–Computer Interaction, HCI International 2011, was held in Orlando, Florida, USA, July 9–14, 2011, jointly with the Symposium on Human Interface (Japan) 2011, the 9th International Conference on Engineering Psychology and Cognitive Ergonomics, the 6th International Conference on Universal Access in Human–Computer Interaction, the 4th International Conference on Virtual and Mixed Reality, the 4th International Conference on Internationalization, Design and Global Development, the 4th International Conference on Online Communities and Social Computing, the 6th International Conference on Augmented Cognition, the Third International Conference on Digital Human Modeling, the Second International Conference on Human-Centered Design, and the First International Conference on Design, User Experience, and Usability. A total of 4,039 individuals from academia, research institutes, industry and governmental agencies from 67 countries submitted contributions, and 1,318 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. This volume, edited by Randall Shumaker, contains papers in the thematic area of virtual and mixed reality (VMR), addressing the following major topics:
• Augmented reality applications
• Virtual and immersive environments
• Novel interaction devices and techniques in VR
• Human physiology and behaviour in VR environments
The remaining volumes of the HCI International 2011 Proceedings are: • Volume 1, LNCS 6761, Human–Computer Interaction—Design and Development Approaches (Part I), edited by Julie A. Jacko • Volume 2, LNCS 6762, Human–Computer Interaction—Interaction Techniques and Environments (Part II), edited by Julie A. Jacko • Volume 3, LNCS 6763, Human–Computer Interaction—Towards Mobile and Intelligent Interaction Environments (Part III), edited by Julie A. Jacko • Volume 4, LNCS 6764, Human–Computer Interaction—Users and Applications (Part IV), edited by Julie A. Jacko • Volume 5, LNCS 6765, Universal Access in Human–Computer Interaction— Design for All and eInclusion (Part I), edited by Constantine Stephanidis • Volume 6, LNCS 6766, Universal Access in Human–Computer Interaction— Users Diversity (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 6767, Universal Access in Human–Computer Interaction— Context Diversity (Part III), edited by Constantine Stephanidis • Volume 8, LNCS 6768, Universal Access in Human–Computer Interaction— Applications and Services (Part IV), edited by Constantine Stephanidis • Volume 9, LNCS 6769, Design, User Experience, and Usability—Theory, Methods, Tools and Practice (Part I), edited by Aaron Marcus • Volume 10, LNCS 6770, Design, User Experience, and Usability— Understanding the User Experience (Part II), edited by Aaron Marcus • Volume 11, LNCS 6771, Human Interface and the Management of Information—Design and Interaction (Part I), edited by Michael J. Smith and Gavriel Salvendy • Volume 12, LNCS 6772, Human Interface and the Management of Information—Interacting with Information (Part II), edited by Gavriel Salvendy and Michael J. Smith • Volume 14, LNCS 6774, Virtual and Mixed Reality—Systems and Applications (Part II), edited by Randall Shumaker • Volume 15, LNCS 6775, Internationalization, Design and Global Development, edited by P.L. Patrick Rau • Volume 16, LNCS 6776, Human-Centered Design, edited by Masaaki Kurosu • Volume 17, LNCS 6777, Digital Human Modeling, edited by Vincent G. Duffy • Volume 18, LNCS 6778, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris • Volume 19, LNCS 6779, Ergonomics and Health Aspects of Work with Computers, edited by Michelle M. Robertson • Volume 20, LNAI 6780, Foundations of Augmented Cognition: Directing the Future of Adaptive Systems, edited by Dylan D. Schmorrow and Cali M. Fidopiastis • Volume 21, LNAI 6781, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris • Volume 22, CCIS 173, HCI International 2011 Posters Proceedings (Part I), edited by Constantine Stephanidis • Volume 23, CCIS 174, HCI International 2011 Posters Proceedings (Part II), edited by Constantine Stephanidis I would like to thank the Program Chairs and the members of the Program Boards of all Thematic Areas, listed herein, for their contribution to the highest scientific quality and the overall success of the HCI International 2011 Conference. In addition to the members of the Program Boards, I also wish to thank the following volunteer external reviewers: Roman Vilimek from Germany, Ramalingam Ponnusamy from India, Si Jung “Jun” Kim from the USA, and Ilia Adami, Iosif Klironomos, Vassilis Kouroumalis, George Margetis, and Stavroula Ntoa from Greece.
This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications and Exhibition Chair and Editor of HCI International News, Abbas Moallem. I would also like to thank for their contribution toward the organization of the HCI International 2011 Conference the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, Maria Bouhli and George Kapnas. July 2011
Constantine Stephanidis
Organization
Ergonomics and Health Aspects of Work with Computers Program Chair: Michelle M. Robertson Arne Aarås, Norway Pascale Carayon, USA Jason Devereux, UK Wolfgang Friesdorf, Germany Martin Helander, Singapore Ed Israelski, USA Ben-Tzion Karsh, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Nancy Larson, USA Kari Lindström, Finland
Brenda Lobb, New Zealand Holger Luczak, Germany William S. Marras, USA Aura C. Matias, Philippines Matthias Rötting, Germany Michelle L. Rogers, USA Dominique L. Scapin, France Lawrence M. Schleifer, USA Michael J. Smith, USA Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK
Human Interface and the Management of Information Program Chair: Michael J. Smith Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Shin'ichi Fukuzumi, Japan Jon R. Gunderson, USA Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Hirohiko Mori, Japan Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA
Youngho Rhee, Korea Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P. R. China
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven A. Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA Pietro Carlo Cacciabue, Italy John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Wen-Chin Li, Taiwan James T. Luxhøj, USA Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands
Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa Matthew J.W. Thomas, Australia Mark Young, UK Rolf Zon, The Netherlands
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy
Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA Patrick M. Langdon, UK Seongil Lee, Korea
Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria
Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Panayiotis Zaphiris, Cyprus
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA Simon Julier, UK David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA Gordon McK Mair, UK
David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Jose San Martin, Spain Dieter Schmalstieg, Austria Dylan Schmorrow, USA Kay Stanney, USA Janet Weisenford, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: P.L. Patrick Rau Michael L. Best, USA Alan Chan, Hong Kong Lin-Lin Chen, Taiwan Andy M. Dearden, UK Susan M. Dray, USA Henry Been-Lirn Duh, Singapore Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA Sung H. Han, Korea Veikko Ikonen, Finland Toshikazu Kato, Japan Esin Kiris, USA Apala Lahiri Chavan, India
James R. Lewis, USA James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA Katsuhiko Ogawa, Japan Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Supriya Singh, Australia Alvin W. Yeo, Malaysia Hsiu-Ping Yueh, Taiwan
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Peter Day, UK Fiorella De Cindio, Italy Heidi Feng, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Andrew Laghos, Cyprus Stefanie Lindstaedt, Austria Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan
Anthony F. Norcio, USA Ulrike Pfeil, UK Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Laura Slaughter, Norway Sergei Stafeev, Russia Asimina Vasalou, UK June Wei, USA Haibin Zhu, Canada
Augmented Cognition Program Chairs: Dylan D. Schmorrow, Cali M. Fidopiastis Monique Beaudoin, USA Chris Berka, USA Joseph Cohn, USA Martha E. Crosby, USA Julie Drexler, USA Ivy Estabrooke, USA Chris Forsythe, USA Wai Tat Fu, USA Marc Grootjen, The Netherlands Jefferson Grubb, USA Santosh Mathan, USA
Rob Matthews, Australia Dennis McBride, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Banu Onaral, USA Kay Stanney, USA Roy Stripling, USA Rob Taylor, UK Karl van Orden, USA
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Giuseppe Andreoni, Italy Thomas J. Armstrong, USA Norman I. Badler, USA Fethi Calisir, Turkey Daniel Carruth, USA Keith Case, UK Julie Charland, Canada
Yaobin Chen, USA Kathryn Cormican, Ireland Daniel A. DeLaurentis, USA Yingzi Du, USA Okan Ersoy, USA Enda Fallon, Ireland Yan Fu, P.R. China Afzal Godil, USA
Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Bo Hoege, Germany Hongwei Hsiao, USA Tianzi Jiang, P.R. China Nan Kong, USA Steven A. Landry, USA Kang Li, USA Zhizhong Li, P.R. China Tim Marler, USA
Ahmet F. Ozok, Turkey Srinivas Peeta, USA Sudhakar Rajulu, USA Matthias Rötting, Germany Matthew Reed, USA Johan Stahre, Sweden Mao-Jiun Wang, Taiwan Xuguang Wang, France Jingzhou (James) Yang, USA Gulcin Yucel, Turkey Tingshao Zhu, P.R. China
Human-Centered Design Program Chair: Masaaki Kurosu Julio Abascal, Spain Simone Barbosa, Brazil Tomas Berns, Sweden Nigel Bevan, UK Torkil Clemmensen, Denmark Susan M. Dray, USA Vanessa Evers, The Netherlands Xiaolan Fu, P.R. China Yasuhiro Horibe, Japan Jason Huang, P.R. China Minna Isomursu, Finland Timo Jokela, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan Seongil Lee, Korea Kee Yong Lim, Singapore
Zhengjie Liu, P.R. China Loïc Martínez-Normand, Spain Monique Noirhomme-Fraiture, Belgium Philippe Palanque, France Annelise Mark Pejtersen, Denmark Kerstin Röse, Germany Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Janet Wesson, South Africa Toshiki Yamaoka, Japan Kazuhiko Yamazaki, Japan Silvia Zimmermann, Switzerland
Design, User Experience, and Usability Program Chair: Aaron Marcus Ronald Baecker, Canada Barbara Ballard, USA Konrad Baumann, Austria Arne Berger, Germany Randolph Bias, USA Jamie Blustein, Canada
Ana Boa-Ventura, USA Lorenzo Cantoni, Switzerland Sameer Chavan, Korea Wei Ding, USA Maximilian Eibl, Germany Zelda Harrison, USA
Rüdiger Heimgärtner, Germany Brigitte Herrmann, Germany Sabine Kabel-Eckes, USA Kaleem Khan, Canada Jonathan Kies, USA Jon Kolko, USA Helga Letowt-Vorbek, South Africa James Lin, USA Frazer McKimm, Ireland Michael Renner, Switzerland
Christine Ronnewinkel, Germany Elizabeth Rosenzweig, USA Paul Sherman, USA Ben Shneiderman, USA Christian Sturm, Germany Brian Sullivan, USA Jaakko Villa, Finland Michele Visciola, Italy Susan Weinschenk, USA
HCI International 2013
The 15th International Conference on Human–Computer Interaction, HCI International 2013, will be held jointly with the affiliated conferences in the summer of 2013. It will cover a broad spectrum of themes related to human–computer interaction (HCI), including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/ General Chair Professor Constantine Stephanidis University of Crete and ICS-FORTH Heraklion, Crete, Greece Email:
[email protected]
Table of Contents – Part I
Part I: Augmented Reality Applications AR Based Environment for Exposure Therapy to Mottephobia . . . . . . . . . Andrea F. Abate, Michele Nappi, and Stefano Ricciardi
3
Designing Augmented Reality Tangible Interfaces for Kindergarten Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pedro Campos and Sofia Pessanha
12
lMAR: Highly Parallel Architecture for Markerless Augmented Reality in Aircraft Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andrea Caponio, Mauricio Hincapié, and Eduardo González Mendivil
20
5-Finger Exoskeleton for Assembly Training in Augmented Reality . . . . . Siam Charoenseang and Sarut Panjan Remote Context Monitoring of Actions and Behaviors in a Location through 3D Visualization in Real-Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Conomikes, Zachary Pacheco, Salvador Barrera, Juan Antonio Cantu, Lucy Beatriz Gomez, Christian de los Reyes, Juan Manuel Mendez-Villarreal, Takeo Shime, Yuki Kamiya, Hedeki Kawai, Kazuo Kunieda, and Keiji Yamada Spatial Clearance Verification Using 3D Laser Range Scanner and Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hirotake Ishii, Shuhei Aoyama, Yoshihito Ono, Weida Yan, Hiroshi Shimoda, and Masanori Izumi
30
40
45
Development of Mobile AR Tour Application for the National Palace Museum of Korea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jae-Beom Kim and Changhoon Park
55
A Vision-Based Mobile Augmented Reality System for Baseball Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seong-Oh Lee, Sang Chul Ahn, Jae-In Hwang, and Hyoung-Gon Kim
61
Social Augmented Reality for Sensor Visualization in Ubiquitous Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngho Lee, Jongmyung Choi, Sehwan Kim, Seunghun Lee, and Say Jang
69
Digital Diorama: AR Exhibition System to Convey Background Information for Museums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takuji Narumi, Oribe Hayashi, Kazuhiro Kasada, Mitsuhiko Yamazaki, Tomohiro Tanikawa, and Michitaka Hirose Augmented Reality: An Advantageous Option for Complex Training and Maintenance Operations in Aeronautic Related Processes . . . . . . . . . Horacio Rios, Mauricio Hincapié, Andrea Caponio, Emilio Mercado, and Eduardo González Mendívil
76
87
Enhancing Marker-Based AR Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . Jonghoon Seo, Jinwook Shim, Ji Hye Choi, James Park, and Tack-don Han
97
MSL AR Toolkit: AR Authoring Tool with Interactive Features . . . . . . . . Jinwook Shim, Jonghoon Seo, and Tack-don Han
105
Camera-Based In-situ 3D Modeling Techniques for AR Diorama in Ubiquitous Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Atsushi Umakatsu, Hiroyuki Yasuhara, Tomohiro Mashita, Kiyoshi Kiyokawa, and Haruo Takemura Design Criteria for AR-Based Training of Maintenance and Assembly Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sabine Webel, Ulrich Bockholt, and Jens Keil
113
123
Part II: Virtual and Immersive Environments Object Selection in Virtual Environments Performance, Usability and Interaction with Spatial Abilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andreas Baier, David Wittmann, and Martin Ende
135
Effects of Menu Orientation on Pointing Behavior in Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nguyen-Thong Dang and Daniel Mestre
144
Some Evidences of the Impact of Environment's Design Features in Routes Selection in Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . Emília Duarte, Elisângela Vilar, Francisco Rebelo, Júlia Teles, and Ana Almeida
154
Evaluating Human-Robot Interaction during a Manipulation Experiment Conducted in Immersive Virtual Reality . . . . . . . . . . . . . . . . . Mihai Duguleana, Florin Grigorie Barbuceanu, and Gheorghe Mogan
164
3-D Sound Reproduction System for Immersive Environments Based on the Boundary Surface Control Principle . . . . . . . . . . . . . . . . . . . . . . . . . . Seigo Enomoto, Yusuke Ikeda, Shiro Ise, and Satoshi Nakamura
174
Workspace-Driven, Blended Orbital Viewing in Immersive Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scott Frees and David Lancellotti
185
Irradiating Heat in Virtual Environments: Algorithm and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marco Gaudina, Andrea Brogni, and Darwin Caldwell
194
Providing Immersive Virtual Experience with First-person Perspective Omnidirectional Movies and Three Dimensional Sound Field . . . . . . . . . . Kazuaki Kondo, Yasuhiro Mukaigawa, Yusuke Ikeda, Seigo Enomoto, Shiro Ise, Satoshi Nakamura, and Yasushi Yagi Intercepting Virtual Ball in Immersive Virtual Environment . . . . . . . . . . . Massimiliano Valente, Davide Sobrero, Andrea Brogni, and Darwin Caldwell
204
214
Part III: Novel Interaction Devices and Techniques in VR Concave-Convex Surface Perception by Visuo-vestibular Stimuli for Five-Senses Theater . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomohiro Amemiya, Koichi Hirota, and Yasushi Ikei
225
Touching Sharp Virtual Objects Produces a Haptic Illusion . . . . . . . . . . . Andrea Brogni, Darwin G. Caldwell, and Mel Slater
234
Whole Body Interaction Using the Grounded Bar Interface . . . . . . . . . . . . Bong-gyu Jang, Hyunseok Yang, and Gerard J. Kim
243
Digital Display Case Using Non-contact Head Tracking . . . . . . . . . . . . . . . Takashi Kajinami, Takuji Narumi, Tomohiro Tanikawa, and Michitaka Hirose
250
Meta Cookie+: An Illusion-Based Gustatory Display . . . . . . . . . . . . . . . . . Takuji Narumi, Shinya Nishizaka, Takashi Kajinami, Tomohiro Tanikawa, and Michitaka Hirose
260
LIS3D: Low-Cost 6DOF Laser Interaction for Outdoor Mixed Reality . . . Pedro Santos, Hendrik Schmedt, Bernd Amend, Philip Hammer, Ronny Giera, Elke Hergenröther, and André Stork
270
Olfactory Display Using Visual Feedback Based on Olfactory Sensory Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomohiro Tanikawa, Aiko Nambu, Takuji Narumi, Kunihiro Nishimura, and Michitaka Hirose
280
Towards Noninvasive Brain-Computer Interfaces during Standing for VR Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hideaki Touyama
290
Part IV: Human Physiology and Behaviour in VR Environments Stereoscopic Vision Induced by Parallax Images on HMD and its Influence on Visual Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Satoshi Hasegawa, Akira Hasegawa, Masako Omori, Hiromu Ishio, Hiroki Takada, and Masaru Miyao Comparison of Accommodation and Convergence by Simultaneous Measurements during 2D and 3D Vision Gaze . . . . . . . . . . . . . . . . . . . . . . . Hiroki Hori, Tomoki Shiomi, Tetsuya Kanda, Akira Hasegawa, Hiromu Ishio, Yasuyuki Matsuura, Masako Omori, Hiroki Takada, Satoshi Hasegawa, and Masaru Miyao Tracking the UFO’s Paths: Using Eye-Tracking for the Evaluation of Serious Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael D. Kickmeier-Rust, Eva Hillemann, and Dietrich Albert The Online Gait Measurement for Characteristic Gait Animation Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yasushi Makihara, Mayu Okumura, Yasushi Yagi, and Shigeo Morishima
297
306
315
325
Measuring and Modeling of Multi-layered Subsurface Scattering for Human Skin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomohiro Mashita, Yasuhiro Mukaigawa, and Yasushi Yagi
335
An Indirect Measure of the Implicit Level of Presence in Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Steven Nunnally and Durell Bouchard
345
Effect of Weak Hyperopia on Stereoscopic Vision . . . . . . . . . . . . . . . . . . . . . Masako Omori, Asei Sugiyama, Hiroki Hori, Tomoki Shiomi, Tetsuya Kanda, Akira Hasegawa, Hiromu Ishio, Hiroki Takada, Satoshi Hasegawa, and Masaru Miyao Simultaneous Measurement of Lens Accommodation and Convergence to Real Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomoki Shiomi, Hiromu Ishio, Hiroki Hori, Hiroki Takada, Masako Omori, Satoshi Hasegawa, Shohei Matsunuma, Akira Hasegawa, Tetsuya Kanda, and Masaru Miyao
354
363
Comparison in Degree of the Motion Sickness Induced by a 3-D Movie on an LCD and an HMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiroki Takada, Yasuyuki Matsuura, Masumi Takada, and Masaru Miyao Evaluation of Human Performance Using Two Types of Navigation Interfaces in Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luís Teixeira, Emília Duarte, Júlia Teles, and Francisco Rebelo Use of Neurophysiological Metrics within a Real and Virtual Perceptual Skills Task to Determine Optimal Simulation Fidelity Requirements . . . . Jack Vice, Anna Skinner, Chris Berka, Lauren Reinerman-Jones, Daniel Barber, Nicholas Pojman, Veasna Tan, Marc Sebrechts, and Corinna Lathan Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
371
380
387
401
Table of Contents – Part II
Part I: VR in Education, Training and Health Serious Games for Psychological Health Education . . . . . . . . . . . . . . . . . . . Anya Andrews
3
Mixed Reality as a Means to Strengthen Post-stroke Rehabilitation . . . . Ines Di Loreto, Liesjet Van Dokkum, Abdelkader Gouaich, and Isabelle Laffont
11
A Virtual Experiment Platform for Mechanism Motion Cognitive Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiumin Fan, Xi Zhang, Huangchong Cheng, Yanjun Ma, and Qichang He
20
Mechatronic Prototype for Rigid Endoscopy Simulation . . . . . . . . . . . . . . . Byron Pérez-Gutiérrez, Camilo Ariza-Zambrano, and Juan Camilo Hernández
30
Patterns of Gaming Preferences and Serious Game Effectiveness . . . . . . . Katelyn Procci, James Bohnsack, and Clint Bowers
37
Serious Games for the Therapy of the Posttraumatic Stress Disorder of Children and Adolescents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rafael Radkowski, Wilfried Huck, Gitta Domik, and Martin Holtmann Virtual Reality as Knowledge Enhancement Tool for Musculoskeletal Pathology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sophia Sakellariou, Vassilis Charissis, Stephen Grant, Janice Turner, Dianne Kelly, and Chistodoulos Christomanos
44
54
Study of Optimal Behavior in Complex Virtual Training Systems . . . . . . Jose San Martin
64
Farming Education: A Case for Social Games in Learning . . . . . . . . . . . . . Peter Smith and Alicia Sanchez
73
Sample Size Estimation for Statistical Comparative Test of Training by Using Augmented Reality via Theoretical Formula and OCC Graphs: Aeronautical Case of a Component Assemblage . . . . . . . . . . . . . . . . . . . . . . Fernando Suárez-Warden, Yocelin Cervantes-Gloria, and Eduardo González-Mendívil
80
Enhancing English Learning Website Content and User Interface Functions Using Integrated Quality Assessment . . . . . . . . . . . . . . . . . . . . . . Dylan Sung The Influence of Virtual World Interactions toward Driving Real World Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hari Thiruvengada, Paul Derby, Wendy Foslien, John Beane, and Anand Tharanathan Interactive Performance: Dramatic Improvisation in a Mixed Reality Environment for Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeff Wirth, Anne E. Norris, Dan Mapes, Kenneth E. Ingraham, and J. Michael Moshell Emotions and Telerehabilitation: Pilot Clinical Trials for Virtual Telerehabilitation Application Using Haptic Device and Its Impact on Post Stroke Patients' Mood and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . Shih-Ching Yeh, Margaret McLaughlin, Yujung Nam, Scott Sanders, Chienyen Chang, Bonnie Kennedy, Sheryl Flynn, Belinda Lange, Lei Li, Shu-ya Chen, Maureen Whitford, Carolee Winstein, Younbo Jung, and Albert Rizzo An Interactive Multimedia System for Parkinson's Patient Rehabilitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenhui Yu, Catherine Vuong, and Todd Ingalls
90
100
110
119
129
Part II: VR for Culture and Entertainment VClav 2.0 – System for Playing 3D Virtual Copy of a Historical Clavichord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Krzysztof Gardo and Ewa Lukasik A System for Creating the Content for a Multi-sensory Theater . . . . . . . . Koichi Hirota, Seichiro Ebisawa, Tomohiro Amemiya, and Yasushi Ikei Wearable Display System for Handing Down Intangible Cultural Heritage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Atsushi Hiyama, Yusuke Doyama, Mariko Miyashita, Eikan Ebuchi, Masazumi Seki, and Michitaka Hirose Stroke-Based Semi-automatic Region of Interest Detection Algorithm for In-Situ Painting Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngkyoon Jang and Woontack Woo Personalized Voice Assignment Techniques for Synchronized Scenario Speech Output in Entertainment Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . Shin-ichi Kawamoto, Tatsuo Yotsukura, Satoshi Nakamura, and Shigeo Morishima
141 151
158
167
177
Instant Movie Casting with Personality: Dive Into the Movie System . . . Shigeo Morishima, Yasushi Yagi, and Satoshi Nakamura A Realtime and Direct-Touch Interaction System for the 3D Cultural Artifact Exhibition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wataru Wakita, Katsuhito Akahane, Masaharu Isshiki, and Hiromi T. Tanaka Digital Display Case: A Study on the Realization of a Virtual Transportation System for a Museum Collection . . . . . . . . . . . . . . . . . . . . . Takafumi Watanabe, Kenji Inose, Makoto Ando, Takashi Kajinami, Takuji Narumi, Tomohiro Tanikawa, and Michitaka Hirose
187
197
206
Part III: Virtual Humans and Avatars Integrating Multi-agents in a 3D Serious Game Aimed at Cognitive Stimulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Priscilla F. de Abreu, Luis Alfredo V. de Carvalho, Vera Maria B. Werneck, and Rosa Maria E. Moreira da Costa Automatic 3-D Facial Fitting Technique for a Second Life Avatar . . . . . . Hiroshi Dohi and Mitsuru Ishizuka Reflected in a Liquid Crystal Display: Personalization and the Use of Avatars in Serious Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shan Lakhmani and Clint Bowers Leveraging Unencumbered Full Body Control of Animated Virtual Characters for Game-Based Rehabilitation . . . . . . . . . . . . . . . . . . . . . . . . . . Belinda Lange, Evan A. Suma, Brad Newman, Thai Phan, Chien-Yen Chang, Albert Rizzo, and Mark Bolas Interactive Exhibition with Ambience Using Video Avatar and Animation on Huge Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hasup Lee, Yoshisuke Tateyama, Tetsuro Ogi, Teiichi Nishioka, Takuro Kayahara, and Kenichi Shinoda
217
227
237
243
253
Realistic Facial Animation by Automatic Individual Head Modeling and Facial Muscle Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akinobu Maejima, Hiroyuki Kubo, and Shigeo Morishima
260
Geppetto: An Environment for the Efficient Control And Transmission of Digital Puppetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daniel P. Mapes, Peter Tonner, and Charles E. Hughes
270
Body Buddies: Social Signaling through Puppeteering . . . . . . . . . . . . . . . . Magy Seif El-Nasr, Katherine Isbister, Jeffery Ventrella, Bardia Aghabeigi, Chelsea Hash, Mona Erfani, Jacquelyn Morie, and Leslie Bishko Why Can't a Virtual Character Be More Like a Human: A Mixed-Initiative Approach to Believable Agents . . . . . . . . . . . . . . . . . . . . Jichen Zhu, J. Michael Moshell, Santiago Ontañón, Elena Erbiceanu, and Charles E. Hughes
279
289
Part IV: Developing Virtual and Mixed Environments Collaborative Mixed-Reality Platform for the Design Assessment of Cars Interior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Giandomenico Caruso, Samuele Polistina, Monica Bordegoni, and Marcello Aliverti
299
Active Location Tracking for Projected Reality Using Wiimotes . . . . . . . . Siam Charoenseang and Nemin Suksen
309
Fast Prototyping of Virtual Replica of Real Products . . . . . . . . . . . . . . . . . Francesco Ferrise and Monica Bordegoni
318
Effectiveness of a Tactile Display for Providing Orientation Information of 3d-patterned Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nadia Garcia-Hernandez, Ioannis Sarakoglou, Nikos Tsagarakis, and Darwin Caldwell ClearSpace: Mixed Reality Virtual Teamrooms . . . . . . . . . . . . . . . . . . . . . . Alex Hill, Matthew Bonner, and Blair MacIntyre Mesh Deformations in X3D via CUDA with Freeform Deformation Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yvonne Jung, Holger Graf, Johannes Behr, and Arjan Kuijper Visualization and Management of u-Contents for Ubiquitous VR . . . . . . . Kiyoung Kim, Jonghyun Han, Changgu Kang, and Woontack Woo
327
333
343
352
Semi Autonomous Camera Control in Dynamic Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marcel Klomann and Jan-Torsten Milde
362
Panoramic Image-Based Navigation for Smart-Phone in Indoor Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Van Vinh Nguyen, Jin Guk Kim, and Jong Weon Lee
370
Foundation of a New Digital Ecosystem for u-Content: Needs, Definition, and Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoosoo Oh, Sébastien Duval, Sehwan Kim, Hyoseok Yoon, Taejin Ha, and Woontack Woo Semantic Web-Techniques and Software Agents for the Automatic Integration of Virtual Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rafael Radkowski and Florian Weidemann
377
387
Virtual Factory Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marco Sacco, Giovanni Dal Maso, Ferdinando Milella, Paolo Pedrazzoli, Diego Rovere, and Walter Terkaj
397
FiveStar: Ultra-Realistic Space Experience System . . . . . . . . . . . . . . . . . . . Masahiro Urano, Yasushi Ikei, Koichi Hirota, and Tomohiro Amemiya
407
Synchronous vs. Asynchronous Control for Large Robot Teams . . . . . . . . Huadong Wang, Andreas Kolling, Nathan Brooks, Michael Lewis, and Katia Sycara
415
Acceleration of Massive Particle Data Visualization Based on GPU . . . . . Hyun-Rok Yang, Kyung-Kyu Kang, and Dongho Kim
425
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
433
AR Based Environment for Exposure Therapy to Mottephobia Andrea F. Abate, Michele Nappi, and Stefano Ricciardi Virtual Reality Laboratory – University of Salerno, 84084, Fisciano (SA), Italy {abate,mnappi,sricciardi}@unisa.it
Abstract. Mottephobia is an anxiety disorder revolving around an extreme, persistent and irrational fear of moths and butterflies leading sufferers to panic attacks. This study presents an ARET (Augmented Reality Exposure Therapy) environment aimed to reduce mottephobia symptoms by progressive desensitization. The architecture described is designed to provide a greater and deeper level of interaction between the sufferer and the object of its fears. To this aim the system exploits an inertial ultrasonic-based tracking system to capture the user’s head and wrists positions/orientations within the virtual therapy room, while a couple of instrumented gloves capture fingers’ motion. A parametric moth behavioral engine allows the expert monitoring the therapy session to control many aspects of the virtual insects augmenting the real scene as well as their interaction with the sufferer. Keywords: Augmented reality, exposure therapy, mottephobia.
1 Introduction
Mottephobia is the term used to describe the intense fear of moths and, more in general, of butterflies. According to psychologists' classification of phobias, which distinguishes between agoraphobia, social phobia and specific phobia, mottephobia falls within the last category and represents an animal phobia, an anxiety disorder which is not uncommon though not as well-known as arachnophobia. In severe cases, panic attacks are triggered in mottephobia sufferers if they simply view a picture or even think of a moth. Consequently, many of these persons will completely avoid situations where butterflies or moths may be present. If they see one, they often follow it with close scrutiny so as to make sure it does not come anywhere near them. Sometimes the fear is caused by a split second of panic during exposure to the animal. This wires the brain to respond similarly to future stimuli with symptoms such as fast heartbeat, sweating, dry mouth and elevated stress and anxiety levels. In general, the most common treatment for phobias is exposure therapy, or systematic desensitization. This involves gradually being exposed to the phobic object or situation in a safe and controlled way. For example, a mottephobic subject might start out by looking at cartoon drawings of butterflies. When they reach a point where the images no longer trigger the phobic response, they may move on to photographs, and
so on. Therapy is a slow process, but it can have lasting effects. In the last decade the systematic desensitization treatment has been approached by means of virtual reality based environments and more recently by augmented reality techniques where in-vivo exposure is difficult to manage. In this case the contact between the sufferer and the source of its fear is performed via a virtual replica of it, which can be visualized on a screen or through a head-up display and may even enable a simulated interaction. This study presents a novel augmented reality based environment for exposure therapy to mottephobia. The final goal is to match the emotional impact experienced during exposure to real moths while providing therapists with a level of control over the virtual moths' behavior which would be impossible in vivo. The rest of this paper is organized as follows. Related works and their comparison with the proposed approach are presented in Section 2, while the system's architecture is described in detail in Section 3. The experiments conducted and their results are presented in Section 4, while conclusions are drawn in Section 5.
2 Related Works and Proposed Approach
In the last decade the systematic desensitization treatment has been approached by means of virtual reality based environments and more recently by augmented reality techniques where in-vivo exposure is difficult to manage. Virtual Reality based Exposure Therapy (VRET) has proved to be an effective strategy for phobia treatment since the original study by Carlin et al. in 1997 [1], which first reported on the efficacy of a virtual exposure to spiders, opening the way to other studies in this line [2, 3]. More recently augmented reality has also been proposed to allow the sufferer to see the real environment around him/her instead of a virtual one, while displaying the virtual contents co-registered to the user's field of view as if they were really present there, possibly resulting in more convincing stimuli for the therapy (ARET). This objective has been approached by means of (visible and invisible) marker based techniques [4, 5] using both video-based and optical-based see-through head mounted displays [6]. The aforementioned marker-based approach involves some limitations: on one side the operative volume is restricted to a fraction of the environment (typically the desktop where the marker is located), possibly limiting the user's head movements so as not to lose the marker and therefore the co-registration between real and virtual. On the other side the choice of the marker's location (either visible or not) is limited by lighting and orientation constraints related to pattern detection/recognition issues which may reduce the range of the experience. This design may still be valid when interacting with non-flying creatures (like spiders or cockroaches), especially considering the low cost of optical tracking, but it is very limiting when simulating flying insects' behavior, which involves much larger spaces. Furthermore, in most proposals the virtual insects do not react to the user's hand actions, i.e. they perform their pre-built animation(s) independently of where exactly the hands and fingers are, eventually reacting only to actions like pressing a key to crush the insects. In this paper, the proposed mottephobia ARET environment addresses the aforementioned limitations by exploiting a head/wrist inertial tracking system, instrumented gloves and a parametric moth behavior approach to enable a greater and deeper level of interaction between the sufferer and the object of its fears.
3 System’s Architecture The overall system’s architecture is schematically depicted in Fig. 1. The main components are the Moth Behavioral Engine which controls both the appearance and the dynamic behavior of the virtual moths represented in the dedicated 3D Dataset throughout the simulation, the Interaction Engine managing the sufferer-moths interaction exploiting hands gesture capture and wrists tracking, and the AR Engine in charge of scene augmentation (based on head tracking) and stereoscopic rendering via the see-through head mounted display which also provides audio stimula generated on a positional basis.
Fig. 1. Schematic view of the proposed system
As the main objective was a believable hand-moth interaction, wireless instrumented gloves and ultrasonic tracking devices have been used. An instrumented glove, indeed, enables reliable gesture capture, as each finger has individual sensors which are unaffected by the other fingers.
In this case, left and right hand gesture acquisition is performed via a couple of wireless 5DT Dataglove 14 Ultra gloves, featuring fourteen channels for finger flexion and abduction measurement, with 12 bits of sampling resolution each. As datagloves do not provide any spatial information, the system relies on an inertial ultrasonic-based tracking system (Intersense IS 900 VET) with six degrees of freedom to detect the head and wrist positions in 3D space and their rotations about the yaw, pitch and roll axes. Among the advantages of this setup are the wide capture volume (with respect to video-based solutions requiring the user to be positioned in a precise spot within the camera's field of view), an accuracy in the range of millimeters for distance measurements and of tenths of a degree for angular measurements, and a high sampling rate suited to accurately capturing fast movements. A preprocessing step applied to each of the six channels (for each hand) filters capture noise by means of a high-frequency cut and a temporal average of the sampled values. The left and right hand data streams are output to the Interaction Engine, while head tracking is sent to the AR Engine for virtual-to-real co-registration. The Moth Behavioral Engine allows the therapist to control many parameters of the simulated exposure (see Fig. 2). Both behavioral and interaction parameters can be adjusted interactively during the exposure session, allowing the therapist to modify the simulation on-the-fly, if required. These parameters include the "number", the "size", the maximum amount of "size variation" (with respect to a pseudo-random distribution) and the type of flying creatures to be visualized among those available in a previously built 3D dataset.
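For illustration only, the following Python sketch shows one possible form of the per-channel conditioning described above (a high-frequency cut followed by a temporal average); the smoothing factor, window size and channel count are assumptions and do not reproduce the actual system's implementation.

from collections import deque

class ChannelFilter:
    """Conditions a single captured channel: an exponential low-pass cut of
    high-frequency noise followed by a temporal average of the most recent
    filtered samples. Coefficients and window size are illustrative."""

    def __init__(self, alpha=0.5, window=5):
        self.alpha = alpha                  # low-pass smoothing factor
        self.samples = deque(maxlen=window)
        self.state = None

    def update(self, raw):
        # High-frequency cut (simple exponential low-pass filter).
        self.state = raw if self.state is None else (
            self.alpha * raw + (1.0 - self.alpha) * self.state)
        # Temporal average over the retained filtered samples.
        self.samples.append(self.state)
        return sum(self.samples) / len(self.samples)

# One filter per channel, e.g. the six tracking channels of one wrist.
wrist_filters = [ChannelFilter() for _ in range(6)]

def condition(sample_6dof):
    return [f.update(v) for f, v in zip(wrist_filters, sample_6dof)]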
Fig. 2. The GUI screen including the main simulation parameters
Actually, this engine is based on a parametric particle system which controls the virtual moths as instances of a reference geometry (a polygonal model). The dynamics of the particles (i.e., the moths' motion) are controlled at two different levels: the particle level and the swarm level. At the particle level, the motion of the single moth is controlled through a seamlessly loopable spline-based animation defining the particular flying pattern. The "moth speed" parameter, multiplied by a random variation value, affects the time required to complete the pattern. At the swarm level, the motion of the whole swarm is controlled through an emitter and a target which can be interactively selected among predefined locations in the 3D model of the virtual therapy environment. More than one swarm may be active at the same time, allowing the moths to originate from different locations and thus providing a less repetitive and more unexpected experience. The "swarm speed" parameter affects the time required to complete the emitter-target path. Two other swarm-level parameters, namely "aggressiveness" and "user avoidance", respectively affect the swarm dynamic behavior by attracting the swarm path towards the sufferer's position and by defining the radius of the sufferer-centered sphere which the moths cannot enter.
The Interaction Engine exploits the user's tracking data to enable realistic hand-moth interaction. Indeed, not only the approximate hand location, but also each finger's position can be computed based on the wrist tracking and forward kinematics applied to the flexion/abduction data captured by the instrumented gloves. By this design, as the user shakes the hands the butterflies may react by avoiding the collision and flying away according to their motion pattern, while in a more advanced stage of the therapy a direct contact with the insects is possible by allowing the insect to settle on the hand surface. In this regard, it has to be remarked that for the first interaction modality the instrumented gloves could be omitted (thus reducing the hardware required and the equipment to be worn), while for the other two "direct-contact" modalities they are strictly necessary. During "direct contact", one or more virtual insects (according to the "direct contact" parameter) may settle on each hand in spots randomly selected among a pre-defined set of swarm targets (e.g., the palm, the index finger or the back of the hand). Again, the purpose of this randomness is to prevent the sufferer from expecting the contact to always happen in the same way.
The 3D dataset contains medium- to low-detail polygonal models of moths/butterflies, realistically textured and animated. These models are transformed and rendered by the visualization engine, also responsible for AR-related real-time transformations and for the stereo rendering of 3D content. The engine is built on the DirectX-based Quest3D graphics toolkit (see Fig. 3), which enables dynamic simulation by means of the Newton Dynamics API or even via the Open Dynamics Engine (OpenDE, a.k.a. ODE) open-source library.
To generate the AR experience, the visualization engine exploits the user's head position and orientation to transform the virtual content as seen from the user's point of view and coherently with a 3D model of the surrounding environment, a crucial task referred to as 3D registration. Any AR environment requires a precise registration of real and virtual objects, i.e. the objects in the real and virtual worlds must be properly aligned with respect to each other, or the illusion that the two worlds coexist will be compromised.
Therefore, at runtime two rendering cameras (one for each eye) are built, matching the exact position/orientation of the user's eyes and transforming each vertex of each virtual object to be displayed onto the real scene accordingly.
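Returning to the Moth Behavioral Engine, the sketch below illustrates how the two-level motion control described earlier (a looping per-moth flight pattern layered on a swarm-level emitter-to-target path, modulated by "aggressiveness" and "user avoidance") might be structured. The circular pattern stands in for the spline-based animations of the actual system, and the specific formulas and default values are assumptions made only for this example.

import math, random

class Swarm:
    """Illustrative two-level motion update: the swarm centre travels from an
    emitter towards a target, while each moth loops around that centre.
    Parameter names mirror the GUI controls; the values are assumptions."""

    def __init__(self, emitter, target, n_moths, swarm_speed=0.2,
                 moth_speed=1.0, aggressiveness=0.0, avoid_radius=0.5):
        self.emitter, self.target = emitter, target
        self.swarm_speed = swarm_speed
        self.aggressiveness = aggressiveness
        self.avoid_radius = avoid_radius
        self.t = 0.0    # progress along the emitter-target path (0..1)
        self.moths = [{"phase": random.random(),
                       "speed": moth_speed * random.uniform(0.8, 1.2)}
                      for _ in range(n_moths)]

    def update(self, dt, user_pos):
        self.t = min(1.0, self.t + self.swarm_speed * dt)
        centre = [e + self.t * (g - e) for e, g in zip(self.emitter, self.target)]
        # "Aggressiveness" pulls the swarm path towards the sufferer.
        centre = [c + self.aggressiveness * (u - c) for c, u in zip(centre, user_pos)]
        positions = []
        for m in self.moths:
            m["phase"] = (m["phase"] + m["speed"] * dt) % 1.0
            a = 2.0 * math.pi * m["phase"]
            # Looping flight pattern around the swarm centre.
            p = [centre[0] + 0.3 * math.cos(a),
                 centre[1] + 0.1 * math.sin(2.0 * a),
                 centre[2] + 0.3 * math.sin(a)]
            # "User avoidance": keep moths outside a sphere centred on the user.
            d = math.dist(p, user_pos)
            if 0.0 < d < self.avoid_radius:
                scale = self.avoid_radius / d
                p = [u + (x - u) * scale for x, u in zip(p, user_pos)]
            positions.append(p)
        return positions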
Fig. 3. A fragment of Quest3D graph-based programming environment for finger-moth collision detection
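As a complement to the collision-detection graph in Fig. 3, the sketch below outlines how a fingertip position could be obtained by forward kinematics from the tracked wrist pose and a normalised glove flexion value, and then tested against a moth's position. The segment lengths, the planar two-link model and the contact radius are simplifying assumptions, not the calibrated hand model of the actual system.

import numpy as np

SEGMENTS = (0.05, 0.03)       # proximal/distal finger segment lengths (m), assumed
MAX_FLEX = np.radians(90.0)   # glove flexion value 1.0 mapped to 90 degrees, assumed

def fingertip_world(wrist_pos, wrist_rot, finger_dir, flexion):
    """Forward kinematics for one finger modelled as a planar two-link chain
    bent by the normalised flexion value (0..1). finger_dir is the finger's
    base direction in the wrist frame; wrist_pos/wrist_rot come from the
    6-DOF tracker and move the result into room coordinates."""
    angle = flexion * MAX_FLEX
    along = SEGMENTS[0] * np.cos(angle) + SEGMENTS[1] * np.cos(2.0 * angle)
    down = -(SEGMENTS[0] * np.sin(angle) + SEGMENTS[1] * np.sin(2.0 * angle))
    local = along * np.asarray(finger_dir) + down * np.array([0.0, 1.0, 0.0])
    return np.asarray(wrist_pos) + np.asarray(wrist_rot) @ local

def finger_touches_moth(tip, moth_pos, radius=0.02):
    """Simple sphere test standing in for the engine's collision detection."""
    return np.linalg.norm(np.asarray(tip) - np.asarray(moth_pos)) < radius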
Two renderings (left and right) are then calculated and coherently displayed through an optical see-through Head Mounted Display, which works by placing optical combiners in front of the user's eyes (see Fig. 4). These combiners are partially transmissive, so that the user can look directly through them to see the real world. The combiners are also partially reflective, so that the user sees virtual images bounced off the combiners from head-mounted LCD monitors. The rendering engine has been tailored to an optical see-through HMD, but it could be adapted to video see-through displays. Eventually, a selective culling of a virtual object may be performed when it is partially or totally behind a real object, but in many cases this technique (and the overhead required to accurately model the real environment) may not be necessary. To further stimulate the user's emotional reactions, audio samples mimicking the sound of moths' flapping wings, diffused through the headphones integrated in the HMD, are exploited to amplify the sensation of presence of the virtual insects according to their size, number and distance from the sufferer. The flapping-wings audio samples are short looping samples whose duration is in sync with the actual flapping animation cycle to achieve audio-visual coherence.
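The two per-eye rendering cameras mentioned above can be derived from the tracked head pose as in the sketch below; the interpupillary distance and the column-vector matrix convention are assumptions made only for this illustration.

import numpy as np

def eye_view_matrices(head_pos, head_rot, ipd=0.064):
    """Builds left/right 4x4 view matrices from the tracked head pose.
    head_pos is a 3-vector in room coordinates and head_rot a 3x3 rotation
    matrix from the tracker; each eye is offset by half the interpupillary
    distance along the head's local x axis."""
    head_pos = np.asarray(head_pos, dtype=float)
    head_rot = np.asarray(head_rot, dtype=float)
    right_axis = head_rot[:, 0]              # head-local x axis in room space
    views = {}
    for eye, sign in (("left", -1.0), ("right", +1.0)):
        eye_pos = head_pos + sign * 0.5 * ipd * right_axis
        view = np.eye(4)
        view[:3, :3] = head_rot.T             # inverse rotation
        view[:3, 3] = -head_rot.T @ eye_pos   # inverse translation
        views[eye] = view
    return views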
4 Experiments
We are still in the process of performing a quantitative study to measure the response of mottephobia sufferers to this approach to exposure therapy. So far, we have carried out some preliminary qualitative evaluations of the system described above, to gather first impressions about its potential efficacy from experts in exposure therapy and from their patients. These experiments involved five mottephobic subjects showing various levels of symptom severity and three exposure therapy specialists. The
test bed hardware included a dual quad-core Intel Xeon workstation equipped with an Nvidia Quadro 5600 graphics board with 1.5 GB of VRAM in the role of simulation server and control interface. The HMD adopted is a Cybermind Visette Pro with see-through option. The virtual therapy room has a surface of about 40 square meters, of which 15 square meters fall within the capture volume of the tracking system, providing a reasonable space for moving around and interacting (see Fig. 5). Each of the five participants was exposed to moths/butterflies augmenting the real scene over the course of eight ARET sessions featuring a progressively closer level of interaction, while the experts were invited to control the simulation's parameters after a brief training. After each session the participants were asked to answer a questionnaire developed to measure six subjective aspects of the simulated experience by assigning a vote in the integer range 1-10 (the higher the better) to: (A) Realism of Simulated Experience; (B) Visual Realism of Virtual Moths; (C) Realism of Moth Behavior; (D) Realism of Hand-Moth Interaction; (E) Emotional Impact of Audio Stimuli; (F) Maximum Fear Level Experienced. Additionally, the therapists were asked to provide feedback on two qualitative aspects of the ARET control interface: (G) Accuracy of Control; (H) Range of Control. As shown in Table 1, while the evaluations provided are subjective and the number of users involved in these first trials is very small, the overall results seem to confirm that many of the factors triggering the panic attacks in mottephobic subjects, like the sudden appearance of insects from behind or above, the moths' erratic flying patterns, the sound of flapping wings or simply the insects' visual aspect, are credibly reproduced by the proposed AR environment.
Fig. 4. See-Through HMD, datagloves and head/wrists wireless trackers worn during testing
Table 1. A summary of the scores provided by the users of the proposed ARET system

Features                                    Min.   Avg.   Max.
(A) Realism of Simulated Experience          7     7.9     9
(B) Visual Realism of Virtual Moths          8     9.1    10
(C) Realism of Moth Behavior                 6     6.8     8
(D) Realism of Hand-Moth Interaction         6     7.5     9
(E) Emotional Impact of Audio Stimuli        8     8.2     9
(F) Maximum Fear Level Experienced           8     8.8    10
(G) Accuracy of Control                      7     7.5     8
(H) Range of Control                         8     9.0    10
Fig. 5. The room for virtual exposure therapy, augmented with interacting butterflies
On the other hand, the exposure therapy experts involved were favourably impressed by the level of control available over the virtual simulation. However, only a quantitative analysis conducted on a much wider number of subjects may objectively assess the efficacy of this ARET environment. In this regard, the evaluation we are carrying out is based on a modified version of the "fear of spiders" questionnaire originally proposed by Szymanski and O'Donoghue [7] as, to the best of our knowledge, there is no specific instrument of this kind for mottephobia.
5 Conclusions
In this paper, we presented an AR based environment for exposure therapy of mottephobia. The proposed architecture exploits an inertial tracking system, instrumented gloves and parametric behavioral/interaction engines to provide the user with a more believable and emotionally involving interaction experience, improving at the same time the range and the accuracy of the user-system interaction. To this aim, we performed a first qualitative evaluation involving ET experts and a group of mottephobia sufferers asked to respond to a questionnaire. So far the first qualitative reports confirm the potential of the proposed system for mottephobia treatment, while, according to the therapists involved, other kinds of anxiety disorders could be favorably treated as well. We are currently working on completing the aforementioned quantitative study to assess the system's effectiveness in reducing mottephobia symptoms as well as to compare this proposal with both marker-based ARET and VRET approaches. As currently the system is able to display only one type of moth/butterfly per session, we are also working to remove this limitation. Additionally, we are developing a new version of the AR engine specific for video see-through HMDs.
References 1. Bouchard, S., Côté, S., St-Jacques, J., Robillard, G., Renaud, P.: Effectiveness of virtual reality exposure in the treatment of arachnophobia using 3D games. Technology and Health Care 14(1), 19–27 (2006) 2. Carlin, A., Hoffman, H.Y., Weghorst, S.: Virtual reality and tactile augmentation in the treatment of spider phobia: a case study. Behaviour Research and Therapy 35(2), 153–158 (1997) 3. Bouchard, S., Côté, S., Richards, C.S.: Virtual reality applications for exposure. In: Richards, C.S. (ed.) Handbook of Exposure, ch. 11 (in press) 4. Botella, C., Juan, M.C., Baños, R.M., Alcañiz, M., Guillen, V., Rey, B.: Mixing realities? An Application of Augmented Reality for the Treatment of Cockroach phobia. Cyberpsychology & Behavior 8, 162–171 (2005) 5. Juan, M.C., Joele, D., Baños, R., Botella, C., Alcañiz, M., Van Der Mast, C.: A Markerless Augmented Reality System for the treatment of phobia to small animals. In: Presence Conference, Cleveland, USA (2006) 6. Juan, M.C., Alcañiz, M., Calatrava, J., Zaragozá, I., Baños, R.M., Botella, C.: An Optical See-Through Augmented Reality System for the Treatment of Phobia to Small Animals. In: Shumaker, R. (ed.) HCII 2007 and ICVR 2007. LNCS, vol. 4563, pp. 651–659. Springer, Heidelberg (2007) 7. Szymanski, J., O’Donoghue, W.: Fear of spiders questionnaire. J. Behav. Ther. Exp. Psychiatry 26(1), 31–34 (1995)
Designing Augmented Reality Tangible Interfaces for Kindergarten Children Pedro Campos1,2 and Sofia Pessanha1 1
University of Madeira and Madeira Interactive Technologies Institute Campus Universitário da Penteada, 9000-390 Funchal, Portugal 2 VIMMI Group, Visualization and Intelligent Multimodal Interfaces, INESC-ID R. Alves Redol 9, 1000-029 Lisboa, Portugal
[email protected],
[email protected]
Abstract. Using games based on novel interaction paradigms for teaching children is becoming increasingly popular because children are moving towards a new level of interaction with technology and there is a need to bring children closer to educational contents through the use of novel, attractive technologies. Instead of developing a computer program using traditional input techniques (mouse and keyboard), this research presents a novel user interface for learning kindergarten subjects. The motivation is essentially to bring something from the real world and couple it with virtual reality elements, accomplishing the interaction using our own hands. It is a symbiosis of traditional cardboard games with digital technology. The rationale for our approach is simple. Papert (1996) notes that "learning is more effective when the apprentice voluntarily engages in the process". Motivating the learners is therefore a crucial factor to increase the possibility of action and discovery, which in turn increases the capacity of what some researchers call learning to learn. In this sense, the novel constructionist learning paradigm aims to adapt and prepare tomorrow's schools for the constant challenges faced by a society which is currently embracing an accelerating pace of profound changes. Augmented reality (Shelton and Hedley, 2002) and tangible user interfaces (Sharlin et al., 2004) fitted nicely as a support method for this kind of learning paradigm. Keywords: Augmented reality, Interactive learning systems, Tangible Interfaces.
1 Introduction

Using games as a way of better educating children is becoming increasingly popular because children are moving towards a new level of interaction with technology and there is a need to bring them closer to educational contents. This can be done through the use of novel, more attractive technologies. The power of digital games as educational tools is, however, well understood. Games can be successfully used for teaching science and engineering better than lectures [1], and Mayo and colleagues even argued they could be the "cure for a numbing 200-person class" [1]. Games can also be used to teach a number of very different subjects to children of all ages. For instance, Gibson describes a game aimed at
teaching programming to pre-teen school children [2]. Bellotti and colleagues [5] describe an educational game using a state-of-the-art commercial game development approach, enriching the environment with instances of developed educational modules. The research goals of these approaches are essentially to exploit the potential of computers and reach a demographic that is traditionally averse to learning. On a more specific line, there is also interesting research on using Augmented Reality (AR) games in the classroom. From high-school mathematics and geometry [3] to interactive solar systems targeted at middle school science students [4], the range of applications is relatively broad. However, there is a clear lack of solutions and studies regarding the application of these technologies with kindergarten children, who are aged 3-5 years old and therefore have different learning objectives. In this paper, we present a tangible user interface for an augmented reality game specifically targeted at promoting collaborative learning in kindergarten. The game's design involved HCI researchers (the authors), kindergarten teachers and 3D designers. We evaluated the system over several days in two different local schools and recorded the children's reactions, behaviors and answers to a survey we also conducted. Instead of developing a computer program using traditional input techniques (mouse and keyboard), this research presents a novel user interface for learning kindergarten subjects. The motivation is essentially to bring something from the real world and couple it with virtual reality elements, accomplishing the interaction using our own hands; thus, children do not need to have previous experience using computers in order to use this system. The interface is, essentially, a symbiosis of traditional cardboard games with digital technology.
2 Related Work

Technology today provides exciting new possibilities for bringing children closer to digital contents. There are numerous areas where Augmented Reality (AR) can be applied, ranging from more serious areas to entertainment and fun. Thus, the process of viewing and manipulating virtual objects in a real environment can be found in many applications, especially in the areas of education and training, which are very promising application fields, since it is often necessary to use resources enabling a better view of the object under study. Other applications include the creation of collaborative environments in AR, which consist of multi-user systems with simultaneous access, where each user views and interacts with real and virtual elements from his or her own point of view. Given the scope of our work, we divide the review of the literature into two broad aspects: the use of augmented reality technology in the classroom, and approaches targeted at promoting collaboration in the classroom by means of novel technology – not necessarily based on augmented reality. The use of augmented reality systems in educational settings, per se, is not novel. Shelton and Hedley [6] describe a research project in which they used augmented reality to help teach undergraduate geography students about earth-sun relationships.
They examined over thirty students who participated in an augmented reality exercise containing models designed to teach concepts of rotation/revolution, solstice/equinox, and seasonal variation of light and temperature, and found a significant overall improvement in student understanding after the augmented reality exercise, as well as a reduction in student misunderstandings. Some other important conclusions about this system were that AR interfaces do not merely change the delivery mechanism of instructional content: They may fundamentally change the way that content is understood, through a unique combination of visual and sensory information that results in a powerful cognitive and learning experience [6]. Simulations in virtual environments are becoming an important research tool for educators [9]. Augmented reality, in particular, has been used to teach physical models in chemistry education [10]. Schrier evaluated the perceptions regarding these two representations in learning about amino acids. The results showed that some students enjoyed manipulating AR models by rotating the markers to observe different orientations of the virtual objects [10]. Construct3D [9] is a three-dimensional geometric construction tool specifically designed for mathematics and geometry education. In order to support various teacher-student interaction scenarios, flexible methods were implemented for context and user dependent rendering of parts of the construction. Together with hybrid hardware setups they allowed the use of Construct3D in classrooms and provided a test bed for future evaluations. Construct3D is easy to learn, encourages experimentation with geometric constructions, and improves spatial skills [9]. The wide range of AR educational applications also extend to physics. Duarte et al. [11] use AR to dynamically present information associated to the change of scenery being used in the real world. In this case, the authors perform an experiment in the field of physics to display information that varies in time, such as velocity and acceleration, which can be estimated and displayed in real time. The visualization of real and estimated data during the experiment, along with the use of AR techniques, proved to be quite efficient, since the experiments could be more detailed and interesting, thus promoting the cognitive mechanisms of learning. Promoting collaborating behaviors is crucial in the kindergarten educational context. Therefore, we briefly analyze approaches that use technology as a way to achieve higher levels of collaboration in the classroom. Children communicate and learn through play and exploration [16]. Through social interaction and imitating one another, children acquire new skills and learn to collaborate with others. This is also true when children work with computers. Using traditional mouse-based computers, and even taking into consideration that two or more children may collaborate verbally, only one child at a time has control of the computer. The recognition that group work around a single display is desirable has led to the development of software and hardware that is designed specifically to support this. The effect of giving each user an input device, even if only one could be active at a time was then examined and significant learning improvements were found [17]. Stewart et al. 
[18] observed that children with access to multiple input devices seemed to enjoy an enhanced experience, with the researchers observing increased incidences of student-student interaction and student-teacher interaction as well as
changing the character of the collaborative interaction. The children also seemed to enjoy their experience more, compared with earlier observations of them using similar software on standard systems. There are also studies about the design of user interfaces for collaboration between children [14]. Some results present systems which effectively supported collaboration and interactivity that children enjoyed, and were engaged in the play [14]. Kannetis and Potamianos [13] investigated the way fantasy, curiosity, and challenge contributes to the user experience in multimodal dialogue computer games for preschool children, which is particularly relevant for our research. They found out that fantasy and curiosity are correlated with children's entertainment, while the level of difficulty seems to depend on each child's individual preferences and capabilities [13]. One issue we took into account when designing our AR game for kindergarten was that preschoolers become more engaged when multimodal interfaces are speech enabled and contain curiosity elements. We specifically introduced this element in our design, and confirmed the results described in [13].
3 An Augmented Reality Tangible Interface for Kindergarten

As with any game, the solution space dimension was very high, so we collaboratively designed the game with kindergarten teachers, focusing on a biodiversity theme and using traditional book-based activities as a starting point. The developed system was based on a wooden board containing nine divisions where children can freely place the game's pieces. The pieces are essentially based on augmented reality markers. Several experienced kindergarten teachers provided us with a learning objective and actively participated in the entire game's design. For instance, they listed a series of requirements that any game or educational tool should comply with when dealing with kindergarten children, who can be aged from 3 to 5 years old and therefore have different teaching and caring needs compared with older children or other types of users. Among the most important requirements were:
• Promote respectful collaborative behaviors, like giving turns to friends, pointing out mistakes and offering corrections;
• Promote learning of the given subject;
• Promote a constructivist approach, where children learn by doing and by constructing solutions;
• The previous requirement also implied that the physical material of the tangible interface had to be resistant and adequate for manipulation by the group of children.
In our case, the learning objective was the study of animals and the environments (sea, rivers, land and air) they live in. Each division of the game's board contains a printed image of a given environment. Given the manipulative nature of such a game, the game's pieces had to be made from a special material particularly suited for children: flexible but robust. Each of the game's pieces displays a 3D animal that can be manipulated, as in a regular augmented reality setting. The board also contains a fixed camera, which processes the real-time video information. Figure 1 illustrates the overall setting of the
system, which can be connected to any kind of computer and display. In the figure, we show the system connected to a laptop, but during classroom evaluation we used a projector, to facilitate collaborative learning. The goal of the game is to place all the markers (game board pieces representing animals) in the correct slot of the board. We only give feedback about the correctness of the placement of pieces in the end, when the player places a special marker that is used for that purpose, i.e. a “show me the results” marker. Two different versions of the game were developed, to assess the impact of the feedback’s immediacy on the children’s levels of collaboration: a version where feedback can be freely given at any time (whenever children place the special marker to see the results, as shown in Figure 2); and a version where feedback is only given at the end of the game, i.e. when all the pieces have been placed in the board (again, by placing the special marker).
Fig. 1. The developed system, when used in a LCD display configuration
Figure 2 shows a screenshot of what children see displayed in the screen. The markers display 3D animals, which can be freely manipulated. The animals that are correctly placed have a green outline, incorrectly placed animals show a red outline. Following the teachers’ suggestions, we also added audio feedback, with pre-recorded sentences like “That’s not right, try it again!” This encouraged children, especially when positive reinforcement was given in the form of an applause sound. The game also features a detailed logging mechanism with all actions recorded with timestamps. This was developed as an aid to evaluating the effects on collaboration levels. The system logs the completion times of each game, the number of incorrectly placed markers, the number of feedback requests (which can be considered the number of attempts to reach a solution), and other variables.
Fig. 2. The game’s screen, showing feedback as a red or green border around the animals
4 Discussion

The results obtained so far indicate that using our augmented reality system is a positive step towards achieving the goal of reducing the distance between children and knowledge, by learning through play. The system has a very positive impact on whole-class collaboration. This is much harder than it seems, since kindergarten children have very short attention cycles: they get distracted very often, and they have trouble collaborating in an orderly manner. An important contribution of this paper, in terms of design issues that promote collaboration, is the importance of providing immediate feedback in virtual reality games such as the one we have developed. It is crucial that designers targeting kindergarten children are capable of exploiting the innate curiosity of these tiny users in order to achieve good levels of collaborative interaction. Motivation, enjoyment and curiosity are important ingredients for any kind of educational game, but they are even more important when it comes to kindergarten user interfaces. Interaction with tangible board pieces (the AR markers) may be well suited to very young children because of their physicality, but this may not be sufficient to achieve good levels of motivation and collaboration.
5 Conclusions

Augmented reality technology and tangible interfaces are well accepted by today's kindergarten children and by their teachers as well. Large projection screens and a good blend of the physical game pieces with their virtual counterparts can prove effective for increasing motivation and collaboration levels among children. In the learning field, we also concluded that by playing the game the children's number of wrong answers decreased, which suggests the game could help kindergarten children learn simple concepts. Since kindergarten children lose the focus of their attention frequently, especially with a game, we feared that the game could harm the learning process. These results
suggest that the game did not do any harm to that process, since the next day's post-test results showed a positive improvement. According to the teachers' feedback, the game looks like a promising way to complement traditional teaching methods. Regarding motivation, we observed high levels of motivation while children played the game: most of them were clearly motivated and, for instance, never gave up the game until they found the solution. Curiosity was another driving factor towards motivation. Children wanted to see all the 3D animals, but for that to happen they had to wait until all markers were placed. In terms of maintaining motivation, this was a crucial design issue. The focus of this research was on promoting collaboration. We analyzed several variables such as the number of collaborative comments made by children, the number of constructive collaborative corrections made by children (including pointing gestures) and the number of attempts made until reaching a solution. Results suggest that immediate feedback played an important role, increasing the number of collaborative behaviors and interactions among kindergarten children. We also studied the impact of display size, but the results showed that the differences were not significant, although by observation, and also according to the teachers' feedback, the larger display seemed to promote collaboration better than the smaller display. Future work should consist of expanding the experiment in order to better assess the role played by display size in collaboration levels. Future work will also include more tests in different schools, as well as investigating other features and design issues that could positively influence collaboration in kindergarten.
References 1. Mayo, M.J.: Games for science and engineering education. Communications of the ACM 50(7), 30–35 (2007) 2. Gibson, J.P.: A noughts and crosses Java applet to teach programming to primary school children. In: Proceedings of the 2nd International Conference on Principles and Practice of Programming in Java, PPPJ, vol. 42, pp. 85–88. Computer Science Press, New York (2003) 3. Kaufmann, H., Schmalstieg, D.: Mathematics and geometry education with collaborative augmented reality. In: ACM SIGGRAPH 2002 Conference Abstracts and Applications, pp. 37–41. ACM, New York (2002) 4. Medicherla, P.S., Chang, G., Morreale, P.: Visualization for increased understanding and learning using augmented reality. In: Proceedings of the International Conference on Multimedia Information Retrieval, MIR 2010, pp. 441–444. ACM, New York (2010) 5. Bellotti, F., Berta, R., Gloria, A.D., Primavera, L.: Enhancing the educational value of video games. Computers in Entertainment 7(2), 1–18 (2009) 6. Shelton, B., Hedley, N.: Using Augmented Reality for Teaching Earth-Sun Relationships to Undergraduate Geography Students. In: The First IEEE International Augmented Reality Toolkit Workshop, Darmstadt, Germany (September 2002), IEEE Catalog Number: 02EX632 ISBN: 0-7803-7680-3 7. Papert, S.: The Connected Family: Bridging the Digital Generation Gap. Longstreet Press, Atlanta (1996) 8. Sharlin, E., Watson, B., Kitamura, Y., Kishino, F., Itoh, Y.: On tangible user interfaces, humans and spatiality. Personal Ubiquitous Computing 8(5), 338–346 (2004)
9. Tettegah, S., Taylor, K., Whang, E., Meistninkas, S., Chamot, R.: Can virtual reality simulations be used as a research tool to study empathy, problems solving and perspective taking of educators?: theory, method and application. International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2006 Educators Program, Article No. 35 (2006) 10. Schrier, K.: Using augmented reality games to teach 21st century skills. In: International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2006 Educators Program (2006) 11. Duarte, M., Cardoso, A., Lamounier Jr., E.: Using Augmented Reality for Teaching Physics. In: WRA 2005 - II Workshop on Augmented Reality, pp. 1–4 (2005) 12. Kerawalla, L., Luckin, R., Seljeflot, S., Woolard, A.: Making it real: exploring the potential of augmented reality for teaching primary school science. Virtual Reality 10(3-4), 163–174 (2006) 13. Kannetis, T., Potamianos, A.: Towards adapting fantasy, curiosity and challenge in multimodal dialogue systems for preschoolers. In: Proceedings of the 2009 International Conference on Multimodal Interfaces, ICMI-MLMI 2009, pp. 39–46. ACM, New York (2009) 14. Africano, D., Berg, S., Lindbergh, K., Lundholm, P., Nilbrink, F., Persson, A.: Designing tangible interfaces for children’s collaboration. In: CHI 2004 Extended Abstracts on Human Factors in Computing Systems, CHI 2004, pp. 853–868. ACM, New York (2004) 15. Brosterman, N.: Inventing Kindergarten. Harry N. Adams Inc. (1997) 16. Sutton-Smith, B.: Toys as culture. Gardner Press, New York (1986) 17. Inkpen, K.M., Booth, K.S., Klawe, M., McGrenere, J.: The Effect of Turn-Taking Protocols on Children’s Learning in Mouse- Driven Collaborative Environments. In: Proceedings of Graphics Interface (GI 97), pp. 138–145. Canadian Information Processing Society (1997) 18. Stewart, J., Raybourn, E.M., Bederson, B., Druin, A.: When two hands are better than one: Enhancing collaboration using single display groupware. In: Proceedings of Extended Abstracts of Human Factors in Computing Systems, CHI 1998 (1998) 19. Hsieh, M.-C., Lee, J.-S.: AR Marker Capacity Increasing for Kindergarten English Learning. National University of Tainan, Hong Kong (2008) 20. Self-Reference (2008)
lMAR: Highly Parallel Architecture for Markerless Augmented Reality in Aircraft Maintenance Andrea Caponio, Mauricio Hincapié, and Eduardo González Mendivil Instituto Tecnológico y de Estudios Superiores de Monterrey, Ave. Eugenio Garza Sada 2501 Sur Col. Tecnológico C.P. 64849 — Monterrey, Nuevo León, Mexico
[email protected],
[email protected],
[email protected]
Abstract. A novel architecture for real time performance marker-less augmented reality is introduced. The proposed framework consists of several steps: at first the image taken from a video feed is analyzed and corner points are extracted, labeled, filtered and tracked along subsequent pictures. Then an object recognition algorithm is executed and objects in the scene are recognized. Eventually, position and pose of the objects are given. Processing steps only rely on state of the art image processing algorithms and on smart analysis of their output. To guarantee real time performances, use of modern highly parallel graphic processing unit is anticipated and the architecture is designed to exploit heavy parallelization. Keywords: Augmented Reality, Parallel Computing, CUDA, Image Processing, Object Recognition, Machine Vision.
1 Introduction
In recent times augmented reality (AR) systems have been developed for several applications and fields. In order to augment the user's experience, AR systems blend images of actual objects, coming for instance from a camera video feed, with virtual objects which offer new, important information. Therefore, AR systems need to recognize some objects in a real scene: this is normally done by placing a particular marker on those specific objects. Markers are easy to recognize, and AR systems based on this method are already widely used, as shown in Section 2. However, marker based systems are invasive, rigid and time consuming. To overcome these difficulties, marker-less AR has been proposed: avoiding markers leads to a much more effective AR experience but, on the other hand, requires the implementation of several image processing or sensor fusion techniques, resulting in more complex algorithms and in higher computational demands that risk compromising the user's experience. In this article we present the design of lMAR (library for Marker-less Augmented Reality), a parallel architecture for marker-less AR, whose purpose is to provide developers with a software tool able to recognize one or more specific objects in a video feed and to calculate their pose and position with respect to the
camera reference frame. To counterbalance algorithm complexity, the lMAR design fully exploits parallel computing, now available at low cost thanks to modern CPUs and GPUs. This way the proposed system will be able to use very complex and computationally intensive algorithms for image processing, while still delivering real time performance and avoiding low frame rate processing and video stuttering. This article is structured as follows: in Section 2 state of the art AR solutions are presented, along with the most important application fields. Section 3 presents in detail the proposed architecture. Section 4 describes how lMAR guarantees real time performance. Section 5 closes the article offering some conclusions and detailing future work.
2 Related Work
AR has become very popular in the last 20 years and is currently used in many fields such as training, product development, maintenance, medicine and multimedia. In AR systems it is quite common to use printed markers to successfully blend actual reality with virtual information. In fact, algorithms based on this kind of setup have been used for many years and are not computationally demanding, so they can deliver a satisfying AR experience to the final user. On the other hand, even if marker based AR applications have proved to be practical and deliver good performance, the presence of markers can be problematic in several situations, i.e. when we have to deal with objects of different size, when the markers must be positioned in locations that are difficult to access, or when we have to work in unfriendly environmental conditions. Moreover, maintenance and training are among the principal research topics nowadays, as there is a clear interest from industry in developing working applications, opening the opportunity for a global establishment of AR as a tool for speeding up maintenance of complex systems and training of complex procedures.

2.1 Marker Based AR Solutions
In [6], Kim and Dey propose an AR based solution for training purposes: a video see-through AR interface is integrated into three prototype 3D applications regarding engineering systems, geospace, and multimedia. Two sample cases making use of marker tags are presented: (a) an AR-interfaced 3D CAE (Computer-Aided Engineering) simulation test-bed, and (b) a haptically-enhanced broadcasting test-bed for AR-based 3D media production. In the 3D CAE simulation a marker is used to display a model and the interaction with the model is done by means of keyboard and markers, as both trigger certain activities. In [11] Uva et al. integrate AR technology in a product development process using real technical drawings as a tangible interface for design review. The proposed framework, called ADRON (Augmented Design Review Over Network), provides augmented technical drawings, interactive FEM simulation, multi-modal annotation and chat tools, web content integration and collaborative client/server
architecture. Technical drawings are printed along with hexadecimal markers which allow the system to display information like 3D models and FEM analysis. The authors' framework is meant to use common hardware instead of expensive and complex virtual or augmented facilities, and the interface is designed specifically for users with little or no augmented reality expertise. Haritos and Macchiarella in [3] apply AR to training for maintenance in the aeronautical field by developing a mobile augmented reality system which makes use of markers applied to different parts of the aircraft in order to help technicians with the task of inspecting the propeller mounting bolts and safety wire for signs of looseness on Cessna 172S airplanes.

2.2 Marker-less AR Solutions
Paloc et al. develop in [10] a marker-less AR system for enhanced visualization of the liver involving minimal annoyance for both the surgeon and the patient. The ultimate application of the system is to assist the surgeon in oncological liver surgery. The Computer Aided Surgery (CAS) platform consists of two function blocks: a medical image analysis tool used in the preoperative stage, and an AR system providing real time enhanced visualization of the patient and its internal anatomy. In the operating theater, the AR system merges the resulting 3D anatomical representation onto the surgeon's view of the real patient. Medical image analysis software is applied to the automatic segmentation of the liver parenchyma in axial MRI volumes of several abdominal datasets. The three-dimensional liver representations resulting from the above segmentations were used to perform in-house testing of the proposed AR system. The virtual liver was successfully aligned to the reflective markers and displayed accurately on the auto-stereoscopic monitor. Another project involving the marker-less approach is the Archeoguide by Vlahakis et al. [13]. The Archeoguide system provides access to a huge amount of information in cultural heritage sites in a compelling and user-friendly way, through the development of a system based on advanced IT techniques which includes augmented reality, 3D visualization, mobile computing, and multi-modal interaction. Users are provided with a see-through Head-Mounted Display (HMD), earphones and mobile computing equipment. Henderson and Feiner designed, implemented and tested a prototype of an augmented reality application to support military mechanics conducting routine maintenance tasks inside an armored vehicle turret [5]. Researchers created a marker-less application for maintenance processes and designed the hardware configuration and components to guarantee good performance of the application. The purpose of the project was to create a totally immersive application to both improve maintenance time and diminish the risk of injury, due to highly repetitive procedures.
3 lMAR: Overview of the Proposed Solution
In the previous sections we have underlined how the presence of markers can seriously hamper the integration of AR in several fields. This is particularly true
in maintenance, where we need to identify several objects in particularly difficult environments. In fact, as said before, markers cannot be used when the size range of the objects we want to identify is really wide, as when we have to recognize both big and small objects, when the absolute size of the objects to identify is too small or too big, or when it is simply not possible to properly set up the scene with the needed tags. In these scenarios a marker-less AR approach would be more advisable. While marker based AR systems rely on the presence of tags for object identification, marker-less AR depends on modern computer vision techniques which are usually computationally demanding, thus risking delivering a stuttering and inaccurate AR experience. In order to minimize this risk we designed lMAR, a software architecture meant to execute object recognition with real time performance even in complex situations.

3.1 Working Principles
The purpose of lMAR is to provide developers of marker-less AR with software tools for recognizing several specific objects present in a scene. The main idea is that we need to analyze a camera feed to find out which objects are present and in which specific pose and position they appear. The objects do not need to lie on a specific plane, nor do they need to satisfy specific conditions such as planarity. However, objects should show enough specific points to recognize them, so extremely flat monochromatic objects or highly reflective objects are not considered at the moment. lMAR was conceived to recognize objects only after a training phase. Once trained, lMAR functions will be able to analyze a video feed and return the number of recognized objects and, for each one of them, an identification and a homography matrix [4] representing object pose and scale. We can then distinguish two main functioning modes: a training mode and a working mode. During training mode the system learns, one by one, all the objects it will need to recognize; in working mode lMAR functions analyze a video feed to identify objects of interest and output their position and pose with respect to the camera frame.

3.2 Training Mode
In order to successfully recognize an object, a marker-less AR software must, first of all, learn how this object looks like. Fig. 1 shows how lMAR perform this step: at first an image I of the object obj is given to the system. The image is processed by a feature points extraction algorithm (FEA) and the list X of object’s feature points is used to populate a database which associates X to the unique object name obj. As object appearance can dramatically change with its position, the training stage should process several images of the same object, so that it could be seen and recognized from several perspectives. The database of objects is created by repeating this procedure with all the objects we want to recognize.
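The sketch below illustrates this training idea under the assumption that a SIFT-like detector plays the role of the FEA. It is not the lMAR API: the function names, object name and file paths are made up, and OpenCV's SIFT implementation stands in for whatever feature extractor is ultimately chosen.

```python
import cv2

def train_object(database, obj_name, image_paths):
    """Extract feature descriptors from several views of one object and
    store them in the database under the object's unique name obj."""
    fea = cv2.SIFT_create()                      # FEA stand-in
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _keypoints, desc = fea.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)             # one descriptor set per view
    database[obj_name] = descriptors             # X associated with obj
    return database

# Hypothetical usage with several views of the same object:
db = {}
train_object(db, "hydraulic_pump", ["pump_front.png", "pump_side.png"])
```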
Fig. 1. Block diagram of lMAR training mode
It is worth pointing out that the training stage does not need to be fast, as it can also be done off-line using recorded videos or static images. Thus the algorithms used at this stage do not need to be fast, and more attention can be given to accurately populating the database.

3.3 Working Mode
Working mode of lMAR is shown in fig. 2 where we can distinguish three main stages: an Image Processing Stage, a Data Analysis Stage and an Output Generation Stage. Variable names in fig. 2 are described in table 1. Blocks of different colors are independent from each other and can be executed in parallel. On the contrary, blocks of the same color must be executed in sequence. This strategy is suggested in [7], and allows a multi-thread approach which helps to speed up the AR application.
Fig. 2. Block diagram of lMAR working mode
The Image Processing Stage individuates feature points in the current view and computes the relative motion between subsequent frames of the video feed. This is done by means of two different algorithms: a feature extraction and a feature tracking algorithm (FEA and FTA). The FEA is needed to analyze the current scene and individuate particular points, called corners, which are known to be invariant to several geometric transformations. Some of the detected points belong to objects of interest, and by analyzing them we will eventually recognize the objects. The FEA is fundamental for good performance of the AR system: it must be accurate and point out good features in the current scene to allow their successful tracking. On the other hand it must be fast, allowing a high frame rate for the whole application. In the past many algorithms have been developed for feature extraction and corner detection; the most promising among them are the SIFT algorithm by Lowe [8], the SURF algorithm by Bay et al. [2] and the more recent DSIFT algorithm by Vedaldi and Fulkerson [12].
Table 1. Legend of the variables in Fig. 2

I(ts)         Input image from the video feed when the FEA is run.
I(tof)        Input image from the video feed when the FTA is run.
I(tof − 1)    Image previously processed by the FTA.
Xs            Vector of feature points identified by the FEA.
Xof           Vector of feature points identified by the FTA.
Vof           Vector of velocities of the points identified by the FTA.
Xls           Vector of feature points identified by the FEA, after labeling.
Xlof          Vector of feature points identified by the FTA, after labeling.
DB            Database of objects from the previous training.
X             Vector of filtered feature points.
OBJ           List of objects recognized in the scene.
OBJ(t − 1)    List of objects recognized in the scene at the previous iteration.
Xobj          Vector of feature points belonging to recognized objects.
Xobj(t − 1)   Vector of feature points belonging to recognized objects at the previous iteration.
H             Homography matrices representing the pose of each identified object.
P             Matrix indicating the position in the scene of each identified object.
To compare the performance of these algorithms, we ran a preliminary study, summarized in Table 2. In this study, the three algorithms were evaluated by checking the quality and number of the matches found among images from the Oxford Affine Covariant Regions Dataset [9]. Each algorithm received a score between 1 and 3 for several transformations; finally, a value was assigned to execution speed. A brief look at Table 2 shows that even if the three algorithms perform well in every situation, SIFT outperforms the others. However, SIFT is also the slowest algorithm and would not guarantee a high execution rate. On the other hand, DSIFT also offers really good performance and, running at a considerably higher rate, qualifies as the best possible FEA.
Table 2. Preliminary comparison between DSIFT, SIFT, and SURF algorithms

                          DSIFT   SIFT   SURF
Affine Transformation       2       3      1
Blurring                    3       3      3
Compression Artifacts       3       3      3
Rotation                    3       3      3
Zoom                        3       3      2
Speed of Execution          3       1      2
However, no matter which algorithm we choose, the FEA will not be fast enough to guarantee an extremely reactive AR system. To improve overall performance we introduce an FTA, whose purpose is to track the features in the scene as the image changes in time. This approach was first proposed in [7], where the Optical Flow (OF) algorithm was used. OF is a common algorithm for analyzing two subsequent pictures of the same video and calculating the overall displacement between them. The OF output can be used to extrapolate the overall image movement, which considerably simplifies the matching of feature points between subsequent frames of the video feed. The Data Analysis Stage processes the information given by image processing. With reference to Fig. 2, the Label Features operation is needed to find corresponding points between the FEA and FTA outputs, so that points given by the two algorithms can be associated. After this, a Filter Features operation is performed in order to choose the most robust feature points. The main idea is that when Xls and Xlof are received as input, the filter compares them, giving more importance to those which confirm each other. Moreover, the input points are compared with those which were previously recognized as object points: objects that were pictured in the previous frame are likely to still be there. This step generates X, a list of feature points which are likely to be more stable than Xls and Xlof alone. Finally, the Find Objects block finds good matches between points seen in the image and objects present in DB. Starting from the list of filtered feature points, X, and taking into consideration the list of objects recognized at the previous iteration, OBJ(t − 1), the algorithm searches for the best matches between groups of points and the object database. Eventually, the list of recognized objects OBJ and the list of feature points belonging to them, Xobj, are given as outputs. The Output Generation Stage consists of just the Calculate Homographies and Position block, which is the last one in the process and calculates, from the lists OBJ and Xobj, the pose and position of each object with respect to the current camera frame. This information is expressed by a matrix of homographies H. Outputting H and OBJ allows the rest of the AR system to understand which objects are present in the scene, where they are and how they are positioned.
The AR system will use this information to augment the current scene, e.g. drawing 3D models of the recognized objects on a screen and superimposing the virtual objects on the real ones.
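The sketch below is only an illustration of the working-mode building blocks, using OpenCV's Lucas-Kanade optical flow as a stand-in for the FTA and a RANSAC homography fit for the pose/scale output H; the function names, thresholds and filtering rule are ours, not the lMAR API.

```python
import cv2
import numpy as np

def track_features(prev_gray, cur_gray, prev_pts):
    """FTA step: track feature points between two subsequent frames."""
    pts = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), cur_pts[ok].reshape(-1, 2)

def filter_features(pts_fea, pts_fta, tol=3.0):
    """Keep FEA points confirmed by a nearby FTA-tracked point (the 'Filter Features' idea)."""
    if len(pts_fta) == 0:
        return pts_fea
    keep = [p for p in pts_fea
            if np.min(np.linalg.norm(pts_fta - p, axis=1)) < tol]
    return np.array(keep)

def object_pose(ref_pts, cur_pts):
    """Estimate the homography H mapping an object's reference points onto the current frame."""
    H, inliers = cv2.findHomography(ref_pts, cur_pts, cv2.RANSAC, 5.0)
    return H, inliers
```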
4 Strategies to Improve Performances
We have already stressed how important it is for AR systems to run smoothly and to provide users with a stutter-less experience. lMAR design takes this need into account in several ways. First of all, it puts side by side the FEA and the FTA as redundant algorithms: this way when FEA performs poorly and cannot recognize enough feature points or when it performs too slowly, FTA can be used to extrapolate objects’ position. Since FTA is much faster than FEA, lMAR is guaranteed to run at an overall higher frame rate than the one it would be constrained to by the FEA. As a second speeding up strategy, we designed lMAR as a multi-threaded solution, like suggested in [7]. Therefore, all operations independent from each other can be run in parallel, as different threads. This is clearly shown in fig. 2, where different colors represent different threads. In particular we can notice that the FEA and the FTA are independent from each other and from the rest of the recognition steps. In fact, as the data analysis stage processes data coming from the image processing stage, both FEA and FTA keep working as different threads, providing new data for the next iterations. A third way to improve performances regards the actual implementation of the FEA, which will be done by means of the DSIFT algorithm [12]. As shown in table 2, DSIFT is an excellent compromise between the quality of the feature extraction process and speed of execution. As a fourth final strategy to speed up the AR system, lMAR is designed to fully exploit parallel computing, now available at low cost thanks to modern GPUs. More specifically lMAR implementation will be done through the parallel computing CUDA architecture [1], which delivers the performance of NVIDIA’s graphics highly parallel processor technology to general purpose GPU Computing, allowing us to reach dramatic speedups in the proposed application. To take advantage of modern GPUs hardware capabilities, all lMAR functions are designed as parallel functions and will be implemented through CUDA.
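As an illustration of the multi-threaded strategy only (not of the actual C++/CUDA implementation), the following sketch runs the slow FEA and the fast FTA as independent worker threads that exchange data through queues, so the tracker can keep delivering results while extraction lags behind; the names are ours, and Python is used here purely to keep the example compact.

```python
import threading
import queue

fea_in, fta_in = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
fea_out, fta_out = queue.Queue(), queue.Queue()

def dispatch(frame):
    # Offer the newest frame to both stages; drop it if a stage is still busy.
    for q in (fea_in, fta_in):
        try:
            q.put_nowait(frame)
        except queue.Full:
            pass

def fea_worker(extract):
    while True:
        fea_out.put(extract(fea_in.get()))        # slow: corner/feature extraction

def fta_worker(track):
    prev = None
    while True:
        frame = fta_in.get()
        if prev is not None:
            fta_out.put(track(prev, frame))       # fast: optical-flow tracking
        prev = frame

def start(extract, track):
    threading.Thread(target=fea_worker, args=(extract,), daemon=True).start()
    threading.Thread(target=fta_worker, args=(track,), daemon=True).start()
```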
5 Conclusions
Nowadays AR is becoming increasingly important, especially in training and maintenance. Many AR systems make use of special markers and tags to set up the virtual environment and to recognize real world objects. This makes AR systems useless or difficult to set up in many situations. To overcome this difficulty, marker-less AR systems have been proposed and researchers have lately dedicated a great amount of resources to their development. However, to the authors' knowledge, no marker-less AR system is yet able to recognize several objects in the same scene while relying only on the analysis of the video feed of the scene.
lMAR was designed to fill this gap and to provide a software instrument for marker-less AR system development. In this article the general design of lMAR was presented. lMAR has been conceived to offer state of the art image processing and object recognition algorithms and, thanks to its highly parallel implementation, it will exploit the most recent hardware advances in GPUs, guaranteeing real time, stutter-less performance. This will allow developers to offer an extremely satisfying AR experience, particularly for maintenance and training applications, where many different objects must be recognized. In the future, the described system and the needed algorithms will be developed as a C++/CUDA library, thus providing developers with a high-performance tool for realizing marker-less AR software. After this step, we will use the lMAR library to realize a marker-less AR environment to support training and maintenance in the aeronautical field. Acknowledgments. The authors would like to thank A.DI.S.U. Puglia for the financial support of Dr. Andrea Caponio, according to the regional council resolution n. 2288/2009.
References 1. NVIDIA CUDA Compute Unified Device Architecture - Programming Guide (2010), http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/ docs/CUDA_C_Programming_Guide.pdf 2. Bay, H., Esse, A., Tuytelaars, T., Gool, L.V.: Surf: Speeded up robust features. In: 9th European Conference on Computer Vision (May 2006) 3. Haritos, T., Macchiarella, N.: A mobile application of augmented reality for aerospace maintenance training. In: The 24th Digital Avionics Systems Conference, DASC 2005 (2005) 4. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004); ISBN: 0521540518 5. Henderson, S., Feiner, S.: Evaluating the benefits of augmented reality for task localization in maintenance of an armored personnel carrier turret. In: 8th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2009, pp. 135–144 (2009) 6. Kim, S., Dey, A.K.: Ar interfacing with prototype 3d applications based on usercentered interactivity. Comput. Aided Des. 42, 373–386 (2010) 7. Lee, T., Hollerer, T.: Multithreaded hybrid feature tracking for markerless augmented reality. IEEE Transactions on Visualization and Computer Graphics 15(3), 355–368 (2009) 8. Lowe, D.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision (1999) 9. Oxford Visual Geometry Research Group: Oxford affine covariant regions dataset, http://www.robots.ox.ac.uk/~vgg/data/data-aff.html 10. Paloc, C., Carrasco, E., Macia, I., Gomez, R., Barandiaran, I., Jimenez, J., Rueda, O., Ortiz de Urbina, J., Valdivieso, A., Sakas, G.: Computer-aided surgery based on auto-stereoscopic augmented reality. In: Proceedings of Eighth International Conference on Information Visualisation, IV 2004, pp. 189–193 (2004)
11. Uva, A.E., Cristiano, S., Fiorentino, M., Monno, G.: Distributed design review using tangible augmented technical drawings. Comput. Aided Des. 42, 364–372 (2010) 12. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms (2008), http://www.vlfeat.org/ 13. Vlahakis, V., Ioannidis, M., Karigiannis, J., Tsotros, M., Gounaris, M., Stricker, D., Gleue, T., Daehne, P., Almeida, L.: Archeoguide: an augmented reality guide for archaeological sites. IEEE Computer Graphics and Applications 22(5), 52–60 (2002)
5-Finger Exoskeleton for Assembly Training in Augmented Reality Siam Charoenseang and Sarut Panjan Institute of Field Robotics, King Mongkut's University of Technology Thonburi, 126 Pracha-u-thit, Bangmod, Tungkru, Bangkok, Thailand 10140
[email protected],
[email protected]
Abstract. This paper proposes an augmented reality based exoskeleton for virtual object assembly training. The proposed hand exoskeleton consists of 9 DOF joints and can provide force feedback to all 5 fingers at the same time. The device has the ability to simulate the shape, size, and weight of virtual objects. In this augmented reality system, the user can assemble virtual objects in a real workspace which is superimposed with computer graphics information. During virtual object assembly training, the user receives force feedback which is synchronized with the physics simulation. Since the proposed system provides both visual and kinesthetic senses, it will help users to improve their assembly skills effectively. Keywords: Exoskeleton Device, Augmented Reality, Force Feedback.
1 Introduction

In general, object assembly training requires several resources such as materials, equipment, and trainers. Simulation is one training solution which can save the costs, time, and damage incurred during training. However, most simulators do not provide sufficient realism and sensory feedback. Hence, this paper proposes an augmented reality based exoskeleton for virtual object assembly training. This system can provide more realism and more senses, such as vision and haptics, during operation. Objects for the assembly task are simulated in the form of computer graphics superimposed on the real environment. Furthermore, the system can provide force feedback while the trainee assembles virtual objects. Force feedback technology can generally be categorized into 2 styles, wearable and non-wearable. Wearable force feedback devices are usually in the form of hand, arm, and whole body exoskeletons. Non-wearable force feedback devices are usually in the form of force feedback styluses, joysticks, and small robot arms. The Immersion CyberGrasp mounts a force feedback device and a 3D tracking device on an Immersion CyberGlove [1]. It uses cables to transfer power from the motors to the exoskeleton device. This device is lightweight and its motors are mounted on a separate base. Koyama, T. proposed a hand exoskeleton for generating force
feedback [2]. This device uses passive actuators, clutches, for simulating smooth force feedback. Bouzit, M. implemented small active pneumatic actuators for generating force feedback in a hand exoskeleton [3]. This exoskeleton has a small size and light weight. Ganesh Sankaranarayanan and Suzanne Weghorst proposed an augmented reality system with force feedback for teaching chemistry and molecular biology [4]. This system simulates the geometry and flexibility of organic compounds and then uses a Phantom haptic device to create force feedback. Matt Adcock, Matthew Hutchins, and Chris Gunn used augmented reality with force feedback for designing, advising, and surveying among users [5]. This device uses pneumatic actuators for creating force feedback. All of the previous non-wearable devices, which are implemented with a haptic device and augmented reality for generating force feedback to the user, cannot simulate force feedback at each joint of the hand. Hence, this paper proposes an augmented reality system using a wearable hand exoskeleton for generating force feedback to the user during a virtual assembly task.
2 System Overview

Figure 1 shows the configuration of the proposed system. The system consists of a hand exoskeleton device which is used to generate force feedback to the user. The exoskeleton also sends finger angles to, and receives braking angles from, the main computer. Markers are used to track the positions and orientations of the virtual objects and the user's hand. A video camera is used to receive video images from the real environment. The camera is mounted on an LCD Glasses Display, which is used to show the graphics from the same viewpoint as the user's. The graphics are updated by the physics engine, using the Bullet software library [6].
8VHU 9LGHR&DPHUD
/&'*ODVVHV'LVSOD\
0DLQ&RPSXWHU ([RVNHOHWRQ 0DUNHU
Fig. 1. System Overview
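A rough, hypothetical sketch of the data flow in Fig. 1 follows; every object and method name here is a placeholder for one of the subsystems described above, not an actual API of the proposed system.

```python
def run_simulation(camera, tracker, physics, exoskeleton, display):
    """Hypothetical per-frame loop tying the components of Fig. 1 together."""
    while True:
        frame = camera.capture()                         # video feed of the real scene
        hand_pose, object_poses = tracker.locate(frame)  # marker-based tracking
        finger_angles = exoskeleton.read_angles()        # joint angles from the device
        physics.update(hand_pose, finger_angles, object_poses)
        braking = physics.contact_braking_angles()       # limits where virtual contact occurs
        exoskeleton.send_braking_angles(braking)         # force feedback to all 5 fingers
        display.render(frame, physics.virtual_scene())   # graphics superimposed on video
```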
3 System Components

The system consists of two main components: hardware and software. Hardware includes an exoskeleton device with its controller, an LCD Glasses Display, a video camera, force sensors, and markers. Software includes the graphics manager and the vision manager.
Fig. 2. System Components
In Figure 2, the system receives video images from the video camera, does image processing to find the targets' positions and orientations, and generates computer graphics superimposed on the video image. It also sends force feedback in the form of braking angles to all motors of the exoskeleton device.

3.1 Exoskeleton Device

The exoskeleton device is used to generate force feedback to the user. It receives commands from the main computer through the exoskeleton controller for controlling its motors. The controller also receives sensed forces from strain gages for adjusting the tension of the cables. In general object manipulation by hand, finger no. 1 can rotate about the X and Z axes, while fingers no. 2-5 can rotate only about the Z axis. The last joint of each of fingers no. 2-5 cannot be controlled independently: rotation of the last joint depends on the previous joint's rotation. Hence, the mechanical structure of the proposed exoskeleton device is designed so that the exoskeletons of fingers no. 2-5 can generate 2-DOF force feedback at the first joint and the fingertip. To simplify the mechanical structure of the exoskeleton of finger no. 1, this exoskeleton can generate only 1-DOF force feedback at the fingertip. Computer graphics of virtual finger no. 1 and fingers no. 2-5 are designed with 2 DOFs and 3 DOFs, respectively, as shown in Figure 3. In addition, the movements of the virtual fingers are updated correspondingly to the real fingers'. The physics engine uses forward kinematics from Equation 1 to calculate the position and orientation of each finger from the D-H parameters shown in Tables 1 and 2 [7]. Since the last joints of all fingers always move relative to the middle joints, inverse kinematics can be calculated by converting the 2-DOF configuration of finger no. 1 to a 1-DOF configuration and the 3-DOF configuration of fingers no. 2-5 to a 2-DOF configuration, as shown in Figure 4.
Fig. 3. Frames and axes of hand
Table 1. Finger no. 1's D-H parameters

Table 2. Fingers no. 2-5's D-H parameters

T = A_1 A_2 \cdots A_n    (1)
Fig. 4. Plane geometry associated with a finger
Equation 2 is used to calculate the distance from the fingertip to the base. Equation 3 is used to calculate the rotation angle of the fingertip with respect to the first joint. The first angle, between the base joint and the middle joint, can be obtained using Equations 4 and 5. Equation 6 is used to find the second angle, between the middle joint and the fingertip. Inverse kinematics is used to calculate the braking angle when a collision occurs.
M = \sqrt{x^2 + y^2}    (2)

\alpha_1 = \tan^{-1}\left( \frac{y}{x} \right)    (3)

l_a = \sqrt{l_2^2 + l_3^2}    (4)

\theta_1 = \alpha_1 - \cos^{-1}\left[ \frac{l_a^2 - l_1^2 - M^2}{-2\, l_1 M} \right]    (5)

\theta_2 = 180^{\circ} - \cos^{-1}\left[ \frac{M^2 - l_1^2 - l_a^2}{-2\, l_1 l_a} \right]    (6)
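The reconstructed equations translate directly into code. The following sketch (with illustrative link lengths in millimeters, not the authors' values, and atan2 used for the arctangent in Eq. 3) computes the two joint angles, from which the braking angles are derived on contact:

```python
import math

def joint_angles_from_fingertip(x, y, l1, l2, l3):
    """Two-link inverse kinematics for the simplified finger of Fig. 4."""
    M = math.hypot(x, y)                          # Eq. (2): fingertip-to-base distance
    alpha1 = math.atan2(y, x)                     # Eq. (3)
    la = math.sqrt(l2**2 + l3**2)                 # Eq. (4): combined distal links
    theta1 = alpha1 - math.acos((la**2 - l1**2 - M**2) / (-2.0 * l1 * M))    # Eq. (5)
    theta2 = math.pi - math.acos((M**2 - l1**2 - la**2) / (-2.0 * l1 * la))  # Eq. (6)
    return math.degrees(theta1), math.degrees(theta2)

# Example with a fingertip at (60 mm, 40 mm) and hypothetical links of 45, 25 and 20 mm:
print(joint_angles_from_fingertip(60.0, 40.0, 45.0, 25.0, 20.0))
```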
A strain gage is mounted on each joint of the exoskeleton as shown in Figure 5-a. The strain gages are used to measure the force which acts on each joint of the exoskeleton. Nine digital servo motors in the exoskeleton device are used to transfer force to the user by adjusting the cables' tension. Each digital servo motor, with a maximum torque of 8 kg·cm, is installed on its base separately from the exoskeleton device, as shown in Figure 6. An overview of the exoskeleton system is shown in Figure 7.
Fig. 5. (a) Strain gages mounted on the exoskeleton (b) Close-up view
Fig. 6. Servo motors on exoskeleton’s base
Fig. 7. Overview of exoskeleton system
The exoskeleton controller's MCU, an STM32 microcontroller based on the ARM Cortex-M3 core, receives 12-bit A/D data from each strain gage and controls all motors. It communicates with the computer via a serial port at 115,200 bps and interfaces with the motors via rx/tx pins. The main control loop is programmed to receive all force data from the 9 strain gages and the braking angles for the motors from the main computer. If the value of a strain gage is less than zero, the corresponding motor pulls the cable to adjust its tension. If the value of a strain gage is more than zero and the motor angle is less than the braking angle, the motor releases the cable. If the motor angle is more than the braking angle, the motor holds its position. The exoskeleton controller also returns the motor angles to the main computer for updating the graphics. 3.2 Vision Manager The Logitech 2 MP Portable Webcam C905 [8] is used to capture the video image and send it to the main computer. This camera is mounted on the LCD Glasses Display in order to synchronize the user's view with the camera's view. The video capture resolution is 640x480 pixels and the graphics refresh rate is 30 frames per second, as shown in Figure 8. The vision manager applies the ARToolKit software library [9] to locate markers and sends each marker's position and orientation to the graphics manager.
Fig. 8. Video Display
3.3 Graphics Manager The graphics manager is responsible for rendering virtual objects and the virtual hand on a marker using OpenGL, as shown in Figure 9 (a) and (b). The Bullet physics engine included in the graphics manager is used to detect collisions and calculate the reaction force from the virtual hand's manipulation. The virtual hand is a VRML-based model with separate link models. The angle of each finger read from the exoskeleton device is sent to the graphics manager via the serial communication. The position and orientation of each finger model can be calculated from the forward kinematics explained in Section 3.1. The calculated position and orientation are used to update the virtual hand's position and orientation in the physics simulation.
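The paper uses the Bullet SDK inside a C++/OpenGL graphics manager. As an illustration of the same collision-detection idea, the sketch below uses pybullet (Bullet's Python binding); the primitive box and sphere stand in for the VRML peg and a fingertip link and are assumptions, not the authors' models.

```python
import pybullet as p

# Headless physics world: a "peg" box and a "fingertip" sphere stand in
# for the virtual objects and hand links described in the paper.
p.connect(p.DIRECT)
peg = p.createMultiBody(
    baseMass=0.1,
    baseCollisionShapeIndex=p.createCollisionShape(p.GEOM_BOX, halfExtents=[0.02, 0.02, 0.1]),
    basePosition=[0, 0, 0.1])
fingertip = p.createMultiBody(
    baseMass=0.0,
    baseCollisionShapeIndex=p.createCollisionShape(p.GEOM_SPHERE, radius=0.01),
    basePosition=[0.05, 0, 0.15])

# Each frame: move the fingertip to the pose given by forward kinematics,
# step the simulation, and read back contacts to derive a braking command.
p.resetBasePositionAndOrientation(fingertip, [0.02, 0, 0.15], [0, 0, 0, 1])
p.stepSimulation()
contacts = p.getContactPoints(bodyA=fingertip, bodyB=peg)
if contacts:
    # A real system would convert the contact into a braking angle for the
    # corresponding motor, as described in Section 3.1.
    total_normal_force = sum(c[9] for c in contacts)   # index 9 = normal force
    print("collision, total normal force:", total_normal_force)
```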
Fig. 9. (a) Virtual objects in physics simulation (b) Virtual hand in physics simulation
4 Experimental Results 4.1 Sensor Data Map This experiment is set up to explore the relationship between force and A/D data. First, strain gages are fixed on a piece of clear acrylic. Forces in the range of 0-80 N are applied to the tip of the acrylic, as shown in Figure 5-b. The experimental results of force and the corresponding A/D data are plotted in Figure 10.
Fig. 10. Data mapping between force and A/D data
In Figure 10, the horizontal axis represents the force applied to the strain gage and the vertical axis represents the A/D data read from the exoskeleton controller. The results show that the strain gage returns data in a linear fashion. 4.2 Maximum Force Feedback This experiment is set up to explore the maximum force feedback provided by the exoskeleton device. First, the user wears the exoskeleton and performs grasping while the motors are set to hold their original positions. The exoskeleton controller then queries the maximum force from the strain gages.
Fig. 11. Maximum force feedback from motors
In Figure 11, the horizontal axis represents the motor IDs and the vertical axis represents the force exerted on each joint. The results show that the exoskeleton device can generate maximum force feedback of up to 50 N. 4.3 Virtual Assembly Task This experiment is set up to test the virtual object assembly task. In this experiment, the user is allowed to use the proposed exoskeleton device to manipulate virtual objects in the real environment. The goal of this virtual assembly task is to put virtual pegs into holes with a force feedback effect. All virtual objects with physics simulation are augmented on the real markers, as shown in Figure 12-a. The user can receive force feedback while he/she manipulates the virtual objects, as shown in Figure 12-b.
(a) Before grasping virtual object
(b) Grasping virtual object
(c) Virtual object in the hole
Fig. 12. Virtual assembly task with force feedback
Figure 12-c shows the completion of one virtual peg assembled into a hole. This operation can be applied to training the user in more complex assembly tasks with augmented information. Furthermore, the graphics refresh rate is about 25 frames per second.
5 Conclusions and Future Works This research proposed an augmented reality system with force feedback for virtual object assembly tasks. An exoskeleton device was designed and built to generate 9-DOF
force feedback to the user's hand. It can generate maximum forces of up to 5 N for each finger. Virtual objects in the physics simulation can be superimposed on the tracked real markers. The graphics refresh rate is about 25 frames per second. Several kinds of assembly training can be implemented using the proposed system. In the training, the user can use the hand exoskeleton to manipulate virtual objects with force feedback in the real environment. This provides more realism and improves training performance. Future work on this research will cover virtual soft object manipulation, an enhanced graphical user interface, and a markerless augmented reality implementation. Acknowledgments. This research work is financially supported by the National Science and Technology Development Agency, Thailand.
References 1. Zhou, Z., Wan, H., Gao, S., Peng, Q.: A realistic force rendering algorithm for CyberGrasp, p. 6. IEEE, Los Alamitos (2006) 2. Koyama, T., Yamano, I., Takemura, K., Maeno, T.: Multi-fingered exoskeleton haptic device using passive force feedback for dexterous teleoperation. 3, 2905–2910 (2002) 3. Monroy, M., Oyarzabal, M., Ferre, M., Campos, A., Barrio, J.: MasterFinger: Multi-finger Haptic Interface for Collaborative Environments. Haptics: Perception, Devices and Scenarios, 411–419 (2008) 4. Sankaranarayanan, G., Weghorst, S., Sanner, M., Gillet, A., Olson, A.: Role of haptics in teaching structural molecular biology (2003) 5. Adcock, M., Hutchins, M., Gunn, C.: Augmented reality haptics: Using ARToolKit for display of haptic applications, pp. 1–2. IEEE, Los Alamitos (2004) 6. Coumans, E.: Bullet 2.76 Physics SDK Manual (2010), http://www.bulletphysics.com 7. Craig, J.J.: Introduction to robotics: mechanics and control (1986) 8. Logitech Portable Webcam C905, http://www.logitech.com/en-us/webcamcommunications/webcams/devices/6600 9. ARToolKit Library (2002), http://www.hitl.washington.edu/artoolkit/download/
Remote Context Monitoring of Actions and Behaviors in a Location through 3D Visualization in Real-Time John Conomikes1, Zachary Pacheco1, Salvador Barrera2, Juan Antonio Cantu2, Lucy Beatriz Gomez2, Christian de los Reyes2, Juan Manuel Mendez-Villarreal2 Takeo Shime3, Yuki Kamiya3, Hedeki Kawai3, Kazuo Kunieda3, and Keiji Yamada3 1
Carnegie Mellon University, Entertainment Technology Center (ETC), 800 Technology Drive, Pittsburgh, PA, 15219, USA 2 Universidad de Monterrey (UDEM), Engineering and Technology Division, Av. Morones Prieto 4500 Pte. San Pedro Garza Garcia, C.P. 66238, N.L. Mexico 3 NEC C&C Innovation Research Laboratories, 8916-47, Takayama-Cho, Ikoma, Nara 630-0101, Japan {JohnConomikes,zakpacheco}@gmail.com, {sbarrea1,jcantaya,lgomez20,xpiotiav,jmndezvi}@udem.net,
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. The goal of this project is to take huge amounts of data, not parseable by a single person, and present it in an interactive 3D recreation of the events that the sensors detected, using a 3D rendering engine known as Panda3D. "Remote Context Monitoring of Actions and Behavior in a Location Through the Usage of 3D Visualization in Real-time" is a software application designed to read large amounts of data from a database and use that data to recreate the context in which the events occurred, to improve understanding of the data. Keywords: 3D, Visualization, Remote, Monitoring, Panda3D, Real-Time.
1 Introduction This prototype is the result of a long project development at the Entertainment Technology Center, where work was done in conjunction with NEC and the Universidad de Monterrey. While there is a lot of work in this field, one of the unique angles of this project is the type of data it is designed to build the recreation from. This data comes from NEC's LifeLog system, which tracks a wide variety of detailed information on what each employee in the monitored space does daily on a second-to-second basis. Additionally, the data can be viewed from anywhere in the world, not just the monitored laboratory.
Fig. 1. Initial 3D shaded model for the Southern laboratory
2 Methodology One of the requirements for this project is the ability to view the current state of the office, i.e., keeping up with the sensor data in real time. Due to the large amount of data that must be parsed every frame, a rolling parsing system had to be implemented, in which only a portion of the data is parsed and applied each frame rather than all of it in a single frame every second. This is done because the number of frames per second must be kept above 20 in order to maintain a smooth appearance. This gives us only 50 ms of parsing time per frame, minus the overhead of rendering the 3D environment.
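A minimal sketch of such a rolling parser is shown below. The parse_record and apply_to_scene helpers are hypothetical placeholders, and the time budgets are illustrative assumptions rather than values from the project; in Panda3D this function would typically be registered as a recurring per-frame task.

```python
import time
from collections import deque

FRAME_BUDGET_S = 0.050      # a 20 fps target leaves about 50 ms per frame
PARSE_BUDGET_S = 0.015      # assumed share of the frame reserved for parsing

pending_records = deque()   # raw sensor records fetched from the server

def parse_record(raw):      # hypothetical: convert one raw record to an event
    return raw

def apply_to_scene(event):  # hypothetical: update the 3D recreation
    pass

def per_frame_parse_task():
    """Parse only as many pending records as fit in the parsing budget,
    leaving the rest of the frame for rendering."""
    deadline = time.perf_counter() + PARSE_BUDGET_S
    while pending_records and time.perf_counter() < deadline:
        apply_to_scene(parse_record(pending_records.popleft()))
```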
Fig. 2. Initial UI design
As the sensors only poll data at most once per second, this system allows us to keep the data real-time without sacrificing frame rate. Originally we considered using threading to help alleviate this problem; however, the 3D rendering engine used (Panda3D) has very limited inherent support for threading, so this was not possible. Another problem that was tackled was that of the user interface, as the people using this tool may not be experienced computer users and there is a large amount of data available to analyze. We went over a large number of different designs (see Figure 2 above for an example of one of the previous user interface designs) before settling on this latest one, which combines ease of use (similar to Office 2007 [1] style tabbed buttons) with a large amount of freedom for the user to show and hide data as needed. See Figure 3 below for the final user interface design of the software.
Fig. 3. Final UI design
3 System Architecture Our entire system is built on NEC's LifeLog system which is responsible for gathering the large amount of data that is needed for the software to operate. See Figure 4 below for a view of the ceiling with installed sensors.
Fig. 4. Ceiling of the South Laboratory with installed sensors
Employee location is detected through the use of IR emitters on employees and receivers mounted on the ceiling, though approximately 25% of all location data is "8022" which is the code for a person who is not detected by any IR receiver on the premises.
Ambient sound level data is collected by over 90 microphones installed in the ceiling. There are also over 30 cameras (like the one shown in Figure 5 below) in place on the ceiling to provide up to 10 images per second.
Fig. 5. Close up of one of the many cameras installed in the ceiling
All e-mails sent to or from monitored employees are also stored, though addressees that are not monitored are stored only as "Company Employee" or "Recipient Outside Company". Additionally, extensive information is pulled from the computer operations of each monitored employee: statistics such as key presses, mouse clicks, and mouse movements in the past second; the currently active process and the most recently accessed file; and even all of the processes currently running in the background. They also log all of the employee's internet access, though this last piece of information can be disabled by the employee. Finally, each employee carries a wireless button that records when it was pressed and, if pressed for more than one second, also reports the duration of the press. Also, while not related to people, 16 RFID readers are used to track the location of resources (e.g. books, laptops) which have RFID tags on them as they move around the office; the system also tracks which employee is using each particular resource. The flow of information is quite simple: the LifeLog system polls the sensors for their latest information, timestamps it, outputs it in a simplified YAML [2] format, and stores this information on a server. Our program then connects to the server, requests the files required to view the time period the user wishes to view, loads the needed information into memory in Python data structures, and displays the recreated events to the user. Due to security restrictions at NEC, the data is only accessible locally or through a Virtual Private Network (VPN) connection. However, since the only remote action performed by the software is reading data from the server, with less strict security measures the software could function anywhere without the need for any special access permissions.
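As an illustration of this flow, the sketch below loads timestamped records from a simplified YAML dump into plain Python data structures. The field names (timestamp, sensor, value) are assumptions for the example; the actual LifeLog schema is not given in the paper.

```python
import yaml                       # PyYAML
from collections import defaultdict

def load_lifelog(path):
    """Read a simplified-YAML LifeLog dump and index the records by sensor
    so the viewer can replay any time range quickly."""
    with open(path) as f:
        records = yaml.safe_load(f) or []
    by_sensor = defaultdict(list)
    for rec in records:           # each record is assumed to look like
        by_sensor[rec["sensor"]].append(      # {timestamp: ..., sensor: ..., value: ...}
            (rec["timestamp"], rec["value"]))
    for series in by_sensor.values():
        series.sort()             # chronological order for seeking and playback
    return by_sensor
```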
4 Experimental Results In testing the software it was found that starting the software takes approximately one minute per hour of data the user wishes to view. This is because the user needs to be able to jump around to any point in the data, and the only way this could be done seamlessly while playing the data is to load all needed data up front. However, after this load time, the user can easily jump to any point in time within the loaded data, in addition to being able to view the most recent data. This load time could also be reduced by having direct, local access to the server, or lengthened by a slow internet connection.
5 Comments and Conclusion While the system does use a large concentration of sensors in a small area and is generally very invasive, it does mean there are many promising opportunities for future research to improve both the technology and the software. While not ready for industry yet, with the inclusion of other research as well as further improvement of the current software, this seems to be a promising technology and may prove to be the next big step in combining multiple different information-gathering technologies.
References [1] Ebara, Y., Watashiba, Y., Koyamada, K., Sakai, K., Doi, A.: Remote Visualization Using Resource Monitoring Technique for Volume Rendering of Large Datasets. In: 2004 Symposium on Applications and the Internet (SAINT 2004), p. 309 (2004) [2] Hibbard, B.: Visad: connecting people to computations and people to people. SIGGRAPH Computer Graphics 32(3), 10–12 (1998)
Spatial Clearance Verification Using 3D Laser Range Scanner and Augmented Reality Hirotake Ishii1, Shuhei Aoyama1, Yoshihito Ono1, Weida Yan1, Hiroshi Shimoda1, and Masanori Izumi2 1 Graduate School of Energy Science, Kyoto University, Yoshida Monmachi, Sakyo-ku, Kyoto-shi, 606-8501 Kyoto, Japan 2 Fugen Decommissioning Engineering Center, Japan Atomic Energy Agency, Myojin-cho, Tsuruga-shi, 914-8510 Fukui, Japan {hirotake,aoyama,ono,yanweida,shimoda}@ei.energy.kyoto-u.ac.jp,
[email protected]
Abstract. A spatial clearance verification system for supporting nuclear power plant dismantling work was developed and evaluated by a subjective evaluation. The system employs a three-dimensional laser range scanner to obtain three-dimensional surface models of the work environment and dismantling targets. The system also employs Augmented Reality to allow field workers to perform simulations of transportation and temporal placement of dismantling targets using the obtained models, to verify spatial clearance in actual work environments. The developed system was evaluated by field workers. The results show that the system is acceptable and useful to confirm that dismantling targets can be transported through narrow passages and can be placed in limited temporal workspaces. It was also found that an extension of the system is desirable to make it possible for multiple workers to use the system simultaneously to share the image of the dismantling work. Keywords: Augmented Reality, Laser Range Scanner, Nuclear Power Plants, Decommissioning, Spatial Clearance Verification.
1 Introduction After the service period of a nuclear power plant terminates, the nuclear power plant must be decommissioned. Because some parts of a nuclear power plant remain radioactive, the procedure of its decommissioning differs from that of general industrial plants. Each part of the nuclear power plant must be dismantled one by one by following a dismantling plan made in advance. In some cases, it is desirable to dismantle large plant components into small pieces at a different location from their original location; the components are removed from their bases and transported to appropriate workspaces. However, nuclear power plants are not designed to be easily dismantled. Passages are very narrow and workspaces are not large enough. Large components may collide with passages and workspaces during transportation and placement. Moreover, dismantled components need to be stored at a temporal space
for a certain period before they are transported out of the plant, because their radioactivity level must be checked. The space for the temporal storage is also not large enough. Therefore it is necessary to verify that the dismantled components can be transported through narrow passages and can be placed in a limited space before performing the dismantling work. But the verification is not easy because there are various components in nuclear power plants and their shapes differ greatly. In this study, to make it easy for field workers to perform the verification, a spatial clearance verification system was developed and evaluated by a subjective evaluation. The system employs a three-dimensional (3D) laser range scanner to obtain 3D surface point clouds of the work environment and dismantling targets, and then builds polygon models. Augmented Reality (AR) technology is also employed to allow field workers to perform transportation and temporal placement simulations intuitively using the obtained models, to verify spatial clearance between the work environment and the dismantling targets in actual work environments. The developed system was used along a scenario by field workers who are working on dismantling a nuclear power plant, and an interview and questionnaire survey were conducted to confirm whether the system is effective, how acceptable it is, and what problems arise in practical use.
2 Related Work Various studies have been conducted to apply AR to maintenance tasks in nuclear power plants [1]. In [2], a mobile AR system is investigated as an alternative to paperbased systems to retrieve maintenance procedure from online servers. In [3], a mobile AR system to support maintenance task of a power distribution panel is proposed. The authors have proposed some AR systems to support workers in nuclear power plants [4][5][6]. In [4], an AR support system for water system isolation task is proposed and evaluated. In [5], AR technology is used to support field workers to refer cutting line of dismantling target and record the work progress. In [6], field workers are supported to make a plan of preparation for dismantling work by deciding how to layout scaffolding and greenhouses. In this study, the authors focus on a spatial clearance verification task as a new support target in which real time interaction between virtual objects and real environment need to be realized.
3 Spatial Clearance Verification System 3.1 Basic Design The most crucial requirement for spatial clearance verification is to make it possible to perform the verification using accurate 3D models of the work environment and dismantling targets. The 3D models are used to detect collisions between the work environment and dismantling targets. One possible way to obtain the 3D models is to use the existing CAD data that were made when the plant was designed. But the CAD data usually include only large components and have not been updated since they were made; they do not represent the current status of the plant properly. Therefore, the authors decided to employ a 3D laser range scanner to make 3D models of the work environment and dismantling targets. Concerning an interface for performing the verification, one possible way is to develop a GUI application with which users can manipulate 3D models in a virtual environment. But such an interface may be difficult to use because it is necessary to indicate the 3D position and orientation of dismantling targets. Moreover, it is difficult to obtain a concrete image of the spatial relation between the work environment and dismantling targets. In this study, therefore, the authors aimed at developing an AR-based application that can be used in the actual work environment. The transportation path and layout of the dismantling target can be investigated intuitively by manipulating real objects, and the users can confirm in an intuitive way which parts of the work environment and dismantling targets collide with each other. The whole system can be divided into two subsystems: the Modeling Subsystem and the Verification Subsystem. 3.2 Modeling Subsystem The Modeling Subsystem is used to build 3D surface polygon models of the work environment and dismantling targets. These models are used to detect collisions while the Verification Subsystem is in use. The accuracy of the models does not need to be on the order of millimeters, but it should be better than on the order of meters. Because it is not clear how accurate the models should be, the authors tried to keep the total cost of the system reasonably low and then to make the models as accurate as possible with the available hardware. Further study is necessary to reveal the required accuracy of the models used for the spatial verification. The Modeling Subsystem consists of a laser range scanner, a motion base and a color camera to obtain 3D point clouds of the work environment and dismantling targets, and software to make 3D polygon models from the obtained point clouds, as shown in Figure 1. The hardware specifications are shown in Table 1. The laser range scanner employed in this study is a kind of line scanner and can obtain 3D positions of the surrounding environment in a 2D plane. Therefore, the scanner is mounted on a motion base; the motion base rotates the scanner to obtain point clouds of the whole surrounding environment. The color camera is used to capture visual images. The position and orientation of the camera when the images are captured are also recorded. The obtained point clouds are based on a local coordinate system whose origin is the intersection of the rotational axes of the motion base when they are obtained. But the point clouds need to be based on a world coordinate system when they are used for the spatial verification. In this study, the authors employed a camera tracking technique proposed in [7]. Multiple markers are pasted in the work environment and their positions and orientations in the world coordinate are measured in advance. By capturing these markers with the color camera, the position and orientation of the camera are estimated. Then the obtained point clouds are transformed into the world coordinate.
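A minimal sketch of this coordinate transformation, assuming the marker-based tracking yields the scanner pose as a 4x4 homogeneous matrix, is shown below; the pose and points are illustrative placeholders.

```python
import numpy as np

def to_world(points_local, T_world_from_scanner):
    """Transform an (N, 3) point cloud from the scanner's local frame into
    the world frame using a 4x4 homogeneous transform."""
    n = points_local.shape[0]
    homogeneous = np.hstack([points_local, np.ones((n, 1))])      # (N, 4)
    return (T_world_from_scanner @ homogeneous.T).T[:, :3]

# Placeholder pose: identity rotation, 2 m translation along x (assumed).
T = np.eye(4)
T[:3, 3] = [2.0, 0.0, 0.0]
cloud_local = np.array([[0.0, 0.0, 1.0], [0.5, 0.2, 1.5]])
print(to_world(cloud_local, T))
```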
Fig. 1. Configuration of Modeling Subsystem

Table 1. Hardware specifications for Modeling Subsystem
Laser range scanner - Vendor: SICK Inc.; Model: LMS100-10000; Scan angle: 270 deg.; Angular res.: 0.25 deg.; Max. error: 40 mm
Motion base - Vendor: FLIR Systems Inc.; Model: PTU-D46-70; Angular res.: 0.013 deg.
Camera - Vendor: Point Grey Research Inc.; Model: CMLN-13S2C-CS; Resolution: 1280x960; Focal length: 4.15 mm
Another problem is that a single point cloud does not include enough points of the work environment and dismantling targets. Only one side of the work environment and dismantling targets can be measured at once. It is necessary to use the scanner at multiple positions to obtain the whole surface of the work environment and dismantling targets. The obtained point clouds then need to be combined into one point cloud. One possible solution is to use the camera tracking again. If the camera can capture the markers at all measuring positions, the point clouds can be combined without any additional operation because all the point clouds are based on the world coordinate. But in some cases, it is difficult to capture markers. In this study, the authors tried to use the ICP (Iterative Closest Point) algorithm to transform one point cloud to match another point cloud that is already transformed into the world coordinate. But in our case, the ICP algorithm cannot be used directly because the point clouds obtained in nuclear power plants include much noise and two point clouds do not always contain enough of the environment in common. Therefore, a GUI application was developed to set an initial transform of the target point cloud by hand, and then two point clouds are combined with the following algorithm (a minimal sketch of this procedure is given after the steps). (It is assumed that Cloud1 is already transformed into the world coordinate. The goal is to transform Cloud2 into the world coordinate.) Step 1. Smooth Cloud2 to remove random measurement error. Step 2. Place a sphere whose radius is 200 cm randomly inside Cloud2 and clip the points that are inside the sphere. Step 3. Perform the ICP algorithm to adjust the clipped points to Cloud1 and obtain its transformation matrix. Step 4. Apply the transformation matrix to all the points of Cloud2. Step 5. Count the number of points of Cloud2 whose distance from the nearest point of Cloud1 is less than 5 cm. Step 6. Repeat Step 2 to Step 5 ten times and choose the transformation matrix for which the number of points in Step 5 is largest. Step 7. Apply the chosen transformation matrix to all the points of Cloud2. Step 8. Repeat Step 2 to Step 7 until the number of points in Step 5 no longer increases.
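The sketch below mirrors Steps 2-8 in Python (the smoothing of Step 1 is omitted). The small point-to-point ICP helper is a generic stand-in for whatever ICP implementation the authors used, and the random sphere center is approximated by picking a random point of Cloud2; the radii, thresholds, and trial counts follow the values stated in the steps.

```python
import numpy as np
from scipy.spatial import cKDTree

SPHERE_RADIUS = 2.0   # 200 cm clipping sphere (Step 2)
INLIER_DIST = 0.05    # 5 cm inlier threshold (Step 5)
TRIALS = 10           # random trials per round (Step 6)

def icp_align(source, target, iters=20):
    """Generic point-to-point ICP: 4x4 transform aligning source to target."""
    tree = cKDTree(target)
    src, T_total = source.copy(), np.eye(4)
    for _ in range(iters):
        _, idx = tree.query(src)
        matched = target[idx]
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        if np.linalg.det(U @ Vt) < 0:        # avoid a reflection
            Vt[-1] *= -1
        R = (U @ Vt).T
        t = mu_t - R @ mu_s
        src = (R @ src.T).T + t
        T_step = np.eye(4)
        T_step[:3, :3], T_step[:3, 3] = R, t
        T_total = T_step @ T_total
    return T_total

def apply_T(T, pts):
    return (T[:3, :3] @ pts.T).T + T[:3, 3]

def count_inliers(cloud2, tree1):
    distances, _ = tree1.query(cloud2)
    return int((distances < INLIER_DIST).sum())

def combine(cloud1, cloud2):
    """Steps 2-8: ICP-align random 2 m patches of Cloud2 to Cloud1, keep the
    best transform per round, and stop when the inlier count stops growing."""
    tree1 = cKDTree(cloud1)
    best = count_inliers(cloud2, tree1)
    while True:
        trials = []
        for _ in range(TRIALS):
            center = cloud2[np.random.randint(len(cloud2))]
            patch = cloud2[np.linalg.norm(cloud2 - center, axis=1) < SPHERE_RADIUS]
            T = icp_align(patch, cloud1)
            moved = apply_T(T, cloud2)
            trials.append((count_inliers(moved, tree1), moved))
        score, moved = max(trials, key=lambda s: s[0])
        if score <= best:
            return cloud2
        best, cloud2 = score, moved
```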
After applying the above algorithm, an area that contains the necessary points is set by hand. Then the clipped point cloud is converted into a polygon model with the Quadric Clustering algorithm [8]. Concerning the polygon model of a dismantling target, it is necessary to make a texture to increase its visibility. In this study, the texture is automatically generated from the images captured while obtaining the point clouds of the dismantling target. Figure 2 shows example polygon models made with the Modeling Subsystem.
Work environment (Partially extracted for better visibility)
(With texture) (Without texture) Dismantling target
Fig. 2. Polygon models obtained with Modeling Subsystem
3.3 Verification Subsystem The Verification Subsystem is used to conduct simulations of transportation and placement of dismantling targets in actual work environments intuitively, using Augmented Reality technology. The most significant feature of the Verification Subsystem is a function to detect collisions between virtual dismantling targets and the real work environment. Figure 3 shows a conceptual image of the Verification Subsystem. The system consists of a marker cube, a tablet PC, a camera and environmental markers. The marker cube is used to indicate the 3D position and orientation of a virtual dismantling target. The tablet PC is mounted on a tripod and a dolly, which enables users to move the system easily. Six markers are pasted on the marker cube and used to measure the relative position and orientation between the marker cube and the camera. The environmental markers pasted in the work environment are used to measure the position and orientation of the camera relative to the work environment. For both the marker cube and the environmental markers, the markers proposed in [7] are used. The system is supposed to be used by two workers: a cube operator and a system operator. When the camera captures the marker cube and the environmental markers, the 3D model of the dismantling target made with the Modeling Subsystem is superimposed on the camera image based on the current position and orientation of the marker cube. When the cube operator moves the marker cube, the superimposed model follows its movement. When the virtual dismantling target collides with the work environment, the collided position is visualized as shown in Figure 4. The yellow area shows the collided part of the virtual dismantling target and the red area shows the collided part of the work environment. (In the initial state, the 3D model of the work environment is invisible and the user sees the camera image. When a collision occurs, only the nearest polygon to the collided position is made visible and its color is changed to red.)
Table 2 shows the hardware specifications used in the Verification Subsystem. To capture wide-view-angle images of the work environment, a lens with a short focal length is used. This results in the need to use large markers (41 cm x 41 cm) to make the tracking of the camera and the marker cube accurate and stable.
Fig. 3. Conceptual image of Verification Subsystem

Table 2. Hardware specifications for Verification Subsystem
Tablet PC - Vendor: Panasonic Corp.; Model: CF-C1AEAADR; CPU: Core i5-520M; GPU: Intel HD Graphics; Memory: 1 GB
Camera - Vendor: Point Grey Research Inc.; Model: CMLN-13S2C-CS; Resolution: 1280x960; Focal length: 3.12 mm

Fig. 4. Visualization of collided part

Fig. 5. Interface for verification
By using the marker cube, it is expected that the position and orientation of the virtual dismantling target can be changed intuitively. But there may be a case that it is difficult to move the virtual dismantling target only with the marker cube. For example, the intended position is too high or very small adjustment is necessary. Therefore, in this study, GUI is also implemented as shown in Figure 5. The system operator can change the position and orientation of the virtual dismantling target by using the buttons and also can drag the virtual dismantling target with a stylus pen. In addition, following functions are also implemented. 1. A function to record the 3D position and orientation of the virtual dismantling target. The superimposed image is also recorded simultaneously. 2. A function to make the virtual dismantling target invisible. 3. A function to reset all the indication of the collided part. (The color of the virtual dismantling target is set to its original color and the model of the work environment is made invisible.)
The application was developed on the Windows 7 operating system (Microsoft Corp.) using Visual C++ 2008 (Microsoft Corp.). OpenGL, the Visualization Toolkit library [9] and the Bullet Physics library [10] were used to render 3D models, implement the ICP algorithm and conduct collision detection, respectively.
4 Evaluation 4.1 Objective It is expected that field workers can simulate the transportation and placement of dismantling targets using the proposed system. However, it remains unknown how acceptable the system is for actual field workers and what problems arise in practical use. An evaluation experiment was conducted to answer these questions. In this evaluation, the authors mainly focused on the evaluation of the Verification Subsystem, because the pre-evaluation showed that combining multiple point clouds by hand using the Modeling Subsystem is difficult for novice users. The Modeling Subsystem will be improved and evaluated as future work. 4.2 Method Before the evaluation, the experimenters pasted environmental markers and measured their positions and orientations relative to the work environment using the Marker Automatic Measurement System [11]. The experimenters demonstrated how to use the Modeling Subsystem and the Verification Subsystem for about 10 minutes each. Then four evaluators used the Modeling Subsystem and the Verification Subsystem under the assumption that one plant component would be dismantled. The evaluators used the Modeling Subsystem only to obtain point clouds and did not try to combine the point clouds into one point cloud. The polygon models used with the Verification Subsystem were prepared in advance by the experimenters. Each evaluator played only the role of the system operator; the experimenter played the role of the cube operator. After using the system, the evaluators answered a questionnaire, and then an interview and a group discussion were conducted. The dismantling target was assumed to be a water purification tank, as shown on the right-hand side of Figure 3. The evaluators were asked to use the Verification Subsystem under the assumption that the tank would be removed from its base, placed temporarily at a nearby space, and then transported through a narrow passage. Of the four evaluators, three (Evaluators A, B and C) were staff members at the Fugen Decommissioning Engineering Center. One (Evaluator D) was a human interface expert working at a university. 4.3 Questionnaire and Results The questionnaire includes 36 items on system function and usability, as shown in Table 3. Evaluators answered each question on a scale of 1-5 (1. completely disagree; 2. disagree; 3. fair; 4. agree; 5. completely agree). In addition, a free-description section was added at the end of the questionnaire, in which respondents describe other problems and points to be improved.
Each evaluator used the system for about 40 minutes. Table 3 presents the results of the questionnaire. Table 4 presents the answers from the free description, the interview, and the group discussion.

Table 3. Questionnaire results (scores of Evaluators A, B, C and D)
Q1 Is it easy to set up the system? - A: 5, B: 4, C: 5, D: 5
Q2 Is it easy to remove the system? - A: 5, B: 5, C: 4, D: 5
Q3 The situation of temporal placement becomes easy to understand by superimposing the dismantling target over the camera view. - A: 5, B: 4, C: 4, D: 5
Q4 The situation of transportation becomes easy to understand by superimposing the dismantling target over the camera view. - A: 5, B: 5, C: 4, D: 5
Q5 It is easy to recognize the collided position on the dismantling target by making the collided position yellow. - A: 4, B: 2, C: 5, D: 4
Q6 It is easy to recognize the collided position in the work environment by making the collided position red. - A: 5, B: 4, C: 5, D: 5
Q7 It is effective to make it possible to change the position and orientation of the dismantling target by moving the marker cube. - A: 4, B: 2, C: 4, D: 5
Q8 It is easy to translate the dismantling target by using the marker cube. - A: 4, B: 2, C: 3, D: 5
Q9 It is easy to rotate the dismantling target by using the marker cube. - A: 2, B: 4, C: 3, D: 5
Q10 It is effective to translate the dismantling target using a stylus pen. - A: 5, B: 4, C: 5, D: 5
Q11 It is effective to rotate the dismantling target using a stylus pen. - A: 5, B: 4, C: 5, D: 5
Q12 It is easy to translate the dismantling target using a stylus pen. - A: 5, B: 4, C: 5, D: 5
Q13 It is easy to rotate the dismantling target using a stylus pen. - A: 3, B: 5, C: 5, D: 3
Q14 It is easy to operate the system using a stylus pen. - A: 5, B: 3, C: 3, D: 4
Q15 It is effective to translate the dismantling target using the buttons. - A: 5, B: 3, C: 4, D: 5
Q16 It is easy to translate the dismantling target using the buttons. - A: 5, B: 4, C: 5, D: 5
Q17 It is effective to set the position and orientation of the dismantling target to its initial position using the button. - A: 4, B: 5, C: 5, D: 5
Q18 It is effective to record the position and orientation of the dismantling target. - A: 5, B: 5, C: 5, D: 5
Q19 It is easy to record the position and orientation of the dismantling target. - A: 5, B: 5, C: 5, D: 5
Q20 It is effective to refer to the recorded position and orientation of the dismantling target visually. - A: 5, B: 5, C: 5, D: 5
Q21 It is easy to refer to the recorded position and orientation of the dismantling target visually. - A: 5, B: 5, C: 5, D: 4
Q22 It is effective to choose the recorded capture images using the buttons. - A: 5, B: 5, C: 5, D: 5
Q23 It is easy to choose the recorded capture images using the buttons. - A: 5, B: 5, C: 5, D: 5
Q24 The function to make the dismantling target invisible is effective. - A: 5, B: 5, C: 5, D: 5
Q25 The function to reset the color of the dismantling target is effective. - A: 5, B: 5, C: 5, D: 5
Q26 The size of the area to display the camera image is adequate. - A: 5, B: 5, C: 4, D: 5
Q27 The size of the PC display is adequate. - A: 5, B: 5, C: 4, D: 5
Q28 The size of the system is adequate and it is easy to carry in. - A: 5, B: 4, C: 4, D: 5
Q29 The size of the buttons is adequate. - A: 5, B: 5, C: 3, D: 5
Q30 The system can be used easily even on first use. - A: 4, B: 4, C: 4, D: 4
Q31 The system response is quick enough. - A: 5, B: 4, C: 5, D: 5
Q32 It is easy to rotate the system to change your viewpoint. - A: 5, B: 4, C: 4, D: 5
Q33 It is easy to move the system to change your viewpoint. - A: 4, B: 5, C: 4, D: 5
Q34 It is effective to make dismantling target models by measuring with the system and to use them for the verification. - A: 5, B: 4, C: 5, D: 5
Q35 It is effective to verify temporal placement and transportation work by referring to the dismantling target model in the actual work environment. - A: 5, B: 4, C: 5, D: 5
Q36 I could use the system without feeling stress. - A: 3, B: 3, C: 5, D: 4
Table 4. Free description and interview results (partially extracted)
Evaluator A
A1 It is difficult to tell the cube operator how to move the marker cube only by gesture.
A2 It is difficult to conduct detailed operations using the stylus pen, especially for model rotation.
A3 The models should be more stable when the camera does not move.
Evaluator B
B1 It is a little difficult to notice the change of the color. It may be better to change only the color of the work environment.
B2 The marker cube is not necessary if the same operation can be done with the buttons.
B3 It is better if the virtual model follows the marker cube more quickly.
Evaluator C
C1 The size of the marker cube should be smaller.
C2 Sometimes it was difficult to see the display because of the reflection of the light.
C3 It is better if it is possible to change the amount of model movement by the button operation.
Evaluator D
D1 Using the marker cube is intuitive.
D2 The system is useful to confirm that dismantling targets can be transported through passages.
D3 Changing the color of the dismantling target is useful to decide which part of the dismantling target should be cut to be transported through a narrow passage.
D4 The system will be more useful if multiple workers can use the system simultaneously. This extension will enable us to check what other workers will see from their positions.
4.4 Discussion As shown in Table 3, all evaluators gave positive responses to almost all questionnaire items, but for several items some evaluators gave negative responses. Evaluator B gave a negative response to Q5. For Q5, he also gave comment B1 in Table 4. The authors decided to change the colors of both the dismantling target and the work environment because it gives more information to the workers. In fact, Evaluator D gave comment D3, which is a positive response to changing the color of the dismantling target. Therefore, it would be better to add a function to enable and disable the coloring of the dismantling target and the work environment separately. Evaluator B gave negative responses to Q7 and Q8. He also gave comment B2. On the other hand, Evaluator D gave a positive comment, D1, about the marker cube. A possible cause of this difference is that Evaluator B is much younger than Evaluator D and very familiar with computers. Evaluator B is good at using GUIs; therefore, he may think that the marker cube is not necessary. Evaluator A gave a negative response to Q9. He also gave comment A1. It was difficult to give orders by voice because the work environment is very noisy, so the evaluators had to give orders to the cube operator by gestures. But the authors did not teach the evaluators anything about which gestures should be used to give orders to the cube operator. A set of standard gestures should be designed and shared between the cube operator and the system operator in advance. Evaluator D gave an interesting comment, D4. It is easy to make it possible for multiple workers to use the system by introducing multiple sets of hardware and exchanging information via a wireless network. This extension would enable us to share the work image, which is very important for increasing safety and efficiency.
5 Summary and Future Works In this study, a spatial verification support system using a 3D laser range scanner and Augmented Reality was developed and evaluated by a subjective evaluation. The results show that the system is basically acceptable and useful for the spatial verification. Artificial marker based tracking was employed in this study, because the authors intended to prioritize stability and accuracy rather than practicability. For practical use, it is necessary to decrease the number of markers and make it possible for workers to move more freely. Another problem is that there is a case that the scanner can not be used to make surface models of dismantling targets; the target is at high location or obstructed by other components. One possible solution is to employ a modeling method using only small cameras. One promising extension of the system is to make it possible for multiple workers to use the system simultaneously. This extension will enable workers to share the image of dismantling work that is very important to increase the safety and efficiency of the dismantling work. Acknowledgments. This work was partially supported by KAKENHI (No. 22700122).
References 1. Ishii, H.: Augmented Reality: Fundamentals and Nuclear Related Applications. International Journal of Nuclear Safety and Simulation 1(4), 316–327 (2010) 2. Dutoit, H., Creighton, O., Klinker, G., Kobylinski, R., Vilsmeier, C., Bruegge, B.: Architectural issues in mobile augmented reality systems: a prototyping case study. In: Software Engineering Conference, pp. 341–344 (2001) 3. Nakagawa, T., Sano, T., Nakatani, Y.: Plant Maintenance Support System by Augmented Reality. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 1, pp. 768–773 (1999) 4. Shimoda, H., Ishii, H., Yamazaki, Y., Yoshikawa, H.: An Experimental Comparison and Evaluation of AR Information Presentation Devices for a NPP Maintenance Support System. In: 11th International Conference on Human-Computer Interaction (2005) 5. Ishii, H., Shimoda, H., Nakai, T., Izumi, M., Bian, Z., Morishita, Y.: Proposal and Evaluation of a Supporting Method for NPP Decommissioning Work by Augmented Reality. In: 12th World Multi-Conference on Systemics, Cybernetics, vol. 6, pp. 157–162 (2008) 6. Ishii, H., Oshita, S., Yan, W., Shimoda, H., Izumi, M.: Development and evaluation of a dismantling planning support system based on augmented reality technology. In: 3rd International Symposium on Symbiotic Nuclear Power Systems for 21st Century (2010) 7. Ishii, H., Yan, W., Yang, S., Shimoda, H., Izumi, M.: Wide Area Tracking Method for Augmented Reality Supporting Nuclear Power Plant Maintenance Work. International Journal of Nuclear Safety and Simulation 1(1), 45–51 (2010) 8. Lindstrom, P.: Out-of-core simplification of large polygonal models. In: 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 259–262 (2000) 9. Visualization Tool Kit, http://www.vtk.org/ 10. Bullet Physics Library, http://bulletphysics.org/ 11. Yan, W., Yang, S., Ishii, H., Shimoda, H., Izumi, M.: Development and Experimental Evaluation of an Automatic Marker Registration System for Tracking of Augmented Reality. International Journal of Nuclear Safety and Simulation 1(1), 52–62 (2010)
Development of Mobile AR Tour Application for the National Palace Museum of Korea Jae-Beom Kim and Changhoon Park Dept. of Game Engineering, Hoseo University, 165 Sechul-ri, Baebang-myun, Asan, Chungnam 336-795, Korea
[email protected],
[email protected]
Abstract. We present a mobile augmented reality tour (MART) application that provides an intuitive interface for the tourist, and context-awareness is used for smart guidance. In this paper, we discuss practical ways of recognizing the context correctly while overcoming the limitations of the sensors. First, semi-automatic context recognition is proposed to explore the context ontology based on user experience. Second, multi-sensor context-awareness enables the construction of the context ontology using multiple sensors. Finally, we introduce the iPhone tour application for the National Palace Museum of Korea. Keywords: Mobile, Augmented Reality, Tour, Semi-automatic context recognition, Multi-sensor context-awareness.
1 Introduction We introduce an ongoing project to develop a mobile AR tour application for the National Palace Museum of Korea running on the iPhone. Every exhibit in the museum has its own name and history. For a richer experience, this application is based on augmented reality to make that content available to tourists interacting with exhibits by enhancing their current perception of reality. Moreover, we also support AR content authoring in situ so that visitors can share their experiences of exhibits. When the visitor sees an exhibit through the iPhone's camera, information relevant to the captured real images is provided. To achieve this, the tour application is developed on a client-server architecture. The client sends the query image to a remote server for the recognition process, which extracts visual features from the image and performs image matching against a large database of reference images using the SIFT (Scale-Invariant Feature Transform) algorithm. Once the matching image is found, the client renders and overlays computer-generated virtual elements about the objects in it. The client also continuously tracks the viewing pose relative to the real object for image registration. The compass and gyroscope sensors of the iPhone 4 are used for tracking.
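On the server side, the recognition step can be illustrated with OpenCV's SIFT implementation as in the sketch below. The in-memory database, the ratio-test threshold, and the minimum match count are illustrative assumptions, not values reported by the authors.

```python
import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def build_database(reference_images):
    """Precompute SIFT descriptors for every reference exhibit image.
    reference_images: list of (image_id, grayscale image) pairs."""
    db = []
    for image_id, img in reference_images:
        _, descriptors = sift.detectAndCompute(img, None)
        db.append((image_id, descriptors))
    return db

def recognize(query_img, db, min_good_matches=30):
    """Return the id of the best-matching reference image, or None."""
    _, query_des = sift.detectAndCompute(query_img, None)
    best_id, best_score = None, 0
    for image_id, descriptors in db:
        pairs = matcher.knnMatch(query_des, descriptors, k=2)
        # Lowe's ratio test keeps only distinctive matches.
        good = [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < 0.75 * n.distance]
        if len(good) > best_score:
            best_id, best_score = image_id, len(good)
    return best_id if best_score >= min_good_matches else None
```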
Fig. 1. Overview of the Mobile Augmented Reality Tour (MART) Application
2 Context-Awareness for MART We have been researching mobile augmented reality tour (MART) applications to provide an intuitive interface to the visitor, and context-awareness is used to support a smart tour guide. In this paper, we discuss practical ways of recognizing the context correctly while overcoming the limitations of the sensors. This approach is implemented in the iPhone tour application for the National Palace Museum of Korea. Table 1. Three key steps for context-awareness
Step - Input - Output
1. Context Recommendation - Name of the sensor data (automatic) - Candidate Contexts
2. Context Exploration - User input (manual) - Best Matching Context
3. Resources Offer - User input (manual) - Multimedia, 3D Model, Other applications ...
The first step is to recommend candidate contexts by using the name of the data captured from the sensor. This name can represent the identification and characteristics of the sensor data. For example, the name can be retrieved from GPS coordinates with the help of the Google Places API, which returns information about a "place". In the second step, the user can find the best matching context for the situation. Because of the limitations of the sensors, it is difficult to recognize all contexts by using only the sensors, so the user is allowed to explore the ontology-based context manually. The third step provides the list of resources available for the specific context. 2.1 Semi-automatic Recognition of the Context This paper proposes an efficient way of exploring the context ontology based on user experience. We use past experience to minimize the cost of the context exploration of
the second step mentioned in the previous section. Interesting contexts receive a higher reference count, which stands for the user's visiting frequency, and these contexts are more likely to appear at the top for exploration. For example, the "public place" context cannot be provided directly from the GPS sensor. Instead, the user can find the "public place" context in the upper levels of the ontology above "Museum". And if there is no service for indoor location, the "ticket booth" context cannot be provided directly by using only the sensor, but the user can find the "ticket booth" context in the lower levels below the "Museum" context. After all, the context ontology includes contexts whether or not they can be deduced by a sensor. This semi-automatic approach makes it possible to provide appropriate contexts to the user quickly while overcoming the limitations of the sensors. To apply the experience, the context ontology records how many times each context is referenced by the user, and the order in which contexts are displayed depends on this value. In addition to this, the experience of friends is also considered, with a different weight ratio, in the calculation of interest. This experience-based approach is expected not only to reduce the cost of context exploration but also to support the sharing of experience.
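A minimal sketch of this experience-based ordering is given below; the toy ontology, the visit counts, and the 0.5 weight for friends' experience are illustrative assumptions.

```python
FRIEND_WEIGHT = 0.5   # assumed weight ratio for friends' experience

# Toy ontology: each context maps to its child contexts.
ontology = {
    "Public place": ["Museum"],
    "Museum": ["Ticket booth", "Exhibition hall"],
    "Exhibition hall": [],
    "Ticket booth": [],
}
my_visits = {"Museum": 12, "Ticket booth": 4, "Exhibition hall": 7}
friend_visits = {"Ticket booth": 10, "Public place": 3}

def interest(context):
    """Interest score = own reference count + weighted friends' count."""
    return my_visits.get(context, 0) + FRIEND_WEIGHT * friend_visits.get(context, 0)

def candidates_for(context):
    """Order the contexts adjacent to a sensor-derived context (its parents
    and children) so the most visited ones appear first for exploration."""
    parents = [c for c, kids in ontology.items() if context in kids]
    related = parents + ontology.get(context, [])
    return sorted(related, key=interest, reverse=True)

print(candidates_for("Museum"))
```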
3 Mobile AR Tour Application In this section, we introduce the key implementation method of the iphone tour application for the national palace museum of korea. The AR Media Player makes content available to tourists interacting with exhibits by enhancing one’s current perception of reality. And, In-situ authoring and commenting support AR content authoring in situ to share their experiences of exhibits.
3.1 AR Media Player The client consists of 3 layers: live camera view, 3d graphic rendering and touch input. First, we make a layer to display video preview coming from the camera. Second layer is for the rendered image of virtual world. In virtual world, interactive virtual character will explain about what the camera is seeing. We ported OpenSceneGraph to the iPhone for real-time 3D rendering. OpenSceneGraph is based on the concept of scene graph, providing high performance rendering, multiple file type loaders and so on. And, This layer clears the color buffer setting the alpha to 0 to draw 3D scene on the top of the camera view layer. So, the camera view layer will be shown in the background. The third layer is provided for GUI.
Fig. 3. AR media player consisting of three layers: live camera view, 3D rendering and GUI
3.2 Image Query The tour application sends a query image automatically without user input. If the live captured image on the screen is identified by the server, a green wireframe rectangle is displayed, as shown in Figure 4. This approach is very intuitive and natural, but the networking cost should be considered. To reduce bandwidth usage, we change the size and resolution of the query image. In addition, the client uses the acceleration and compass sensors to decide when the best time is to send a query image. The movement of the iPhone makes it possible to detect whether the user is focusing attention on a particular exhibit, so we can control the frequency of sending the query image.
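A hedged sketch of this gating logic is shown below: a new query image is sent only when recent accelerometer readings indicate the device is held steady and enough time has passed since the last query. The thresholds and window size are illustrative assumptions.

```python
import time
import statistics

STEADY_STD = 0.05        # assumed accel-magnitude standard-deviation threshold (g)
MIN_INTERVAL_S = 2.0     # assumed minimum time between query images

_last_query_time = 0.0

def should_send_query(recent_accel_magnitudes):
    """Return True when the phone looks steady (user focusing on an exhibit)
    and the previous query is old enough."""
    global _last_query_time
    if len(recent_accel_magnitudes) < 10:
        return False
    steady = statistics.pstdev(recent_accel_magnitudes) < STEADY_STD
    due = time.time() - _last_query_time > MIN_INTERVAL_S
    if steady and due:
        _last_query_time = time.time()
        return True
    return False
```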
Fig. 4. Image query running on the iPhone 4 without user input
3.3 In-situ AR Authoring The client provides an interface for in-situ authoring of AR content on the iPhone, as shown in Figure 5. This interface enables visitors to create their own content for a specific exhibit on the spot, and this content can be shared with others who are also interested in the same exhibit. We suggest an efficient and easy way of in-situ authoring that overcomes the limitations of mobile devices.
Fig. 5. In-situ AR authoring and commenting on iPhone
4 Conclusion In this paper, we presented practical ways of recognizing the context correctly while overcoming the limitations of the sensors. Semi-automatic recognition of the context is proposed to reduce the cost of context exploration and also to support the sharing of experience. We also introduced multi-sensor context-awareness to define more concrete contexts by using multiple sensors. Promising results were demonstrated in the iPhone tour application for the National Palace Museum of Korea.
Acknowledgement. “This research was supported by the Academic Research fund of Hoseo University in 2009” (20090082)
References 1. Park, D.J., Hwang, S.H., Kim, A.R., Chang, B.M.: A Context-Aware Smart Tourist Guide Application for an Old Palace. In: Proceedings of ICCIT (2007) 2. Adomavicius, G., Tuzhilin, A.: Context-Aware Recommender Systems. Technical Report, http://ids.csom.umn.edu 3. Seo, B.K., Kim, K., Park, J., Park, J.I.: A tracking framework for augmented reality tours on cultural heritage sites. In: Proceedings of VRCAI (2010) 4. Riboni, D., Bettini, C.: Context-Aware Activity Recognition through a Combination of Ontological and Statistical Reasoning. In: Zhang, D., Portmann, M., Tan, A.-H., Indulska, J. (eds.) UIC 2009. LNCS, vol. 5585, pp. 39–53. Springer, Heidelberg (2009) 5. Gellersen, H., Schmidt, A., Beigl, M.: Multi-Sensor Context-Awareness in Mobile Devices and Smart Artefacts. Mobile Networks and Applications 7(5), 341–351 (2002) 6. Lim, B.: Improving trust in context-aware applications with intelligibility. In: Proceedings of Ubicomp, pp. 477–480 (2010)
A Vision-Based Mobile Augmented Reality System for Baseball Games Seong-Oh Lee, Sang Chul Ahn, Jae-In Hwang, and Hyoung-Gon Kim Imaging Media Research Center, Korea Institute of Science and Technology, Seoul, Korea {solee,asc,hji,hgk}@imrc.kist.re.kr
Abstract. In this paper we propose a new mobile augmented-reality system that will address the need of users in viewing baseball games with enhanced contents. The overall goal of the system is to augment meaningful information on each player position on a mobile device display. To this end, the system takes two main steps which are homography estimation and automatic player detection. This system is based on still images taken by mobile phone. The system can handle various images that are taken from different angles with a large variation in size and pose of players and the playground, and different lighting conditions. We have implemented the system on a mobile platform. The whole steps are processed within two seconds. Keywords: Mobile augmented-reality, baseball game, still image, homography, human detection, computer vision.
1 Introduction A spectator sport is a sport that is characterized by the presence of spectators, or watchers, at its matches. If additional information can be provided, viewing spectator sports becomes more fun. How about applying a mobile augmented-reality system (MARS) to spectator sports? Augmented Reality is widely used for sports such as football, soccer, and swimming, but not for baseball. Therefore, we want to focus on baseball games. Hurwitz and co-workers proposed a conceptual MARS that targets baseball games [1]. However, no implementation methods have been presented. The previous research literature includes several papers on Augmented Reality (AR) technology for sports entertainment. Demiris et al. used computer vision techniques to create a mixed reality view of the athlete's attempt [2]. Inamoto et al. focused on generating virtual scenes using multiple synchronous video sequences of a given sports game [3]. Some researchers tried to synthesize virtual sports scenes from TV broadcast video [4], [5]. These systems, however, were not designed for real-time broadcasting. Han et al. tried to build a real-time AR system for court-net sports like tennis [6]. Most of the previous works were applied to TV broadcasting of sports. There have been no AR systems for on-site sports entertainment. In this paper we propose a MARS for baseball games in stadium environments. The overall goal of the system is to augment meaningful information with each player
62
S.-O. Lee et al.
on a captured playfield image during a game. This information includes name, team, position, and statistics of players and games, which are available via the Web and local information server installed in stadium. Our system is currently based on still images of playfields, which are taken by a mobile phone. The images are can be from different angles, having a large variation in size and pose of players and playground, and different lighting conditions. The rest of this paper is structured as follows. Section 2 gives an overview and detailed description of the proposed system. The experimental results on the baseball field images are provided in Section 3, and Section 4 concludes the paper.
2 The Proposed System Figure 1 shows the architecture of the proposed system. Processing starts by capturing a still image on a mobile device. We use a still image for two reasons. First, users may have difficulty holding a mobile device steady for a long time while interacting with augmented content on live video frames. Second, a still image has a higher resolution than a frame of a video sequence; since users typically take pictures from a long distance during a baseball game, sufficient image resolution is needed to detect the players. The captured still image is then analyzed to estimate a homography between a playfield template and the imaged playfield, and to detect the location of each player. If the analysis succeeds, the game contents are retrieved from the information server. The user can touch a player of interest on the mobile phone screen; the best candidate for the corresponding player is then found by a simple method that combines the detected player locations and the game information with some boundary constraints. Finally, the team and name of the player are augmented above the touched player. Detailed information is displayed on a new screen when the user touches the player's name; we use a new screen because the display is too small to show all of the information on top of the field image. Figure 5 shows an example of the results.
Fig. 1. The proposed system architecture
2.1 Planar Homography Estimation Homography estimation is one of the key techniques in AR. In general, there are two approaches to estimating a homography from a single image. The first is a marker-based approach that uses image patterns specially designed for homography estimation. The second is a markerless approach that does not use such patterns, but is restricted to natural images that contain distinctive local patterns. Baseball playfield images, however, consist of formalized geometric primitives that are hard to match between an input frame and a reference frame based on local patterns alone. Therefore, we propose a baseball playfield registration method that matches the playfield shape, which is composed of geometric primitives. The method has three steps. First, contours are extracted from the edges between the dominant color (e.g., grass) and other colors. Second, geometric primitives such as lines and ellipses are estimated using parameter estimation methods. Third, the homography is estimated by matching these geometric primitives. Edge Contours Extraction. Unlike other sports playfields, which have well-defined white line structures, a baseball field is dominated by the two colors of grass and soil [7]. The infield edge pixels define most of the shape structure; the foul lines are marked in white, but they alone are not enough to estimate the projective transformation. To detect the edge pixels, a grass-soil playfield segmentation approach could be used [8]. However, according to our empirical analysis of the colors, grass pixels have a dominant green component in RGB color space. By classifying as grass every pixel whose green component is larger than its red component, we obtain a reliable classification result, as shown in Figure 2(b). Noise is then removed by applying a median filter. Note that we do not filter out background areas, such as the sky and spectators, because the homography estimation step removes these areas automatically. After pixel classification, an edge detection algorithm is applied. Many edge detection methods exist in the literature; here, a simple method that detects grass pixels adjacent to other areas is used: a pixel is marked as an edge if its 3x3 window contains both grass and non-grass pixels. The detected edges are shown in Figure 2(c). Finally, edge pixels are linked into lists of sequential edge points, one list per edge-contour, to preserve connectivity. Small segments and holes are removed by discarding contours shorter than 50 pixels. Geometric Primitives Estimation. The infield structure of a baseball field consists of two types of shape: lines and ellipses. Starting from the detected edge contours, line and ellipse parameters are extracted as follows. A line segmentation method forms straight-line segments from each edge-contour by slightly modifying Peter Kovesi's implementation [9]. The start and end positions of each segment are determined, the line parameters are refined with a least-squares line fitting algorithm, and nearby segments with similar parameters are joined. The final line segmentation results are shown in Figure 2(d). There are two possible ellipses in a baseball field: the pitcher's mound and the home plate. The elliptical shape of the home plate is generally hard to detect because it is not separated into a single edge-contour. Therefore, in our system, the pitcher's mound is considered the best detectable ellipse in the playfield. A direct least-squares ellipse fitting algorithm is applied to each edge-contour to estimate ellipse parameters [10]. The pitcher's mound is then identified as the ellipse whose fitting error is minimal and smaller than a pre-defined threshold. Finally, the estimated ellipse is verified by fine matching based on the sum of squared differences (SSD). Note that we assume the observed image contains the pitcher's mound. The final detected ellipse is shown in red in Figure 2(d) (the figure is best viewed in color).
Fig. 2. Contours extraction and geometric primitives estimation: input image (a), classified grass pixels (white) (b), detected edges (c), detected lines (yellow) and an ellipse (red) (d)
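As a rough illustration of the pixel classification, 3x3 edge rule, contour filtering, and direct least-squares ellipse fitting described above, the following Python/OpenCV sketch shows one possible realization. The thresholds, kernel sizes, and function name are illustrative assumptions, not the authors' actual parameters or code.

```python
import cv2
import numpy as np

def extract_field_contours(bgr):
    # Green-dominant pixels are treated as grass (dominant-color classification).
    b, g, r = cv2.split(bgr.astype(np.int16))
    grass = ((g > r).astype(np.uint8)) * 255
    grass = cv2.medianBlur(grass, 5)                   # remove salt-and-pepper noise

    # A pixel is an edge if its 3x3 neighbourhood contains both grass and
    # non-grass pixels (morphological gradient of the binary grass mask).
    kernel = np.ones((3, 3), np.uint8)
    edges = cv2.dilate(grass, kernel) - cv2.erode(grass, kernel)

    # Link edge pixels into contours and drop segments shorter than 50 points.
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= 50]

    # Direct least-squares ellipse fit on each contour; the contour with the
    # smallest fitting error below a threshold would be kept as the mound.
    ellipses = [cv2.fitEllipse(c) for c in contours]
    return contours, ellipses
```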
Homography Estimation. A diamond shape consisting of four line segments lies inside the infield, and a circle, the pitcher's mound, sits at the very center of the diamond. Outside the diamond, two foul lines are located. Hence, we define a playfield model composed of six line segments and a circle; the model is shown in Figure 3(a). Homography estimation can now be viewed as the problem of finding correspondences between this model and the set of shapes extracted from the observed image. Our solution uses four line correspondences formed by the two sets of parallel lines in the diamond shape of the playfield. The transformation matrix is then determined directly using the normalized direct linear transformation [11].
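Once the four infield lines are matched to the model, their pairwise intersections (the corners of the diamond) give four point correspondences from which a homography can be computed with a DLT. The sketch below, assuming OpenCV and made-up model/image coordinates, illustrates this step; the paper works with line correspondences directly, so this is only an equivalent point-based illustration.

```python
import cv2
import numpy as np

# Model corners of the infield diamond in metres (assumed values) and their
# detected counterparts in image pixels (example numbers only).
model_pts = np.float32([[0, 0], [27.4, 0], [27.4, 27.4], [0, 27.4]])
image_pts = np.float32([[412, 388], [623, 351], [705, 470], [455, 530]])

H, _ = cv2.findHomography(model_pts, image_pts, method=0)  # plain DLT, no RANSAC

# Project the model's mound centre into the image to check the fit, e.g. before
# the SSD-based verification mentioned in the text.
mound = cv2.perspectiveTransform(np.float32([[[13.7, 13.7]]]), H)
```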
Fig. 3. The defined playfield model (6 lines and a circle) (left) and the geometrical constraints (green: selected lines, red: removed lines) (right)
Searching for the best correspondence requires a combinatorial search that can be computationally expensive, so we apply geometrical constraints. Since no shape correspondences are known except the pitcher's mound, metric and scale properties are first recovered roughly from the relationship between the model circle and the detected ellipse, without considering perspective parameters. All extracted line segments are then sorted in counter-clockwise order by the absolute angle of the line joining the center of the ellipse to the center of each segment, and segments whose distance from the ellipse center exceeds a pre-defined length are removed. We also apply some minor constraints, using techniques similar to those proposed in the literature [7]. These constraints resolve the image reflection problem and reduce the search space significantly, as shown in Figure 3, where many lines are removed by the geometrical constraints. For each configuration that satisfies the constraints, we compute the transformation matrix and the complete model matching error as described in [7], and the transformation with the minimum error is selected as the best one. Finally, the estimated homography is verified by fine matching based on SSD. Figure 4 shows a transformed playfield model drawn over an input image using the estimated homography. 2.2 Player Detection For automatic player detection, our framework uses AdaBoost learning over histograms of oriented gradients, which gives a satisfactory detection rate and fast search speed [13]. Dalal and Triggs showed experimentally that grids of Histogram of Oriented Gradients (HOG) descriptors significantly outperform existing feature sets for human detection [14]. A feature selection algorithm, AdaBoost, automatically selects a small set of discriminative HOG features with orientation information to achieve robust detection; more details of this approach can be found in [13]. The approach is designed to use a large set of blocks that vary in size, location, and aspect ratio, which makes it possible to detect players of varying size in images. Knowing the search block size improves detection accuracy and reduces the search time. In the proposed system, the search block size is calculated from the average height of Korean baseball players (182.9 cm) [12]. First, the camera pose is estimated using a robust pose estimation technique for a planar target [15]; the input to the pose estimation is the four corresponding lines used to estimate the homography. Next, the search block size at each pixel location is approximated from the given camera pose and the average player height. In a baseball game, most players of interest are inside the field, so detections outside the field are discarded. An example of player detection results is shown in Figure 4.
Fig. 4. Homography estimation and player detection: a transformed playfield model (left), detected players (green box) and a player outside the playfield (red box) (right)
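For reference, the snippet below shows HOG-based person detection with OpenCV's stock HOG + linear-SVM pedestrian detector, restricted to detections whose feet fall inside a binary field mask. The actual system uses the AdaBoost cascade of HOG blocks from [13] with a pose-derived search block size, so this is only a simplified stand-in.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_players(image, field_mask):
    boxes, _ = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)
    players = []
    for (x, y, w, h) in boxes:
        # Approximate ground-contact point of the detection.
        fy = min(y + h, field_mask.shape[0] - 1)
        fx = min(x + w // 2, field_mask.shape[1] - 1)
        if field_mask[fy, fx]:                 # keep only players inside the field
            players.append((x, y, w, h))
    return players
```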
3 Experimental Results We tested the proposed algorithm on photos taken with an Apple iPhone 3GS and an iPhone 4, running on a PC with an Intel 2.67 GHz Core i7 CPU. The pictures were taken at the Jamsil and Mokdong baseball stadiums in Seoul, Korea. Images were resized to two different resolutions, 640 x 480 and 960 x 720, used respectively for homography estimation and for player detection (including outfielders). Homography estimation always took between 50 and 100 ms. Detecting all players takes much longer, but there is no need to search every pixel inside the baseball field, because only a player of interest is searched for, within the small region selected by the user. We also implemented the system on a mobile platform (Apple iPhone 4), where all processing steps completed within two seconds. The information server manages the contexts of baseball games held in Korea, and the mobile device connects to it over a wireless network after the image processing step. Since the information server does not provide the exact location of each player, we roughly match each detected player with the available information by inference based on the team, the position, and the detected location. Figure 5 shows results of the system after the user touches a player of interest on the mobile phone screen.
Fig. 5. The implemented system on a mobile platform: the upper screen displays the team, name (over the player), and position (upper right) in Korean text after the user touches a player of interest; the lower screen displays the player's detailed information
4 Conclusion and Future Work We have described a vision-based augmented reality system that displays supplementary information about players on a mobile device during a baseball game. Since homography estimation plays an important role in this system, we proposed a new estimation method suited to baseball fields. For player detection, we employed a fast and robust algorithm based on AdaBoost learning that gives a satisfactory detection rate and search speed; however, detection sometimes fails, and further improvement of the detection rate remains future work. We have successfully implemented the system on a mobile platform and tested it in two different stadiums. Our current system does not cover every baseball stadium, because the proposed pixel classification algorithm assumes a playfield consisting of grass and soil, while various other types of playfield exist; for example, some stadiums have white lines painted on an all-green field. Our next goal is therefore to develop a system that handles these various types of playfield.
References 1. Hurwitz, A., Jeffs, A.: EYEPLY: Baseball proof of concept - Mobile augmentation for entertainment and shopping venues. In: IEEE International Symposium on ISMAR-AMH 2009, pp. 55–56 (2009) 2. Demiris, A.M., Garcia, C., Malerczyk, C., Klein, K., Walczak, K., Kerbiriou, P., Bouville, C., Traka, M., Reusens, E., Boyle, E., Wingbermuhle, J., Ioannidis, N.: Sprinting Along with the Olympic Champions: Personalized, Interactive Broadcasting using Mixed Reality Techniques and MPEG-4. In: Proc. of BIS 2002, Business Information Systems (2002) 3. Inamoto, N., Saito, H.: Free viewpoint video synthesis and presentation of sporting events for mixed reality entertainment. In: Proc. of ACM ACE, vol. 74, pp. 42–50 (2004) 4. Matsui, K., Iwase, M., Agata, M., Tanaka, T., Ohnishi, N.: Soccer image sequence computed by a virtual camera. In: Proc. of CVPR, pp. 860–865 (1998) 5. Kammann, T.D.: Interactive Augmented Reality in Digital Broadcasting Environments. Diploma Thesis, University of Koblenz-Landau (2005) 6. Han, J., Farin, D., de With, P.H.N.: A Real-Time Augmented-Reality System for Sports Broadcast Video Enhancement. In: Proc. of ACM Multimedia, pp. 337–340 (2007) 7. Farin, D., Han, J., de With, P.: Fast Camera Calibration for the Analysis of Sport Sequences. In: IEEE Int. Conf. Multimedia Expo (ICME 2005), pp. 482–485 (2005) 8. Kuo, C.-M., Hung, M.-H., Hsieh, C.-H.: Baseball Playfield Segmentation Using Adaptive Gaussian Mixture Models. In: International Conference on Innovative Computing, Information and Control, pp. 360–363 (2008) 9. Nguyen, T.M., Ahuja, S., Wu, Q.M.: A real-time ellipse detection based on edge grouping. In: Proc. of the IEEE International Conference on Systems, Man and Cybernetics, pp. 3280–3286 (2009) 10. Halir, R., Flusser, J.: Numerically stable direct least squares fitting of ellipses. In: 6th International Conference on Computer Graphics and Visualization (1998) 11. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004) 12. Korea Baseball Organization: Guide Book (2010), http://www.koreabaseball.com 13. Zhu, Q., Avidan, S., Yeh, M.C., Cheng, K.T.: Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. In: IEEE Conf. on CVPR, pp. 1491–1498. IEEE Computer Society Press, Los Alamitos (2006) 14. Dalal, N., Triggs, B.: Histogram of Oriented Gradients for Human Detection. In: IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 886–893 (2005) 15. Schweighofer, G., Pinz, A.: Robust Pose Estimation from a Planar Target. IEEE Trans. on Pattern Analysis and Machine Intelligence 28, 2024–2030 (2005)
Social Augmented Reality for Sensor Visualization in Ubiquitous Virtual Reality*
Youngho Lee1, Jongmyung Choi1, Sehwan Kim2, Seunghun Lee3, and Say Jang4
1 Mokpo National University, Jeonnam, Korea {youngho,jmchoi}@mokpo.ac.kr
2 WorldViz, Santa Barbara, CA 93101, USA [email protected]
3 Korea Aerospace Research Institute, Korea [email protected]
4 Samsung Electronics Co., Ltd., Korea [email protected]
Abstract. There have been several research activities on data visualization that exploit augmented reality technologies. Most of this research, however, focuses on tracking and visualization itself and gives little attention to combining social community information with augmented reality. In this paper, we propose a social augmented reality architecture that selectively visualizes sensor information based on the user's social network community. We present three scenarios covering information from sensors embedded in mobile devices, from sensors in the environment, and from the social community. We expect the proposed architecture to play a crucial role in selectively visualizing data from thousands of sensors according to the user's social network community. Keywords: Ubiquitous virtual reality, context-awareness, augmented reality, social community.
1 Introduction The recent trend in computing is that technologies such as ubiquitous virtual reality, social community analysis, and augmented reality combine the real world and the virtual world [1,2]. A smart object is a hidden intelligent object that recognizes a user's presence and provides services for the user's immediate needs. With smart objects, users can interact with a whole environment while expecting highly intelligent responses. As computing paradigms have changed, the mobile devices envisioned in Mark Weiser's paper have become commercial products in our daily lives [11]. In particular, mobile devices are no longer just small devices for voice communication between people; they are also user interfaces for accessing social communities [10].
* This paper was supported by Research Funds of Mokpo National University in 2010.
There have been several research activities on visualizing sensor data with augmented reality technology. Gunnarsson et al. developed a prototype system for visual inspection of hidden structures using a mobile phone and a wireless ZigBee sensor network [3]. Claros et al., Goldsmith et al., and Yazar et al. demonstrated AR interfaces for visualizing wireless sensor information in a room [4-6]. While this previous work shows that augmented reality is well suited to visualizing sensor data, it did not discuss how to visualize rich data. In real applications, sensors are installed in large-scale environments such as bridges, mountains, or even whole cities, which makes it very hard to visualize the sensor data in the way the user wants. In this paper, we propose a social augmented reality architecture that selectively visualizes sensor information based on the user's social network community. Three possible scenarios are presented to guide the design of the architecture, covering the visualization of information from sensors embedded in mobile devices, from sensors in the environment, and from the social community. The architecture is based on the Context-aware Cognitive Agent Architecture for real-time and intelligent responses of user interfaces [8], and it enables users to interact with sensor data through an augmented reality user interface at various levels of intelligence by exploiting social community analysis. The paper is organized as follows. Section 2 briefly introduces related work in ubiquitous virtual reality. Service scenarios of social AR for sensor visualization are presented in Section 3, the Context-aware Cognitive Agent Architecture for social AR in Section 4, and conclusions in Section 5.
2 Related Works 2.1 Smart Objects and Social Community in Ubiquitous Virtual Reality Ubiquitous Virtual Reality (Ubiquitous VR) has been studied as a way to apply the concept of virtual reality to ubiquitous computing environments, that is, to the real world [9]. Lee et al. presented three key characteristics of Ubiquitous VR based on reality, context, and human activity [2]. The reality-virtuality continuum was introduced by Milgram, who described the real world as 'any environment consisting solely of real objects, and includes whatever might be observed when viewing a real-world scene either directly in person'. Context is defined as 'any information that can be used to characterize the situation of an entity, where an entity can be a person, place, or physical or computational object'. Context can be represented on a static-dynamic continuum: context that describes information such as a user profile is called static, whereas context that describes wisdom obtained by intelligent analysis is called dynamic. Human activity can be classified into personal, group, community, and social activity, and represented on a personal-social continuum. Ubiquitous VR supports human social connections with the highest-level user context (wisdom) in mixed reality. A smart object is a hidden intelligent object that recognizes a user's presence and provides information for the user's immediate needs using its sensors and processor. The concept assumes that everyday things have embedded microprocessors and are connected over wired or wireless networks, and that user interfaces control environmental conditions and support user interaction in a natural and personal way. Figure 1 shows how the three major research areas of augmented reality, social community analysis, and smart objects combine.
Fig. 1. Augmented Reality, Smart objects, and Social Community
2.2 Context-Aware Cognitive Agent Architecture for Ambient User Interfaces A Cognitive Agent Architecture for virtual and smart environments was proposed for realizing seamless interaction in ubiquitous virtual reality [8]. It is a cognitively motivated, vertically layered, two-pass agent architecture for realizing the responsiveness, reactivity, and pro-activeness of smart objects, smart environments, virtual characters, and virtual place controllers. Direct responsiveness is bounded by the time frame of visual continuity (about 40 msec). Immediate reaction is triggered by a user's command and may take more than 40 msec, up to about a second. Pro-activity concerns scheduled events and may take any amount of time: five seconds, a minute, or a day.
Fig. 2. Context-aware cognitive agent architecture for ambient user interfaces in ubiquitous virtual reality [2]
The Context-aware Cognitive Agent Architecture (CCAA) is designed for real-time and intelligent responses of ambient user interfaces, based on the context-aware agent architecture in Ubiquitous VR [2]. Its three layers are the AR (augmented reality) layer, the CA (context-aware) layer, and the AI layer. This architecture enables ambient smart objects to interact with users at various levels of intelligence by exploiting context and AI techniques.
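The following sketch is one possible way to express the three timing regimes of the CCAA in code: a per-frame AR layer bounded by the roughly 40 ms visual-continuity budget, a CA layer reacting to queued context events, and an AI layer running on a coarse schedule. All class and method names are assumptions made for illustration, not the published architecture of [2,8].

```python
import time
import queue

class CCAASketch:
    """Toy dispatcher illustrating the three response regimes described above."""

    def __init__(self, ar_layer, ca_layer, ai_layer):
        self.ar, self.ca, self.ai = ar_layer, ca_layer, ai_layer
        self.events = queue.Queue()            # context events from sensors/users

    def run(self):
        next_ai_tick = time.monotonic()
        while True:
            frame_start = time.monotonic()
            self.ar.render()                   # direct responsiveness, every frame
            while not self.events.empty():
                self.ca.react(self.events.get())   # immediate reaction, sub-second
            if frame_start >= next_ai_tick:
                self.ai.step()                 # pro-active reasoning, coarse schedule
                next_ai_tick = frame_start + 5.0
            # keep the loop near the ~40 ms visual-continuity budget
            time.sleep(max(0.0, 0.04 - (time.monotonic() - frame_start)))
```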
3 Service Scenarios of Social AR for Sensor Visualization 3.1 Service Scenarios In this section, we collect service scenarios for social sensor AR systems and elicit some functional and non-functional requirements. Social sensor AR systems are complex systems that combine the concepts of social networks, sensor networks, and augmented reality. The underlying idea is that too much information raises visualization problems, and people prefer to see information that comes from their community or is selected based on their social community. The information can come from sensors in the environment or from social network services such as Facebook or Twitter.
Fig. 3. Process of social augmented reality
We can think of several service scenarios for a social AR system for sensor visualization. There are two cases: sensors embedded in mobile AR devices, and sensors located in the environment. The first scenario concerns outdoor activity and a health-related service. Asthma is a chronic and critical illness that is closely related to pollen, yet pollen is invisible to the naked eye. Here is the scenario. • Service Scenario 1. Kelly wants to take a walk to a park near her home with her daughter, Jane. However, whenever she goes out to the park, she worries about Jane having an asthma attack caused by pollen. She checks the pollen count on the Internet before going out, but the information is not accurate enough because it covers a broad area rather than a specific place such as the park. Now she can see the pollen count via the sensor AR system before going to the park. She also finds it easy to explain to her daughter why Jane cannot go to the park when the pollen count is high, by showing her the pollen monster images on the system. Afterwards she shares the pollen information with her asthma community on Facebook, so that other members can check it before going out to the park. The second scenario concerns indoor activity and an education-related service, in a library equipped with an RFID tracking system for locating books and with a wireless network. • Service Scenario 2. Hyo, who lives in the city, goes to the library to read books. There are thousands of books, including adventure, drama, fantasy, history, and so on, all managed by an RFID management system: whenever people move books, the system automatically registers each book's new location. Suppose she is a new member of a science fiction circle but does not know what to read. While she looks around the library, the social AR user interface shows her memos written by her friends in the circle and information about who has read each book, and it also recommends books to her. This information is very useful for choosing what to read. The third scenario takes place at a conference site and draws information from a social network; the site can be outdoor or indoor as long as a suitable location tracking system is available. • Service Scenario 3. Hun is attending an international conference for his research and business, and he is looking for people interested in similar research topics. First, he opens his profile, which includes his research topics, paper list, and contact information. Privacy settings are important here, to prevent this information from being disclosed against his will. Hun can then see roughly where such people are located at the site, together with their information.
4 Context-Aware Cognitive Agent Architecture for Social AR in Ubiquitous Virtual Reality Based on the Context-aware Cognitive Agent Architecture [8], we extend the architecture for the scenarios above by adding a fourth layer, the community layer; Figure 4 shows the four layers. The Social Network Construction module receives context (processed information) from the lower layers and constructs the user's social network. The Social Network Analysis module reduces and optimizes that network according to the current user's needs.
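A minimal sketch of the community layer's filtering role is shown below: sensor readings are annotated with the user who shared them, and only readings from members of the viewer's relevant community are passed on to the AR layer. The data model and names are assumptions made for the example, not part of the proposed architecture's specification.

```python
# Social communities the viewer belongs to (hypothetical example data).
community = {
    "asthma_circle": {"kelly", "anna", "peter"},
}

# Sensor readings annotated with the user who shared them.
readings = [
    {"sensor": "pollen", "value": 87, "location": "park_east", "shared_by": "anna"},
    {"sensor": "pollen", "value": 12, "location": "park_west", "shared_by": "stranger42"},
]

def visible_readings(viewer_groups, readings, community):
    members = set().union(*(community[g] for g in viewer_groups))
    return [r for r in readings if r["shared_by"] in members]

overlay = visible_readings(["asthma_circle"], readings, community)
# Only Anna's pollen reading is handed to the AR layer for augmentation.
```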
Fig. 4. Context-aware Cognitive Agent Architecture for Social AR
5 Conclusion and Future Works In this paper, we proposed a social augmented reality architecture that selectively visualizes sensor information based on the user's social network community. Several scenarios were presented to identify the necessary functions. This is still an ongoing project, and we expect the proposed architecture to be refined further for future applications.
References 1. Lee, Y., Oh, S., Shin, C., Woo, W.: Recent Trends in Ubiquitous Virtual Reality. In: International Symposium on Ubiquitous Virtual Reality, pp. 33–36 (2008) 2. Lee, Y., Oh, S., Shin, C., Woo, W.: Ubiquitous Virtual Reality and Its Key Dimension. In: International Workshop on Ubiquitous Virtual Reality, pp. 5–8 (2009) 3. Gunnarsson, A., Rauhala, M., Henrysson, A., Ynnerman, A.: Visualization of sensor data using mobile phone augmented reality. In: 5th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2006), pp. 233–234. IEEE Computer Society, Washington, DC (2006) 4. Claros, D., Haro, M., Domínguez, M., Trazegnies, C., Urdiales, C., Hernández, F.: Augmented Reality Visualization Interface for Biometric Wireless Sensor Networks. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 1074–1081. Springer, Heidelberg (2007) 5. Goldsmith, D., Liarokapis, F., Malone, G., Kemp, J.: Augmented Reality Environmental Monitoring Using Wireless Sensor Networks. In: 12th International Conference Information Visualisation, pp. 539–544 6. Yazar, D., Tsiftes, N., Osterlind, F., Finne, N., Eriksson, J., Dunkels, A.: Augmenting reality with IP-based sensor networks. In: 9th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2010), pp. 440–441 (2010) 7. Dow, S., Mehta, M., Lausier, A., MacIntyre, B., Mateas, M.: Initial lessons from AR Façade, an interactive augmented reality drama. In: ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, June 14-16 (2006) 8. Lee, Y., Schmidtke, H.R., Woo, W.: Realizing Seamless Interaction: a Cognitive Agent Architecture for Virtual and Smart Environments. In: International Symposium on Ubiquitous Virtual Reality, pp. 5–6 (2007) 9. Kim, S., Lee, Y., Woo, W.: How to Realize Ubiquitous VR? In: Pervasive: TSI Workshop, pp. 493–504 (2006) 10. Choi, J., Moon, J.: MyGuide: A Mobile Context-Aware Exhibit Guide System. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds.) ICCSA 2008, Part II. LNCS, vol. 5073, pp. 348–359. Springer, Heidelberg (2008) 11. Weiser, M.: The Computer for the Twenty-First Century. Scientific American, 94–10 (September 1991)
Digital Diorama: AR Exhibition System to Convey Background Information for Museums
Takuji Narumi1, Oribe Hayashi2, Kazuhiro Kasada2, Mitsuhiko Yamazaki2, Tomohiro Tanikawa2, and Michitaka Hirose2
1 Graduate School of Engineering, The University of Tokyo / JSPS, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
2 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan {narumi,olive,kasada,myama,tani,hirose}@cyber.t.u-tokyo.ac.jp
Abstract. In this paper, we propose an MR museum exhibition system, the "Digital Diorama" system, which conveys background information intuitively. Using mixed reality technology, the system aims to offer more than the function of existing dioramas in museum exhibitions: it superimposes a computer-generated diorama scene, reconstructed from related image and video materials, onto real exhibits. First, we implement and evaluate methods for estimating the locations at which photos and movies were taken in the past. We then implement and install two types of prototype system at the estimated positions to superimpose virtual scenes onto a real exhibit in the Railway Museum. By looking into the eyehole-type device of the proposed system, visitors can feel as if they travel in time around the exhibited steam locomotive and can understand the historical differences between its current and previous appearance. Keywords: Mixed Reality, Museum Exhibition, Digital Museum.
1 Introduction Every museum preserves a great deal of informational material about its exhibits as texts, pictures, videos, 3D models, and so on. Curators have tried to convey this large amount of information to visitors using instruction boards or audio/visual guidance devices within their exhibitions. However, such conventional information assistance cannot convey vivid background information about the target exhibits, such as the state of society at the time or a scene of the object in use. Meanwhile, with the rapid growth of information technologies, mixed reality technologies have developed and become popular over the last decade. Today, high-quality virtual experiences can be presented in real time and in real environments using next-generation display and interaction systems: auto-stereoscopic 3D displays, gesture input devices, marker-less tracking systems, and so on. Museums are therefore very interested in introducing these technologies to convey the rich background information about their exhibits, and there are several research projects on exhibition systems featuring digital technology.
In this paper, we introduce an MR system that superimposes a virtual environment onto real exhibits: the Digital Diorama system. In a museum exhibition, a diorama is a technique for showing the usage scenes and situations of exhibits by building a set or painting a background image, much like a film set. The Digital Diorama system aims to offer more than the function of existing dioramas. In particular, our proposed system superimposes a computer-generated diorama scene on an exhibit using an HMD, a projector, or similar devices. With this approach, the system can present vivid scenes and situations to visitors together with the real exhibits: how they were used, how they were made, and how they moved. Based on this concept, we built a prototype system that superimposes a virtual environment, reconstructed from related photographs and videos, onto the real exhibit (Fig. 1). The Digital Diorama system consists of two main component technologies. The first is a method for deriving the location at which a target image or video was taken: to integrate a virtual scene built from related photos or movies with the exhibit, we propose an interactive method for estimating the relative position from which the source photos or movies were taken. The second is a method for superimposing the scene and the exhibit: by placing an eyehole-type device consisting of an HMD and a webcam at the estimated position and presenting the scene superimposed on the exhibit, the user can intuitively experience the past historical scene. This paper describes the implementation and evaluation of the Digital Diorama system, focusing on conveying historical scenes of museum exhibits in the Railway Museum, Japan.
Fig. 1. Concept of Digital Diorama system
2 Related Works There are several research projects on exhibition systems with cutting-edge display technologies. For example, the Virtual Showcase proposed by Oliver Bimber et al. is a storytelling exhibition system [1]. Few of these research projects have actually been introduced into museums, however, because they do not follow the conventions of museum exhibition methods and curators do not know how to make use of them. Such projects therefore aim to build systems that resemble the display cases and panels already used in museums, so that curators can use them naturally; moreover, these systems are intended for small exhibits.
2.1 Constructing Digital Data and AR Exhibits for Museums Although a digital system for small exhibits is easy to conceive, conveying background information about large exhibits in museums is complex and confusing. Large exhibits look very different depending on where the visitor stands, and individual exhibits may be far apart from one another; for these reasons, it is difficult to treat large exhibits with a digital diorama, and we had to consider carefully how to convey background information. Where a flat white wall is available, a projector can be used to present the background, and there are many research projects on projector-based AR techniques. For example, O. Bimber et al. used a projector in a historical museum to explain and interact with pictorial artworks [2]. Such a system can show information about an artwork, but it requires a diffuse projection screen when a large area must be covered. Hand-held projectors are useful for presenting individual information to each person. Yoshida et al. proposed a hand-held projector interaction system [3]; they estimate the position and posture relative to the wall from the axis shift between the lens centers of the projector and the camera. Since the camera has to acquire a reference image, the system is restricted by the optical environment. Moreover, in many cases, such as spatially large museums, there may be no wall to project onto, or the space may be too bright to see the projected information; we cannot present the circumstances or background of the exhibits this way because there is no projection screen behind the exhibit. We therefore decided to use an HMD and a camera for our Digital Diorama system. An example of applying VR technology to a place with special characteristics is the "Open Air Exhibition" [4] presented at Japan Expo 2005, a prototype that creates an outdoor exhibition space using wearable computers worn by visitors and requires no pavilion. There are also research projects that recreate the past at cultural heritage sites. For example, Ikeuchi et al. reconstructed Kawaradera, a Buddhist temple in the Asuka area of Japan built in the seventh century, with AR technology [5]. Papagiannakis et al. reproduced people's lives in Pompeii before it was buried by volcanic ash [6]. The Archeoguide system [7] provides not only augmented reality reconstructions of ancient ruins but also on-site help based on the user's position and orientation in the cultural site. Although these projects consider the characteristics of the place, their subjects are so old that only CG can be used to restore the exhibits. We instead decided to use photographs and movies that show the exhibit in earlier days. Image-based rendering (IBR) [8] is very useful for reconstructing past scenery from photos: given enough pictures of the target exhibits, manual or automatic IBR processes can be used [9-11]. These techniques use feature points to reconstruct buildings, both their outdoor appearance and their interiors. They are extremely useful for preserving the appearance of exhibits or scenery that exist today, but they cannot reproduce a past appearance of which very few or no photos remain. For the same reason, Cyber City Walk [12] and Google Street View [13], which aim to construct photorealistic virtual cities, are of limited use for our purpose, although the data they accumulate is very important and will become useful for our research as time passes and the appearance of cities changes.
2.2 Estimation and Superimposing Methods Depth keying [14] is one approach to superimposing a background image; it relies on chroma keying, a very orthodox way to replace the background, but its drawback is that it requires a blue sheet behind the exhibits. When superimposing photos or movies onto a video image, a natural connection between them is also important. If there is a gap between the target photo or movie and the video image because the estimate of the point where the photo was taken is imperfect, a simple blending method is not enough, since a large blur appears whenever the capture positions do not coincide. Poisson image editing [15] is a way to generate a natural intermediate frame; with this method, a rough position estimate is sufficient for the Digital Diorama. The Digital Diorama system requires estimating the point from which each photo was taken; the eight-point algorithm [16] makes it possible to estimate the relative position from the photo. When superimposing CG onto a video image, marker-less tracking such as PTAM [17] can be used: if the relative orientation to the real world can be detected, a virtual object can be superimposed onto the video image.
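OpenCV's seamlessClone implements the Poisson image editing of [15], so a rough version of the blending described above can be sketched as follows; the margin and clone mode are illustrative assumptions, and the real system first aligns the past photo using the estimated camera position.

```python
import cv2
import numpy as np

def blend_past_into_present(past_photo, live_frame):
    h, w = past_photo.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[10:-10, 10:-10] = 255                    # keep a small margin at the border
    center = (live_frame.shape[1] // 2, live_frame.shape[0] // 2)
    return cv2.seamlessClone(past_photo, live_frame, mask, center, cv2.NORMAL_CLONE)
```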
3 Concept and Implementation In a museum exhibition, a diorama is a technique for conveying the usage scenes and situations of exhibits to visitors by constructing a full-size or miniature three-dimensional model or by painting a background image of the exhibits, much like a film set. The Digital Diorama aims to realize the same function using mixed reality technology. In particular, our proposed system superimposes a computer-generated diorama scene on the exhibits using an HMD, a projector, or similar devices. With this approach, the system can present vivid scenes and situations to visitors together with the real exhibits: how they were used, how they were made, and how they were moved. Based on this concept, we implemented a prototype system that superimposes a virtual environment, reconstructed from related photographs and videos, onto the real exhibit. 3.1 The Method of Locating the Point at Which the Photo Was Taken In order to connect old photographs or videos with the exhibit, we propose a system for estimating the relative position from which the photos or videos were taken. We call an old photograph or video a "target image," because the system aims to guide the user to the viewpoint of the target image. This matching system consists of a mobile PC and a webcam. First, the system acquires a current image of the target exhibit from the webcam and compares it with the target image materials based on feature points. Second, the system estimates the relative position from which the target image was taken and guides the user to that position (Fig. 2). By repeating this procedure, the system and the user pinpoint the exact position. In this system, the user marks three feature points on each target image and on the current view, because it is difficult to identify the same points in both images automatically when the past and current situations differ greatly. After this step, the system tracks the assigned feature points in the current image and outputs directions
Fig. 2. Work flow of the estimation
Fig. 3. Coordination used in the estimation
continuously: yaw, pitch, right/left, back/forward, and roll. The user moves according to the directions presented by the system and can thus reach the past camera position. We use the Lucas-Kanade algorithm [18] to track the three feature points in the video view; with this tracking, the relative position with respect to the exhibit can be estimated continuously. Estimating the relative position from only a few feature points is important to keep the workload low. In order to estimate the relative position from such limited information, the system makes the following assumptions:
− The height of the camera position is the same in the past and present.
− The camera parameters of the two cameras are the same.
− The three feature points form a right angle in the real world.
− The object is upright.
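The Lucas-Kanade tracking step above can be sketched with OpenCV's pyramidal implementation as follows; here the guidance output is reduced to the mean pixel offset between the tracked points and the target photo's points, which is a deliberate simplification of the geometric yaw/pitch/translation instructions derived in the paper. The parameter values are assumptions.

```python
import cv2
import numpy as np

lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def track_and_guide(prev_gray, curr_gray, prev_pts, target_pts):
    # prev_pts / target_pts: float32 arrays of shape (3, 1, 2) holding A, B, C.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None, **lk_params)
    if not status.all():
        return curr_pts, None                  # a point was lost; ask the user to re-mark it
    offset = np.mean(target_pts - curr_pts, axis=0).ravel()   # (dx, dy) in pixels
    return curr_pts, offset                    # the UI turns this into yaw/pitch cues
```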
The system outputs yaw and pitch instructions so that the average coordinates of the feature points in the past and present images coincide, and it determines right/left instructions using the angle formed by the three feature points. The frame of reference is shown in Fig. 3. The eye directions of the target image and the current image lie in the XZ plane and point toward the origin, while the three feature points lie in the XY plane. The angle between the past eye direction and the Z axis is θtarget, and A is the camera parameter matrix. The distance from the origin to the past viewpoint is dtarget, and the distance from the origin to the current viewpoint is dcurrent. If the absolute coordinates of a feature point are (X, Y, 0) and its coordinates in the image plane are (u', v', s), their relationship is given by Equations 1 and 2. (1)
(2) The system determines θtarget and the absolute coordinates of the three feature points using these equations and the assumptions. Second, it calculates θcurrent, the angle between the current eye direction and the Z axis, so that the angles formed by the three feature points in the current image plane are consistent with the absolute coordinates; dtarget is also assumed in this calculation. The distance Dright to move to the right is determined by Equation 3. (3) For the back/forward instructions, the system uses the difference in scale of the coordinates in the two images. Let lt1 and lt2 be the lengths of AB and BC in the image plane of the target picture, and lc1 and lc2 the corresponding lengths in the current picture. The distance Dforward to move forward is then determined by Equation 4. (4) The system outputs roll instructions so that the slopes of AB and BC in the two image planes coincide. We ran a simulation to confirm that this camera-position estimation method is valid; all lengths are in metres. In the first simulation, the past viewpoint was dtarget = 50, θtarget = π/6, and the absolute coordinates of the feature points were A(-10, 10, 0), B(10, 10, 0), C(10, -10, 0). We varied the current viewpoint over dcurrent = 40, 50, 60 and -π/6 < θcurrent < π/3, with both the past and current viewpoints assumed to look at the origin. In the simulation, the current viewpoint moved according to the right/left and back/forward directions output by the system and finally reached the point that the system estimated to be the past camera position. The result is shown in Fig. 4. The average error, i.e., the distance between the past viewpoint and the result of the guidance, was 0.76 m, which is about 1.5% of dtarget. Next, we ran another simulation in which the absolute coordinates of the feature points were changed. To vary the angle between the feature points, the coordinates of A and B were changed as follows: (A(-10, 15, 0), B(10, 5, 0)), (A(-10, 14, 0), B(10, 6, 0)), ..., (A(-10, 6, 0), B(10, 14, 0)), (A(-10, 5, 0), B(10, 15, 0)). The past viewpoint was dtarget = 50, θtarget = π/6, and the current viewpoint was varied over dcurrent = 50, -π/6 < θcurrent < π/3; the relationship between the feature-point angle and the resulting error is shown in Fig. 5.
Fig. 4. The result of guidance in a simulation
Fig. 5. The relationship between the angle and the errors
We then carried out an experiment estimating the camera positions of 26 past photos in the Railway Museum, all showing the steam locomotive C57. We succeeded in estimating the camera positions of 22 of the past photos; three failures were caused by occlusion from other exhibits, and one was due to a violated assumption. Estimating the camera position of a past photo took 4 minutes 14 seconds on average. These results are shown in Fig. 6.
Fig. 6. Matching result between image materials and the position around steam locomotive exhibit
Fig. 7. Prototype for superimposing omnidirectional video
3.2 Implementation of the Digital Diorama System To present real exhibits superimposed with the reconstructed virtual scene, we installed two types of prototype display device at the estimated positions in front of the exhibits in the Railway Museum (Fig. 7). The outline of the system merging the past scene with the current scene of an exhibit is shown in Fig. 8.
Fig. 8. System chart of the Digital Diorama
The first prototype system consists of a webcam and an HMD and presents an integrated diorama scene built from ordinary photos and movies taken in the past. It is fixed at the camera position and attitude estimated from the target photo or movie and the exhibit using the method described in Section 3.1. A visitor can look into the device as if looking through an observation window and observe the atmosphere of old days through it. The other prototype system is built for omnidirectional and panoramic movies. A wide field of view and interaction such as looking around make the experience more immersive, and since wide-angle cameras and mosaicing algorithms are now common and the number of such images accumulated in databases around the world keeps increasing, generating or capturing omnidirectional and panoramic scenes is easier than before. To let visitors experience such omnidirectional scenes, we built a device similar to the binocular telescopes found on observation decks. Its display part is the same as that of the first prototype, but it has a 2-DOF rotating mechanism, so the user can choose the view direction by moving the display part freely (Fig. 7). Using the rotation angles from the sensors of the rotating mechanism, the system presents the appropriate view of the generated scene in that direction; by looking into the device, the visitor can look around the composed omnidirectional scene through the binocular telescope interface. For this system, we propose the two superimposing methods described below. Method A: Simple Alpha Blending (Fig. 9 top). The user first watches the live video image of the current exhibit through the eyehole-type device. The picture containing both the background and the exhibit is then superimposed by alpha blending, and finally the user watches the video image taken when the exhibit was in use. Method B: Stencil Superimposing (Fig. 9 bottom). The user again watches the current exhibit first. A picture of the background that does not include the past figure of the exhibit is then superimposed onto the video image by alpha blending, and finally the exhibit area is also replaced by the exhibit's figure from the past video images.
Fig. 9. Superimpose picture to video image
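The two superimposing methods can be sketched as simple compositing operations; in the snippet below, Method A is a plain alpha blend of the past picture and the live frame, while Method B blends only the background and then stencils in the past exhibit's pixels using a mask of the exhibit region, which is assumed here to be prepared offline for each photo.

```python
import cv2

def method_a(live, past, alpha):
    # Simple alpha blending of the whole past picture over the live frame.
    return cv2.addWeighted(past, alpha, live, 1.0 - alpha, 0.0)

def method_b(live, past_background, past_exhibit, exhibit_mask, alpha):
    # Blend only the background, then paste the past exhibit through the mask.
    out = cv2.addWeighted(past_background, alpha, live, 1.0 - alpha, 0.0)
    out[exhibit_mask > 0] = past_exhibit[exhibit_mask > 0]
    return out

# Sweeping alpha from 0 to 1 over a few seconds reproduces the gradual
# transition visitors see through the eyehole device.
```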
To evaluate the effectiveness of the Digital Diorama system, we asked seven subjects to experience the generated diorama scenes through the two prototype systems in the Railway Museum. We installed the prototypes at the estimated camera positions, the subjects watched movies connecting the present C57 with past images of the C57, and they answered a questionnaire after experiencing the exhibition system. The results are shown in Fig. 10. The individual elements of the system, namely the validity of the camera positioning, the smoothness of the blending, and the video stability, received high ratings, which means the system can connect past images with present exhibits naturally. The questionnaire also revealed problems with our display device: the subjects said that the resolution (800 x 600) was not high enough and that they perceived some distortion when looking through the HMD. This problem can be solved by improving the display devices.
The question of whether the subjects could imagine the past situation received a medium rating; some subjects said that with more content they could have imagined the past situation more vividly. The question of whether the subjects became absorbed in the movies on the HMD also received a medium rating. On the other hand, many subjects said that the system was good for learning the background context of exhibits. Most users preferred Method B for understanding background information. With Method A, a pseudo-motion effect occurred in which the exhibit appeared to move: the exhibit's driving wheel seemed to turn, although it did not actually turn, when the movie of the running locomotive was superimposed. For both methods, the subjects said that alpha blending with geometrically consistent images was very impressive, and most replied that the system helped them understand background information about the exhibits more effectively than just looking at pictures of the exhibit. Users also said that the voice and ambient sound in the movies are very important for understanding the scenery of the years when the exhibit was in use.
Fig. 10. The result of user feedback (Left: Technical section, Right: Impression). Error bar means variance and yellow point means the answer from a curator.
4 Conclusion and Future Works In this paper, we proposed the Digital Diorama system to convey background information intuitively. The system superimposes real exhibits on computer generated diorama scene reconstructed from related image/video materials. Our proposed system is divided in two procedures. In order to switching and superimposing real exhibits and past photos seamlessly, we implement a sub-system for estimating the camera position where photos are taken. Thus, we implement and install two types of prototype system at estimated position to superimposing virtual scene and real exhibit in the Railway Museum. Seven subjects, including museum curator, experienced the constructed museum exhibition system, and answered our questionnaire. According to the results of questionnaire, camera position estimation and seamless image connection got good evaluation. And many subjects said that our system was good for knowing the contexts of exhibits. In future works, we brush up our prototype system for providing more high quality realistic scene more effectively by using projector-based AR, 3D acoustic device, marker-less tracking etc. By reconstructing 3D virtual environment [12] from different types of image/video materials, user can slightly move around the exhibition
Digital Diorama: AR Exhibition System to Convey Background Information for Museums
85
while experiencing past event scene or time-tripping between current and past age. Also by allowing lighting condition [19] and occlusion [20] in superimposing process, the system provide more realistic diorama scene to the visitors. Finally we aim to introduce these systems for practical use in the museum. Acknowledgements. This research is partly supported by “Mixed Realty Digital Museum” project of MEXT of Japan. The authors would like to thank all the members of our project. Especially, Torahiko Kasai and Kunio Aoki, the Railway Museum.
References 1. Bimber, O., Encarnacao, L.M., Schmalstieg, D.: The virtual showcase as a new platform for augmented reality digital storytelling. In: Proc. of the Workshop on Virtual Environments 2003, vol. 39, pp. 87–95 (2003) 2. Bimber, O., Coriand, F., Kleppe, A., Bruns, E., Zollmann, S., Langlotz, T.: Superimposing pictorial artwork with projected imagery. In: ACM SIGGRAPH 2005 Courses, p. 6 (2005) 3. Yoshida, T., Nii, H., Kawakami, N., Tachi, S.: Twincle: Interface for using handheld projectors to interact with physycal surfaces. In: ACM SIGGRAPH 2009 Emerging Technologies (2009) 4. Ueoka, R., Hirose, M., Kuma, K., Sone, M., Kohiyama, K., Kawamura, T., Hiroto, K.: Wearable computer application for open air exhibition in expo 2005. In: Proc. of the 2nd IEEE Pacific Rim Conference on Multimedia, pp. 8–15 (2001) 5. Kakuta, T., Oishi, T., Ikeuchi, K.: Virtual kawaradera: Fast shadow texture for augmented reality. In: Proc. of Intl. Society on Virtual Systems and MultiMedia 2004, pp. 141–150 (2004) 6. Papagiannakis, G., Schertenleib, S., O’Kennedy, B., Arevalo-Poizat, M., MagnenatThalmann, N., Stoddart, A., Thalmann, D.: Mixing virtual and real scenes in the site of ancient pompeii. Computer Animation & Virtual Worlds 16, 11–24 (2005) 7. Vlahakis, V., Karigiannis, J., Tsotros, M., Gounaris, M., Almeida, L., Stricker, D., Gleue, T., Christou, I.T., Carlucci, R., Ioannidis, N.: Archeoguide: first results of an augmented reality, mobile computing system in cultural heritage sites. In: Proc. of the 2001 Conference on Virtual Reality, Archeology and Cultural Heritage, pp. 131–140 (2001) 8. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proc. of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 11–20 (1996) 9. Aoki, T., Tanikawa, T., Hirose, M.: Virtual 3D world construction by inter-connecting photograh-based 3D modeles. In: Proc. of IEEE Virtual Reality 2008, pp. 243–244 (2008) 10. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Reconstructing building interiors from images. In: Twelfth IEEE Intl. Conference on Computer Vision, ICCV 2009 (2009) 11. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: Exploring photo collections in 3D. In: Proc. of SIGGRAPH 2006, pp. 835–846 (2006) 12. Hirose, M., Watanabe, S., Endo, T.: Generation of wide-range virtual spaces using photographic images. In: Proc. of 4th IEEE Virtual Reality Annual Intl. Symposium, pp. 234–241 (1998) 13. Google Street View, http://maps.google.com/ 14. Gvili, R., Kaplan, A., Ofek, E., Yahav, G.: Depth keying. In: SPIE, pp. 564–574 (2003) 15. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. ACM Trans. Graph. 22(3), 313– 318 (2003)
16. Hartley, R.I.: In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 580–593 (1997) 17. Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proc. of ISMAR 2007, pp. 1–10 (2007) 18. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 1981 DARPA Image Understanding Workshop, pp. 121–130 (1981) 19. Kakuta, T., Oishi, T., Ikeuchi, K.: Shading and shadowing of architecture in mixed reality. In: Proceedings of ISMAR 2005, pp. 200–201 (2005) 20. Kiyokawa, K., Billinghurst, M., Campbell, B., Woods, E.: An occlusion-capable optical see-through head mount display for supporting co-located collaboration. In: Proc. of ISMAR 2003, pp. 133–141 (2003)
Augmented Reality: An Advantageous Option for Complex Training and Maintenance Operations in Aeronautic Related Processes Horacio Rios, Mauricio Hincapié, Andrea Caponio, Emilio Mercado, and Eduardo González Mendívil Instituto Tecnológico y de Estudios Superiores de Monterrey, Ave. Eugenio Garza Sada 2501 Sur Col. Tecnologico C.P. 64849 — Monterrey, Nuevo León, México {hrios1987,maurhin}@gmail.com,
[email protected],
[email protected],
[email protected]
Abstract. The purpose of this article is to compare three methodologies for transferring knowledge of complex maintenance and training operations in aeronautical processes. The first is the Traditional Teaching Technique, which uses manuals and printed instructions to guide an assembly task; the second uses audiovisual tools to give operators richer information; and the third uses an Augmented Reality (AR) application that achieves the same goal by enhancing the real environment with virtual content. We developed an AR application that runs on a regular laptop with stable results and provides useful information to the user throughout the four hours of training; a basic statistical analysis was also performed to compare the results of our AR application with the other methods. Keywords: Augmented Reality, Maintenance, Training, Aeronautic Field.
1 Introduction During the past 20 years there has been an unparalleled technological revolution: the continuous advances in technology and information are changing our daily lives in an unprecedented way. Aviation industries are currently looking for new technologies that can deliver repeatable results when training people for repair and maintenance operations, and the driving constraint is the cost of implementing such tools. The priority placed on a low-cost, high-efficiency goal stems from the 2009 recession, which is highlighted as a major concern: the Aircraft MRO (Maintenance, Repair & Overhaul) Market Forecast from OAG [12] estimated that this part of the industry will grow at just 2.3%, compared with 6% per annum in previous years. In the present work we analyze a solution for improving the cognitive process of a maintenance operator in the aviation business by using Augmented Reality technology in the assembly of a training kit for the RV-10 airplane, which is sold commercially for practicing the assembly operations of this aircraft. R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 87–96, 2011. © Springer-Verlag Berlin Heidelberg 2011
This paper begins by examining related work and applications of augmented reality systems and how they behaved in controlled environments and in in-situ prototypes. In the next section we describe in detail the components of a marker-based AR application and how they relate to each other in our case study. We then present the RV-10 training tool kit as the case study, together with the results of the analysis of the four-hour assembly training of the RV kit using a low-cost system created to guide the operator through every operation, including preparing tools, removing edges, deburring holes, riveting and drilling components. Finally, we compare these results against the other techniques tested in our research group: the Traditional Teaching Technique with printed instructions (TTT) and the Multimedia Design Guide (MMDG). The latter enhances the content presented to the student with audiovisual tools in order to support better learning and a more natural cognitive process.
2 Augmented Reality Background Augmented Reality (AR) has attracted the attention of many industries with assembly or maintenance processes because of the feasibility of AR applications. In this section we present the state of the art of AR applications in maintenance, as well as the different prototypes and technologies used to create them. 2.1 Maintenance Applications of AR Systems In 2004 Zenati and colleagues [15] described a prototype maintenance system based on distributed augmented reality in an industrial context. This research group achieved good virtual overlay results by using an optical tracking system and calibration procedures based on computer vision techniques, in which the user's job was to align a perceived virtual marker with a physical target in the scene; a head-mounted display was used to perform the maintenance task. In other work they proposed UML notation as the basis of an ergonomic and software design process for AR applications through the V-model, providing the operator with details that cannot be accessed during a maintenance task. At Columbia University, Henderson and Feiner [8] presented the design, implementation, and user testing of a prototype augmented reality application for United States Marine Corps (USMC) mechanics operating inside an armored vehicle turret. The prototype uses a tracked head-worn display to augment a mechanic's natural view with text, labels, arrows, and animated sequences designed to facilitate task comprehension and execution. The researchers used 10 tracking cameras to achieve ample coverage of the user's head position, without being overly concerned with the disadvantages of adding a large number of cameras to the turret. Important findings of that research were that the augmented reality condition allowed mechanics to locate tasks more quickly than with improved versions of currently employed methods and, in some instances, resulted in less overall head movement. In [9], the same authors tested opportunistic controls (OC) as a user interaction technique. This is a tangible user interface that leverages
the properties of an object to determine how it can be used; this prototype implementation supported faster completion times than the baseline. In the case study presented by Wang [14], the author proposes AR applications for equipment maintenance. The system is based on the projection and detection of infra-red markers to guide replacing the air filter and maintaining the gear-box of a milling machine. Later, in an interesting approach, Abate and colleagues [1] provide the ASSYST framework, which aims to enhance existing Collaborative Working Environments by developing a maintenance paradigm based on the integration of augmented reality technologies with a tele-assistance network architecture, enabling support for radar system operators during system failure detection, diagnosis and repair. In this approach, support for maintenance procedures is provided by a virtual assistant visualized in the operational field that communicates with the human operator through an interaction paradigm close to the way in which humans are used to collaborating, improving on the results that a screen-based interface can offer. A system involving markerless CAD-based tracking was developed by Platonov and colleagues [13]. The application was based on an off-the-shelf notebook and a wireless mobile setup consisting of a monocular wide-angle video camera and an analog video transmission system; it used well-known concepts and algorithms and was tested for the maintenance of BMW 7 Series engines, resulting in stable behavior. Another example is the research done at Embry-Riddle Aeronautical University [6], which analyzed the use of an augmented reality system as a training medium for novice aviation maintenance trainees. Interesting findings from their work were that an AR system could reduce the cost of training and retraining personnel by complementing human information processing and assisting with the performance of job tasks. In the work presented by Macchiarella and colleagues [10], AR was used to develop an augmented-scenes framework that can complement human information processing, a complement that can translate into training efficiency applicable to a wide variety of flight and maintenance tasks. During the investigation the authors determined that AR-based learning affects long-term memory by reducing the amount of information forgotten over a seven-day interval between an immediate-recall test and a long-term-retention recall test. However, the authors make the important point that further research is necessary to isolate the human variability associated with cognition, learning and application of AR-based technologies as a training and learning paradigm for the aerospace industry. Another type of AR application is the SEAR (speech-enabled AR) framework, which uses flexible and scalable vision-based localization techniques to offer maintenance technicians a seamless multimodal user interface; the interface juxtaposes a graphical AR view with a context-sensitive speech dialogue. This work was done by Goose et al. [5]. De Crescenzio and colleagues [2] implemented an AR system that uses SURF (Speeded-Up Robust Features) and processes a subsampled 320x230 video at approximately 10 fps, trading off accuracy to allow fluid rendering of augmented graphical content. They tested the system in the daily inspections of a Cessna C.172P, an airplane that flight
schools often use. As shown, many research groups are advancing the state of the art of Augmented Reality technology, but there is still much work to do, and the biggest concern of industry is the cost that such applications can have in real maintenance operations rather than in controlled work environments. In the following sections we explain our case study of a four-hour maintenance operation using low-cost technology. 2.2 AR Components As shown previously, there are many types of augmented reality applications with different content. This section presents the components that AR applications have in common; even though some of them may be omitted in a given application, these are the most common parts to take into account when developing an AR prototype. The four main components are the display, the tracker, the content and the marker. Each element has a fundamental role in augmented reality applications and is described below. • Display. The principal sense stimulated in an AR application is sight. This means that the content must be shown on a display, and the application may change along with the human-machine interaction. It is therefore very important to have a display that suits the information to be transmitted. • AR tracking. In order to accurately overlay the content on the real environment it is essential to identify the environment and to update the content's location accordingly. The main purpose of the tracking software and tools in an AR application is to determine where the marker is; depending on its position, the content may or may not change. This change is measured with respect to a reference point, which is normally called a marker. • AR content. The information that can be displayed in an AR application is the same content that can be displayed on a computer, such as videos, music, 3D models and animations. The content is subject to the author's needs and objectives. • Marker. The reference point used to track changes in the environment, which can therefore be used as a reference to deploy content in an application. There are many marker types and technologies; the one used most commonly is the geometric marker. One of its advantages is that it is easy to use and its data processing requirements are affordable on common hardware (e.g., a home desktop). For markerless applications the object itself is used as the marker to deploy the AR content. A minimal code sketch of how these four components fit together is given below.
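The following is a minimal, hypothetical sketch of the loop that ties these four components together. It is not the system described in this paper (which was built with a custom script running under Gamestudio, as explained in Section 3.2); it assumes OpenCV with the ArUco module, placeholder camera intrinsics and an assumed 40 mm marker, and the exact ArUco API varies between OpenCV versions.

import cv2
import numpy as np

# Assumed camera intrinsics and marker size; replace with calibrated values.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
marker_len = 0.04  # marker side length in meters (assumed)

# 3D corners of the marker in its own coordinate system (z = 0 plane),
# ordered to match the corner order returned by the detector.
obj_pts = np.array([[-marker_len / 2,  marker_len / 2, 0],
                    [ marker_len / 2,  marker_len / 2, 0],
                    [ marker_len / 2, -marker_len / 2, 0],
                    [-marker_len / 2, -marker_len / 2, 0]], dtype=np.float32)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
cap = cv2.VideoCapture(0)                          # camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)   # marker detection (tracking)
    if ids is not None:
        for c in corners:
            img_pts = c.reshape(4, 2).astype(np.float32)
            _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, camera_matrix, dist_coeffs)
            # "Content": draw coordinate axes instead of a full 3D model.
            cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, marker_len)
    cv2.imshow("AR display", frame)                # display
    if cv2.waitKey(1) == 27:                       # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()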
3 Design of Experiment 3.1 Methodology The methodology followed during the course of the investigation is shown in Figure 1. The first and very important step is to perform proper research on the state of the art of the technology available for Augmented Reality. After that, we selected the case study; from this point on, all decisions take the selected
case study into consideration. The software selection includes the programs required to create the 3D models and animations; we must also consider the software that will receive all the created objects, compile the application script and run the application as such. The next step is to create a storyboard in which the different steps are defined, with a brief description of each step, the animation and the screen layout. Using the storyboard as a guide, the models and animations are created; concurrently, the program script is written to test functions and models. When the application prototype is done, a test is performed to check all steps and make any corrections needed to obtain the final version of the application. The definition of the experimentation can start right after the case study is selected, simultaneously with the application development. In this part the type of experiment is defined, along with the evaluation parameters, the number of experiments, the location and the testing profile. The last step in the process is gathering all the data for proper analysis; finally, we must reach the conclusions from the experiments.
Fig. 1. Methodology followed to create the AR application for the RV-10 training kit
The complete integration of our system is what yields trustworthy results; this integration may be seen as a loop of harmonization between the human part of the system, the station, the complete set of components and the tools used to complete the task. If any component of the loop breaks the equilibrium for any reason, the AR system will be affected and the task may not be completed successfully. For example, a person who is not willing to work with the AR technology could lead to longer assembly times or to damage to the system, and a bad setup of the work environment while using AR will generate stress in the operator when he tries to fit the technology into the wrong medium. The loop of interaction of our AR application is shown in Figure 2.
Fig. 2. Interaction loop between the components of the AR application of the RV-10 training kit
3.2 Experiment Description The interaction between the elements shown in Figure 2 is what defines the reliability of our data. We present here a description of all the components of our experiment. Human participation: the test subject is a person with an engineering background; we do not impose age as a restriction. The AR station is made up of an Acer Aspire 3680 computer with an Intel Celeron M processor, 1.5 GB of DDR2 RAM and an Intel Graphics Media Accelerator 950. The laptop was chosen for the portability of the
system. The camera used is a Micro Innovation Basic Camera, model IC50C. The markers are a black-and-white design to be recognized by the software; the geometric design is printed on a white sheet of regular paper with a laser printer. The software used to create the application is the result of a joint collaboration between UPV in Valencia, Spain, and Tecnologico de Monterrey (ITESM, Monterrey campus) in Mexico. The script was written in the C-lite language and runs under the Gamestudio A7 compiler. The application can run as a self-executable archive, which allows it to run on different computers.
Fig. 3. RV-10 AR experiment: (left) drilling operation, (right) riveting operation
The RV training kit was assembled with all the parts and accessories (rivets). It is bought directly from the supplier, Van's Aircraft. The kit is sold for training in the basic operations performed during the assembly of an aircraft and consists of 12 aluminum parts; the kit is shown in Figure 3 while the drilling and riveting operations are being performed. The tools needed to perform the assembly correctly are the following: rivet gun, hand drill, rivet cutter, priming machine, C-clamps, metal cutting snips and deburring tool. An important remark is that the assembly must be done indoors with good illumination. The experiment consisted of one test subject performing the assembly. The person was introduced to the background of the instruction kit, and the different tools were explained along with safety instructions. Then an introduction to the software was given in which the basic instructions were explained; for example, the operators were trained to keep the markers unobstructed if they wanted to play the AR animation, and basic commands with the mouse and marker were shown. Next, the assembly kit is presented and the test subject is allowed to start the work; the trainer answered any questions regarding the AR application, but not about the RV assembly kit and the way it is supposed to be joined.
Fig. 4. (Left) Display of the AR application for the RV training kit; (right) parts of the kit that form the assembly
The AR application for the RV kit is formed by 12 steps, 11 of which are manual operations that include preparing the tools, removing edges, deburring holes, riveting and drilling components. The list of operations for each block is displayed at the side of the figure with a brief description of each operation.
4 Experiment Results According to design of experiments (DOE) theory, any process can be divided into certain parts in order to study and understand their impact on the process. A process has inputs which are transformed into outputs by a configuration of parameters. The parameters that can be modified in order to change the output of the process are called factors. The factors should be parametric in order to measure the changes made, and thus the difference in the output, according to Guerra [11]. The factors can take different configurations or values within the process to change the output; these configurations are called levels, and each factor can have several levels which are tested during the experiment. According to this theory, the experiment performed can be described as follows. There is one factor evaluated during the experiment: the method used to transfer the knowledge. The configurations available for this factor are the Traditional Teaching Technique (TTT), the Multimedia Design Guide method by Clark [3] (MMDG: use of audio and video to enhance learning) and the AR method; these are therefore the levels. This means the process has one factor with three levels. The outputs of the process measured during the experiments were the time, the errors and the questions; these are the three quantifiable parameters. Another aspect of the experiment is where it physically took place: the location was the Manufacturing Lab, specifically the designated work area for the RV project, because it has enough space and provides good conditions to perform the experiment.
Fig. 5. RV-10 toolkit assembly process flow diagram
The testing subjects, in this case the users, are a total of 7 male engineering students. None of the students had any prior knowledge of the assembly, the process or the tools used for the instruction kit. The users were all appointed individually for the experiments; they worked under a supervisor who introduced them to the augmented reality instruction kit and its interface. The supervisor explains to the user the kit's control commands and the different types of content (text, video, 3D) displayed, and introduces all the tools the user will use during the assembly. Once the time starts running, the users begin with the adaptation to the instruction kit interface (this time is included in the total assembly time) and then proceed with all the steps in the guide. The supervisor is with the user at all times and is allowed to answer any question asked. The supervisor helps the user in certain operations of the guide because these operations must be done by two people; nonetheless, the supervisor is an objective assistant and helps the user according to the user's instructions.
The supervisor is in charge of gathering the data for the questionnaire; the time, the number of errors and the number of questions are collected by the supervisor during the assembly. Afterwards, when the user finishes the assembly, the second part of the questionnaire is completed with 4 open questions. The results of the experiment are presented in two categories, the first quantitative and the second qualitative. The quantitative data are grouped in Table 1. The three measured parameters were: the time that the person took to complete the assembly, which is intended to establish whether augmented reality improves the user's understanding of the assembly and thereby makes it faster; the errors found in the assembly, which range from the misplacement of a part to bad riveting or misalignment; and the questions asked by the user that could not be answered with the AR guide.
Table 1. Results from the AR method
Experiment        1     2     3     4     5     6     Average
Time (min)        248   210   236   284   197   166   223.5
Error (qty)       3     2     2     2     0     0     1.5
Questions (qty)   4     3     3     5     1     2     3
The analysis of the data extracted from the experiment began with a goodness-of-fit test to check whether the sample follows a normal distribution. The Kolmogorov-Smirnov normality test with a confidence level of 95% was used to validate the sample; the test was done using the Minitab data analysis tools and was applied to the three quantitative parameters, time, errors and questions. The results show that the parameters follow a normal distribution, except for the error parameter. We compare our data against the TTT and MMDG methods that were tested before the AR application in [11] by M. Guerra. The results shown in Figure 6 are normalized to 1 with respect to the maximum value recorded for each parameter. The time recorded to complete the assignment was 223.5 minutes for AR and 218 minutes for MMDG, a difference of 4.5 minutes that falls inside the confidence interval obtained from the t-test on the sample data (95% for this parameter); it can therefore be assumed that both methods give a similar assembly time, while the data show a difference from the TTT method, with 243 minutes. From the error comparison we conclude that AR is the option to choose for aeronautical assembly, with only 1.5 errors on average against 2 errors for TTT and 6 for MMDG. The number of questions was the parameter that showed the biggest difference: the questions asked by the user with the AR method (3 questions) were 2.3 times fewer than with the MMDG method (7 questions), indicating that users have more information available and thus less need for external assistance, which matters for a 223-minute assignment; for the TTT method we got 5 times more questions than with the AR method, reflecting better understanding of the assembly and better performance of the steps.
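As an illustration of the statistical procedure described above, the following is a small sketch (not the authors' Minitab analysis) of a Kolmogorov-Smirnov normality check and a 95% confidence interval for the mean assembly time, computed from the six AR timing measurements in Table 1; SciPy is assumed to be available.

import numpy as np
from scipy import stats

times = np.array([248, 210, 236, 284, 197, 166], dtype=float)  # Table 1, Time (min)

# Kolmogorov-Smirnov test against a normal distribution fitted to the sample
mean, std = times.mean(), times.std(ddof=1)
ks_stat, p_value = stats.kstest(times, 'norm', args=(mean, std))
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")  # p > 0.05 -> normality not rejected

# 95% confidence interval for the mean assembly time (t-distribution, n = 6)
sem = std / np.sqrt(len(times))
ci_low, ci_high = stats.t.interval(0.95, df=len(times) - 1, loc=mean, scale=sem)
print(f"Mean = {mean:.1f} min, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")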
It is important to emphasize that the statistical study is still in progress and will be presented in future work, but the results obtained so far show an excellent tendency for the AR technology when compared to the other teaching techniques. We are in the process of obtaining the sample size needed to reach a 95% confidence interval for all the parameters; the data presented in this work will help to reach that number because they give us a perspective on the variation of the assembly process in real conditions.
Fig. 6. Comparison of results to complete the assembly with the AR, MMDG and TTT methods
For the qualitative results we selected two main topics to address: the impact of augmented reality on the motivation factor, and the ease-of-use factor. These topics were evaluated using open questions to the users in a questionnaire. In most cases the users were initially motivated by the new technology; they were eager to use it, and thus in the first minutes of the experiment they tended to focus on how to use the Augmented Reality rather than on the experiment per se. The initial motivation was in some cases overcome by the length of the assembly, especially when the assembly took more than 4 hours. All users considered augmented reality a tool that made it easier to understand the instruction kit, but also commented that the control and display tools could be improved, for example by changing to mouse control instead of marker control.
5 Conclusions We developed an AR system for aeronautic maintenance and training that runs on a regular laptop computer, showing that this technology can be used without high-capacity video cards and demonstrating its feasibility for aircraft training. During the experiments the users of the AR application had a better appreciation of spatial and depth concepts; this means that during the assembly, when smaller parts were involved in the initial steps, it was easier for them to understand and perform the correct step. In comparison, with the multimedia method the pictures are in some cases insufficient to explain a step completely. Users have an assimilation period for the new technology; the interface is easy to understand, yet it needs improvement to completely fulfill user needs, especially regarding the control of the 3D models and animations. When manipulating the interface controls, the users on every occasion preferred the mouse over the markers. We conclude that users perform the steps with a higher degree of confidence: although they are not always right, they are able to generate a complete picture of the task in their mind and replicate it physically. On the other hand, with MMDG the users still have doubts about the procedure and/or the parts and therefore work with a degree of uncertainty, lowering their confidence in the assembly performed. As shown, the number of operators chosen as test subjects yielded good results, but the intention of this paper is to present the progress of the work done so far and the tendency of the results when compared to the Traditional Teaching Technique during a 4-hour maintenance task. Now that we have data from real work
conditions and the points of view of people with different backgrounds, we are able to estimate the sample size needed to present all the parameters with a 95% confidence interval and to demonstrate that our sample is representative of the MRO population in the aviation industry; these results will be presented in future work.
References
1. Abate, A., Nappi, M., Loia, V., Ricciardi, S., Boccola, E.: ASSYST: Avatar baSed SYStem maintenance. In: Radar Conference (2008)
2. De Crescenzio, F., Fantini, M., Persiani, F., Stefano, L.D., Azzari, P., Salti, S.: Augmented Reality for Aircraft Maintenance Training and Operations Support
3. Clark, W.: Using multimedia and cooperative learning in and out of class. In: Frontiers in Education Conference, Worcester Polytechnic Institute, MA
4. Ghadirian, P., Bishop, I.D.: Integration of augmented reality and GIS: A new approach to realistic landscape visualisation. Landscape and Urban Planning 86, 226–232 (2008)
5. Goose, S., Sudarsky, S., Zhang, X., Navab, N.: Speech-enabled augmented reality supporting mobile industrial maintenance. Pervasive Computing, 65–70 (2003)
6. Haritos, T., Macchiarella, N.: A mobile application of augmented reality for aerospace maintenance training. In: Digital Avionics Systems Conference, DASC (2005)
7. Henderson, S.J., Feiner, S.: Evaluating the benefits of augmented reality for task localization in maintenance of an armored personnel carrier turret. In: 8th IEEE International Symposium on ISMAR 2009, pp. 135–144 (2009)
8. Henderson, S.J., Feiner, S.: Exploring the Benefits of Augmented Reality Documentation for Maintenance and Repair. IEEE Transactions on Visualization and Computer Graphics (2011)
9. Henderson, S.J., Feiner, S.: Opportunistic Tangible User Interfaces for Augmented Reality. IEEE Transactions on Visualization and Computer Graphics (2010)
10. Macchiarella, N.D.: Effectiveness of video-based augmented reality as a learning paradigm for aerospace maintenance training (2004)
11. Moreno, G., Miguel, A.: Applying Knowledge Management and Using Multimedia for Developing Aircraft Equipment (Master degree thesis, ITESM) (2008)
12. OAG Aviation, June 11 (2009), http://www.oagaviation.com/News/Press-Room/Air-Transport-Recession-Results-in-3-Years-of-Lost-MRO-Market-Growth (accessed 2011)
13. Platonov, J., Heibel, H., Meier, P., Grollmann, B.: A mobile markerless AR system for maintenance and repair. In: Mixed and Augmented Reality, ISMAR 2006, pp. 105–108 (2006)
14. Wang, H.: Distributed Augmented Reality for visualizing collaborative construction tasks (2008)
15. Zenati, N., Zerhouni, N., Achour, K.: Assistance to maintenance in industrial process using an augmented reality system. Industrial Technology 2, 848–852 (2004)
Enhancing Marker-Based AR Technology Jonghoon Seo, Jinwook Shim, Ji Hye Choi, James Park, and Tack-don Han 134, Sinchon-dong, Seodaemun-gu, Seoul, Korea {jonghoon.seo,jin99foryou,asellachoi, james.park,hantack}@msl.yonsei.ac.kr
Abstract. In this paper, we propose a method that addresses both the jittering and the occlusion problems, which are the biggest issues in marker-based augmented reality technology. Because we estimate the pose using multiple keypoints that exist on the cell-based marker, the pose estimate is robust against jittering. Additionally, we solve the occlusion problem by applying tracking technology. Keywords: Marker-based AR, Augmented Reality, Tracking.
1 Introduction Augmented reality (AR) technology combines digital information with the real environment and supplies information that is missing in the real world [1]. Unlike virtual reality (VR) technology, which substitutes the real environment with graphics produced by a computer, AR technology annotates the real environment [2]. In order to implement augmented reality, many component technologies are needed, such as detection, registration and tracking. For practical implementations, registration and tracking in particular need to be researched. Marker-based tracking, which places visual fiducial markers in the real world, offers more robust registration and tracking quality. Although this technology uses visually obtrusive markers, it can offer more precise and faster tracking, so it is used in various commercial AR applications. Conventional marker-based tracking, however, uses a minimum number of features to estimate the camera pose, so it is fragile to noise. Also, it performs detection in every frame without any tracking, so it cannot continue augmenting when detection fails. In this paper, to overcome these marker-inherent problems, we propose a method that hybridizes marker-based tracking with marker-less tracking technology. In marker-less tracking, multiple features are used to estimate the camera pose, so it provides a more robust estimate than marker-based tracking. Marker-less tracking also adopts a tracking stage, so it can continue augmenting even when detection fails. In this paper, we adopt these marker-less techniques within marker-based tracking: we use multiple feature points to provide more robustness against noise, and we implement a feature-tracking method to handle detection failure. R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 97–104, 2011. © Springer-Verlag Berlin Heidelberg 2011
In Section 2, existing marker-based AR technology is described. We explain the proposed method in Section 3: the method that reduces the jittering problem in Section 3.1 and the method that addresses the occlusion problem in Section 3.2. Sections 4 and 5 present the results obtained through experiments with our method and the conclusion, respectively.
2 Related Works Marker-based augmented reality has been researched actively because of its advantages in accuracy and speed. Markers were developed in various shapes, such as circles or LEDs, but finally settled on the square type, which has been studied by many researchers since Matrix [3] and ARToolKit [4] were developed. These systems recognize the contour of the square and identify it as a marker once the inner pattern is acceptable; a 3D object is then augmented using the 3D pose estimated from the perspective distortion of the four vertices. Among these markers, ARToolKit [4], invented at HITLab, is the most famous in the HCI field. ARTag [5] was developed to improve on ARToolKit's recognition performance. In addition, T.U. Graz developed the Unobtrusive Marker [6] to address the unrealistic appearance of markers, and GIST developed the Simple Frame Marker [7] to increase marker readability. However, these works aim to improve the performance of the marker itself.
Fig. 1. Various AR Marker Systems
However, the problems of marker-based augmented reality still exist even with these systems. In marker-based augmented reality, the 3D
pose is predicted using only a limited number of specific points, so the pose changes with video noise and jittering problems arise because of that. To solve this problem, some approaches have applied signal processing theory [8]. Nevertheless, these methods suffer from a large amount of computation and unnatural movement, since they use history information to reduce the effect of noise. Additionally, without any tracking process there is the instability problem [9], in which the object disappears even though the marker is present, because the code area is computed only for the region where the marker is detected, so a detection failure removes the augmentation. This is considered a serious problem for interaction, for example when the marker is covered by a hand. In marker-less tracking these problems do not exist: since marker-less tracking technology uses a large number of specific points for pose prediction and matching, the noise effect decreases through averaging [10]. In addition, a tracking process is applied to improve the slow speed of the matching stage, and tracking can be maintained robustly in detection-failure situations such as occlusion thanks to the many feature points and the tracking technology.
3 Proposed Method We propose a robust pose estimation method which uses multiple keypoints and an occlusion-reduction method which uses feature tracking. 3.1 Jittering Reduction Method An augmented reality system analyzes the video, calculates the pose and location of the camera in the world coordinate system, and moves the object to be augmented by moving the graphics camera accordingly. To do so, it extracts feature points from the video and estimates the camera's location and pose based on these feature points. As mentioned above, previous marker tracking technology predicts the pose and location of the camera using only the four outermost vertices of the marker. Because of video noise, the prediction result can change, and the jittering of the augmented object arises from this. The proposed method reduces jittering through an averaging effect by using more feature points and hybridizing with marker-less tracking technology. In particular, there are many good candidate feature points, since each cell of a digitally coded marker (e.g., ARTag, ARToolKit Plus) has a clear boundary. The proposed method predicts the pose using the cell boundaries as feature points.
Fig. 2. Feature points used to estimate the camera pose. Black discs represent conventional keypoints, and white dots are the additional keypoints we adopted
However, to predict the pose with additional points, the marker coordinates of the additional feature points have to be calculated. For this, the following steps are processed.
Fig. 3. Multiple-keypoint-based robust pose estimation process
First, corner detection is performed on the area recognized as the marker. In this paper we used the method of [11]; with this method we can find the points with strong cornerness in the image. (1) Additionally, the image coordinates of the ideal corners are calculated in the area recognized as the marker. Ideal corners are points located on the cell boundaries of the marker; depending on the marker ID they can appear on a corner, on an edge, or in the inner area. The marker coordinates of the ideal corners are already known. By transforming the ideal corners according to the detected marker area, we obtain the ideal corners transformed by the pose. (2) By matching the detected corners (Pc) with the ideal corners (Pi) calculated in this way, we compute the filtered keypoints (Pf). These filtered keypoints have strong cornerness in the image, and the marker coordinates to which they can be mapped are known. (3)
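The following is a hedged, illustrative sketch of the multiple-keypoint procedure just described (corner detection, matching to the ideal cell-boundary corners, re-estimation of the pose). It is not the authors' implementation: OpenCV is used in place of their toolkit, and the camera matrix, marker coordinate convention and pixel-matching threshold are assumptions.

import cv2
import numpy as np

def refined_pose(gray, marker_quad_2d, ideal_corners_3d, marker_size, K, dist):
    """marker_quad_2d: 4x2 detected outer vertices (image pixels);
    ideal_corners_3d: Nx3 known cell-boundary points in marker coordinates (z = 0),
    expressed in the same frame as the outer square [0, marker_size]^2."""
    # Initial pose from the four outer vertices only (the conventional approach).
    outer_3d = np.array([[0, 0, 0], [marker_size, 0, 0],
                         [marker_size, marker_size, 0], [0, marker_size, 0]], np.float32)
    _, rvec, tvec = cv2.solvePnP(outer_3d, marker_quad_2d.astype(np.float32), K, dist)

    # (1) Corners with strong "cornerness" in the image (Harris-style detector).
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True)
    if corners is None:
        return rvec, tvec
    corners = corners.reshape(-1, 2)

    # (2) Project the ideal cell-boundary corners using the initial pose.
    proj, _ = cv2.projectPoints(ideal_corners_3d.astype(np.float32), rvec, tvec, K, dist)
    proj = proj.reshape(-1, 2)

    # (3) Match each projected ideal corner to the nearest detected corner (filtered keypoints).
    obj_pts, img_pts = [], []
    for ideal_3d, p in zip(ideal_corners_3d, proj):
        d = np.linalg.norm(corners - p, axis=1)
        if d.min() < 3.0:                        # assumed pixel threshold
            obj_pts.append(ideal_3d)
            img_pts.append(corners[d.argmin()])

    # Re-estimate the pose from the many 2D-3D correspondences.
    if len(obj_pts) >= 4:
        _, rvec, tvec = cv2.solvePnP(np.float32(obj_pts), np.float32(img_pts), K, dist)
    return rvec, tvec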
Since both the image coordinates and the marker coordinates of these calculated feature points are known, they can be used for pose prediction. Using the multiple points, the method can predict the pose and extract the feature points used for pose prediction. 3.2 Robust Marker Tracking Method Previous marker-based augmented reality augments an object by finding the marker with standard detection methods. These methods have the disadvantage that the object disappears on detection failure, which can occur for several reasons, e.g., occlusion, shadow, or lighting change. We solved this problem by adopting feature tracking technology. The augmented reality toolkit operates on the input image, detects the marker, and keeps the detected feature points as keypoints. When a detection failure occurs during tracking, the KLT feature tracking algorithm is run on the keypoints kept from before, and the pose is estimated from those tracked keypoints. Because this process is performed after a marker has been detected in a previous frame, the system already knows the marker ID and needs only the marker's feature points, so the augmentation can continue uninterrupted. Figure 4 shows the previous method (left) and the case where we keep tracking when a failure occurs (right).
Fig. 4. Feature tracking process to overcome detection failure. The left comparison corresponds to the conventional method, and the right shows the tracking method when a failure occurs.
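The fallback described in Section 3.2 can be pictured with the following hedged sketch: keypoints stored at the last successful detection are tracked with pyramidal KLT optical flow and the pose is re-estimated from the surviving points. This is not the authors' code; OpenCV is used for illustration, and prev_gray, gray, kept_img_pts, kept_obj_pts, K and dist are assumed to come from the surrounding pipeline.

import cv2
import numpy as np

def pose_from_tracked_keypoints(prev_gray, gray, kept_img_pts, kept_obj_pts, K, dist):
    """Used only in frames where marker detection fails; kept_img_pts (Nx2) and
    kept_obj_pts (Nx3) were stored at the last successful detection."""
    p0 = kept_img_pts.reshape(-1, 1, 2).astype(np.float32)
    # Pyramidal Lucas-Kanade (KLT) tracking of the stored keypoints.
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                             winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    if ok.sum() < 4:
        return None                              # too few points survived this frame
    img_pts = p1.reshape(-1, 2)[ok]
    obj_pts = kept_obj_pts[ok]                   # marker coordinates of the same points
    _, rvec, tvec = cv2.solvePnP(obj_pts.astype(np.float32),
                                 img_pts.astype(np.float32), K, dist)
    return rvec, tvec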
4 Experiments To evaluate jitter reduction, we measured the pose error over 100 frames while the marker and the camera were kept still. ARToolKit adopts no jitter reduction method, while ARToolKit Plus uses a more robust pose estimation algorithm.
Table 1. Average error of pose in 100 still frames
Method          ARToolKit   ARToolKit Plus   Proposed Method
Average error   3.24        2.76             0.98
ARToolKit shows the largest error over the 100 frames, and ARToolKit Plus produces less error; both, however, are still larger than the proposed method, because they are affected by image noise. The proposed method also suffers from noise, but the amount is smaller than the others. In addition, to demonstrate that the occlusion problem is overcome, we performed the experiment from [9]. In [9], the cases in which augmentation fails due to occlusion in marker-based augmented reality are called corner cases and are defined as in Figure 5. ARToolKit fails even when only one edge is occluded, and ARTag fails when two edges are occluded.
Fig. 5. Occlusion corner cases [9]. When edges are occluded, detection fails.
We applied the proposed method to those corner cases.
Fig. 6. Result of the proposed method. It is robust to occlusion.
When we applied the proposed method, the marker was tracked robustly even though both edges were covered. However, since the method tracks feature points, the augmented object becomes distorted when the tracked feature points move. This has to be solved in future work.
Fig. 7. Future work. When the tracked points move, the augmented object is distorted.
5 Conclusion In this paper we have proposed two methods to overcome problems inherent to marker-based AR, namely jittering and occlusion instability. When marker-based AR is developed into commercial applications, these problems are serious. We have addressed them by hybridizing with marker-less tracking technology: we implemented a multiple-point pose estimation method to reduce jitter, and adopted a feature tracking method to overcome detection failure. Under ordinary conditions they work well, but under some special conditions they still need to be improved. Acknowledgement. This work (2010-0027654) was supported by the Mid-career Researcher Program through an NRF grant funded by the MEST.
References
1. Azuma, R.T.: A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6(4), 355–385 (1997)
2. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented Reality: A class of displays on the reality-virtuality continuum. In: Proceedings of SPIE Telemanipulator and Telepresence Technologies, vol. 2351 (1995)
3. Rekimoto, J.: Matrix: A Realtime Object Identification and Registration Method for Augmented Reality. In: Proceedings of APCHI 1998, pp. 63–68 (1998)
4. Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System. In: Proceedings of IWAR 1999, pp. 85–94 (1999)
5. Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: Proceedings of Computer Vision and Pattern Recognition, vol. 2, pp. 590–596 (2005)
6. Wagner, D., Langlotz, T., Schmalstieg, D.: Robust and unobtrusive marker tracking on mobile phones. In: Proceedings of ISMAR 2008, pp. 121–124 (2008)
7. Kim, H., Woo, W.: Simple Frame Marker for Image and Character Recognition. In: Proceedings of ISUVR 2008, pp. 43–46 (2008)
8. Rubio, M., Quintana, A., Pérez-Rosés, H., Quirós, R., Camahort, E.: Jittering Reduction in Marker-Based Augmented Reality Systems. In: Gavrilova, M.L., Gervasi, O., Kumar, V., Tan, C.J.K., Taniar, D., Laganá, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS, vol. 3980, pp. 510–517. Springer, Heidelberg (2006)
9. Lee, S.-W., Kim, D.-C., Kim, D.-Y., Han, T.-D.: Tag Detection Algorithm for Improving the Instability Problem of an Augmented Reality. In: Proceedings of ISMAR 2006, pp. 257–258 (2006)
10. Wagner, D., Schmalstieg, D., Bischof, H.: Multiple Target Detection and Tracking with Guaranteed Framerates on Mobile Phones. In: Proceedings of ISMAR 2009, pp. 57–64 (2009)
11. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Fourth Alvey Vision Conference, Manchester, UK, pp. 147–151 (1988)
MSL_AR Toolkit: AR Authoring Tool with Interactive Features Jinwook Shim, Jonghoon Seo, and Tack-don Han Dept. of Computer Science, Yonsei University, 134, Sinchon-dong, Seodaemun-gu, Seoul, Korea {jin99foryou,jonghoon.seo,hantack}@msl.yonsei.ac.kr
Abstract. We describe an authoring tool for Augmented Reality (AR) content. In recent years a number of frameworks have been proposed for developing AR applications. This paper describes an AR authoring tool with interactive features. We developed the tool to provide interactive features with which an educational service project can be carried out and in which learners can participate actively. In this paper, we describe the MSL_AR authoring tool process and two kinds of interactive features. Keywords: Augmented Reality, Authoring, Interaction.
1 Introduction Continuous development in IT technology has been the main driving force of advancement and change throughout society, and the field of education, which requires high-quality educational content services, is no exception. The educational environment of the future is expected to move learners from the existing unidirectional service, where users only watch and listen to the given content passively, to a participatory service where they actively join in and show their individual creativity. Without a realistic interface for wide participation, however, today's educational environment finds it hard to break through the limitations of the existing educational pattern. This paper focuses on a new kind of educational environment for user participation, built on Augmented Reality technologies that grant both educators and learners easy participation. We suggest a new type of participatory educational environment using Augmented Reality (AR) so that both educators and learners can take part easily. Using AR we can provide guidelines for complicated sequences or for dangerous experimental procedures by supplying virtual information about situations that are hard to observe and recognize. In addition, we provide an educational effect by performing tangible experiments using interaction technology. We propose an Augmented Reality (AR) tool that provides two kinds of interactive features. R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 105–112, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 Related Work AR authoring tools have a short history and lack widespread application compared with tools such as 3ds Max and Maya, yet an AR authoring tool can have a major technical ripple effect [1].
Table 1. Types of authoring tools [1]
                  Low level               High level
Programmers       ARToolKit, ARTag [4]    Studierstube, osgART
Non-programmers   DART, ComposAR          AMIRE, MARS
ARToolKit [2] is an open-source library which provides computer-vision-based tracking of black square markers. However, developing an AR application with ARToolKit requires further code for 3D model loading, interaction techniques, and other utility functions. This authoring tool has a very simple structure and decoding algorithm, but a marker file must be loaded and correlated for every marker to be detected. ARToolKit requires the developer to have C/C++ skills and needs to be linked with graphics and utility libraries [2][3]. osgART is mainly based on the OpenSceneGraph framework, delivering a large pre-existing choice of multimedia content as well as the ability to import from professional design tools (e.g., Maya, 3D Studio Max). The tracking is mainly based on computer vision, using ARToolKit (also extended to support inertial tracking) [5][6][7]. One of the first AR authoring tools to support interactivity is DART [9], the Designer's ARToolKit, which is a plug-in for the popular Macromedia Director software. The main aim of DART is to support application designers. DART is built to allow non-programmers to create AR experiences using the low-level AR services provided by the Director Xtras, and to integrate with existing Director behaviours and concepts. DART supports both visual programming and a scripting interface [8][9]. ComposAR is a PC application that allows users to easily create AR scenes. It is based on osgART [5], and it is also a test bed for the use of scripting environments for OpenSceneGraph, ARToolKit and wxWidgets [10][11]. AMIRE is an authoring tool for the efficient creation and modification of augmented reality applications. The AMIRE framework provides an interface to load and replace a library at runtime and uses visual programming techniques to interactively develop AR applications. AMIRE is designed to allow content experts to easily build applications without detailed knowledge of the underlying base technologies [12][13].
3 MSL_AR Toolkit In this section we describe the prototype of the MSL_AR toolkit process. Our goal is to develop a low-level tool that allows programmers to build AR content with interactive features. Through a GUI, users can modify the configure file needed for interaction with the AR content.
Fig. 1. MSL_AR toolkit authoring tool process
Figure 1 is a diagram of the overall MSL_AR toolkit authoring process. The first step is to input, through the user GUI, the information needed to create a simple piece of AR content. The main frame of the MSL_AR toolkit then creates the AR content from the entered information. The AR content produced by the MSL_AR toolkit works with the interactive feature specified by the user. 3.1 MSL_AR Toolkit Process The MSL_AR toolkit authoring tool requires coding and scripting skills and programming knowledge. The authoring tool basically provides the main functions of the engine and a DLL library. The configure file determines the marker IDs, the interaction methods, and so on. The developed authoring tool provides interaction using marker occlusion and marker merge methods. Figure 2 shows a flowchart of the MSL_AR toolkit authoring tool. The user sets up the configure file through the interface and then runs the MSL_AR toolkit, which creates the AR content by applying the configure file in the preprocessor stage. When the process is active, it finds the marker in the video input from the camera and augments the object above the marker, interacting via the interactive feature according to the information the user entered.
Fig. 2. MSL_AR toolkit flowchart
Figure 3 shows the main structure of the MSL_AR toolkit authoring tool. The "1class" part of Figure 3 is the basic class used to create AR content. The "4 tracking" part is the detection and pose estimation module of the AR process. The most important parts of the MSL_AR toolkit's main structure are "2initialize" and "5return": "2initialize" applies, within the process, the configure file that the user entered through the interface, and "5return" presents the AR content with its interactive feature.
Fig. 3. MSL_AR toolkit authoring tool Main Structure
3.2 Configure Setting Figure 4 shows the GUI with which users can register the object connected to a marker and the method by which they can interact with the marker in the MSL_AR toolkit.
Fig. 4. MSL_AR toolkit configure set up GUI
The first item in Figure 4 designates the marker ID. The second item designates the interactive feature. The third item designates the 3D object connected to the marker. The fourth item adds or removes IDs of markers to be used in the content. Through this interface, the user can set up the AR content interaction comfortably. Figure 5 shows the content of the MSL_AR toolkit configure file that users can fill in through the interface.
#the number of marker
2
#the number of interaction patterns to be recognized
2
#marker 1
1
40.0
0.0 0.0
1.0000 0.0000 0.0000 0.0000
0.0000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000
#marker 2
2
40.0
0.0 0.0
1.0000 0.0000 0.0000 50.0000
0.0000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000
#select the method to be used in interactions
1
occlusion
#if interaction result object
Data/patt.H20
80.0
0.0 0.0
Fig. 5. Sample of MSL_AR toolkit configure file
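For illustration only, the following hypothetical Python reader parses a configure file laid out as in Fig. 5 (the meaning of each field is described below); the MSL_AR toolkit itself is C/C++-based, and both the exact line layout and some field interpretations are assumptions.

def load_config(path):
    # Keep non-empty, non-comment lines in order.
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip() and not ln.startswith('#')]
    it = iter(lines)
    cfg = {'n_markers': int(next(it)), 'n_interaction': int(next(it)), 'markers': []}
    for _ in range(cfg['n_markers']):
        marker = {
            'id': int(next(it)),
            'size': float(next(it)),
            'center': [float(v) for v in next(it).split()],
            # 3x4 transform giving the marker's relative coordinates.
            'transform': [[float(v) for v in next(it).split()] for _ in range(3)],
        }
        cfg['markers'].append(marker)
    cfg['interaction_target'] = int(next(it))   # assumed: index of the reacting marker
    cfg['interaction_method'] = next(it)        # e.g. "occlusion" or "merge"
    cfg['result_object'] = next(it)             # e.g. "Data/patt.H20"
    cfg['result_size'] = float(next(it))
    cfg['result_center'] = [float(v) for v in next(it).split()]
    return cfg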
First of all, "the number of marker" means the number of markers that the user will use. The second field, "the number of interaction patterns to be recognized", means the number of markers that will be used in the interaction. The third, "marker 1" and "marker 2", gives the marker ID, the size of the marker and the relative coordinates of the marker. The fourth, "select the method to be used in interactions", specifies the interactive feature to be used. The fifth, "if interaction result object", gives the information about the object that appears when an interaction happens between markers. 3.3 Interactive Features The MSL_AR toolkit authoring tool provides two kinds of interactive features. One is an occlusion method, triggered when a marker is covered; the other is a merge method, triggered when two markers are located close together. Occlusion. Figure 6 shows the occlusion interactive feature of the MSL_AR toolkit authoring tool when a marker is covered by the user's finger. The left part of Figure 6 shows the two markers registered by the user. The right part shows the occlusion interactive feature of the MSL_AR toolkit: when one marker is covered by the user's finger, the other marker augments the square object as a result of the interaction.
Fig. 6. Occlusion Interactive feature
The user sets the 2nd item of the GUI in Figure 4 to "occlusion" and selects, in the 3rd item, the marker that will augment the object; the 4th GUI item is the object to be augmented above the marker. After the GUI input is finished, these data are entered into the "#select the method to be used in interactions" and "#if interaction result object" fields of the configure file in Figure 5. Merge. Figure 7 shows the merge interactive feature of the MSL_AR toolkit authoring tool when two markers are located close together. The right part of the figure is the result of the merge interactive feature of the MSL_AR toolkit: when the two markers merge, the "Oxygen" marker augments the H2O molecule object as a result of the interaction.
Fig. 7. Merge Interactive feature
The user sets the 2nd item of the GUI in Figure 4 to "merge", and the rest of the items are selected as above.
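As a purely hypothetical sketch of how the two interactive features could be detected at run time, the following logic infers an occlusion event when a previously visible marker disappears while another marker remains visible, and a merge event when two detected markers come closer than a threshold. This is not the toolkit's actual logic; the threshold value and the inference rules are assumptions for illustration.

import numpy as np

MERGE_DISTANCE = 60.0   # assumed threshold, in the same units as the marker transforms

def detect_interaction(detected, previous, positions):
    """detected/previous: sets of marker IDs seen in the current/last frame;
    positions: dict mapping detected IDs to 3D marker positions."""
    # Occlusion: a registered marker was visible before but is not detected now,
    # while at least one other marker is still visible (so the camera did not simply move away).
    for lost in previous - detected:
        if detected:
            return ('occlusion', lost)
    # Merge: two detected markers are closer together than the threshold.
    ids = list(detected)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            d = np.linalg.norm(np.array(positions[ids[i]]) - np.array(positions[ids[j]]))
            if d < MERGE_DISTANCE:
                return ('merge', (ids[i], ids[j]))
    return (None, None)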
4 Conclusion The limitation of existing Augmented Reality authoring tools is that they provide fixed, restricted content, so no additional information can be added. Augmented Reality, by contrast, offers additional virtual information about things that are difficult to observe, helps recognize the situation, and provides guidelines about experimental procedures and their order. Also, through interaction technology, the MSL_AR toolkit increases educational effectiveness and allows effective, tangible experiments to be performed using markers. In future work, an improved GUI should let non-programmers produce content easily, and a variety of location-based marker interaction methods need to be studied. To make AR content production more active and efficient, we also have to study additional interactive features, such as a keyboard listener, and enable control over the phases of the content. Acknowledgement. This work (2010-0027654) was supported by the Mid-career Researcher Program through an NRF grant funded by the MEST.
References
1. Wang, Y., Langlotz, T., Billinghurst, M., Bell, T.: An Authoring Tool for Mobile Phone AR Environments. In: NZCSRSC 2009 (2009)
2. Azuma, R.: A survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6(4), 355–385 (1997)
3. http://www.hitl.washington.edu/artoolkit/
4. Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2, pp. 590–596 (2005)
5. http://www.cc.gatech.edu/ael/resources/osgar.html
6. http://www.artoolworks.com/community/osgart/
7. Grasset, R., Looser, J., Billinghurst, M.: OSGARToolKit: tangible + transitional 3D collaborative mixed reality framework. In: Proceedings of the 2005 International Conference on Augmented Tele-Existence, ICAT 2005, Christchurch, New Zealand, December 05-08, vol. 157, pp. 257–258. ACM, New York (2005)
8. http://www.gvu.gatech.edu/dart/
9. MacIntyre, B., Gandy, M., Dow, S., Bolter, J.D.: DART: a toolkit for rapid design exploration of augmented reality experiences. In: Marks, J. (ed.) ACM SIGGRAPH 2005 Papers, Los Angeles, California, July 31-August 04, pp. 932–932. ACM, New York (2005)
10. http://www.hitlabnz.org/wiki/ComposAR
11. Dongpyo, H., Looser, J., Seichter, H., Billinghurst, M., Woontack, W.: A Sensor-Based Interaction for Ubiquitous Virtual Reality Systems. In: International Symposium on Ubiquitous Virtual Reality, ISUVR 2008, pp. 75–78 (2008)
12. http://www.amire.net/
13. Grimm, P., Haller, M., Paelke, V., Reinhold, S., Reimann, C., Zauner, R.: AMIRE - authoring mixed reality. In: The First IEEE International Workshop on the Augmented Reality Toolkit, p. 2 (2002)
Camera-Based In-situ 3D Modeling Techniques for AR Diorama in Ubiquitous Virtual Reality Atsushi Umakatsu1 , Hiroyuki Yasuhara1, Tomohiro Mashita1,2 , Kiyoshi Kiyokawa1,2, and Haruo Takemura1,2 1
Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan 2 Cybermedia Center, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 560-0043, Japan {mashita,kiyo,takemura}@ime.cmc.osaka-u.ac.jp
Abstract. We have been studying an in-situ 3D modeling and authoring system, AR Diorama. In the AR Diorama system, a user is able to reconstruct a 3D model of a real object of concern and describe behaviors of the model by stroke input. In this article, we will introduce two ongoing studies on interactive 3D reconstruction techniques. First technique is feature-based. Natural feature points are first extracted and tracked. A convex hull is then obtained from the feature points based on Delaunay tetrahedralisation. The polygon mesh is carved to approximate the target object based on a feature-point visibility test. Second technique is region-based. Foreground and background color distribution models are first estimated to extract an object region. Then a 3D model of the target object is reconstructed by silhouette carving. Experimental results show that the two techniques can reconstruct a better 3D model interactively compared with our previous system. Keywords: AR authoring, AR Diorama, 3D reconstruction.
1
Introduction
We have been studying an in-situ 3D modeling and authoring system, AR Diorama [1]. In the AR Diorama system, a user is able to reconstruct a 3D model of a real object of concern and describe behaviors of the model by stroke input. Being able to combine real, virtual and virtualized objects, AR Diorama has a variety of applications including city planning, disaster planning, interior design, and entertainment. We target smart phones and tablet computers with a touch screen and a camera as a platform of a future AR Diorama system. Most augmented reality (AR) systems to date can play only AR contents that have been prepared in advance of usage. Some AR systems provide in-situ authoring functionalities [2]. However, it is still difficult to handle real objects as part of AR contents on demand. For our purpose, online in-situ 3D reconstruction is necessary. There exist a variety of hardware devices for acquiring a 3D model of a real object in a short time such as real-time 2D rangefinders. However, a special
hardware device is not desired in our scenario. In addition, acquiring the geometry of an entire scene is not enough. In AR Diorama, we would like to reconstruct only the object of interest. Introducing minimal human intervention is a reasonable approach to this segmentation problem, since AR Diorama inherently involves human-computer interaction. On the other hand, single-camera interactive 3D reconstruction techniques have recently been studied intensively in the literature on augmented reality (AR), mixed reality (MR) and ubiquitous virtual reality (UVR) [3,4,5,6]. These techniques require only a single standard camera to extract the geometry of a target model.
2 AR Diorama
Figure 1 shows an overview of the AR Diorama system architecture [1]. In the following, its user interaction techniques and 3D reconstruction algorithm are briefly explained.
Fig. 1. AR Diorama system architecture [1]
2.1 User Interaction
AR Diorama supports a few simple stroke input-based interaction techniques to reconstruct and edit the virtual scene. First, a user needs to specify a stage, on which all virtual objects are placed, by circling the area of interest on screen. Then a stage is automatically created based on 3D positions of feature points in the area. Then the user is able to reconstruct a real object by again simply circling it. The polygon mesh of the reconstructed model is composed of feature points in the circle and the input image as texture. The reconstruction algorithm is described in more detail below. The reconstructed object is overlaid onto the original real object for later interaction. As the polygon mesh has only surfaces that are visible in the input image, the user will need to see the object from different angles and circle it again to acquire a more complete model. The reconstructed model can be saved to a file and loaded for reuse.
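As an illustration of this stroke-based selection, the following minimal Python sketch keeps the feature points whose image projections fall inside a circling stroke and fits a stage plane to them; the function names, the NumPy/Matplotlib usage and the synthetic data are our own and do not reflect the actual AR Diorama implementation:

# Sketch: select SLAM feature points inside a circling stroke and fit a stage plane.
# Hypothetical data layout; not the actual AR Diorama code.
import numpy as np
from matplotlib.path import Path

def points_in_stroke(stroke_2d, feats_2d, feats_3d):
    """Keep the 3D feature points whose image projections lie inside the closed stroke."""
    inside = Path(stroke_2d, closed=True).contains_points(feats_2d)
    return feats_3d[inside]

def fit_stage_plane(points_3d):
    """Least-squares plane through the selected points: returns (centroid, unit normal)."""
    centroid = points_3d.mean(axis=0)
    _, _, vt = np.linalg.svd(points_3d - centroid)
    return centroid, vt[-1]          # smallest right singular vector = plane normal

# Example with synthetic data
stroke = np.array([[100, 100], [400, 120], [420, 380], [90, 360]], float)
feats_2d = np.random.uniform(0, 500, size=(200, 2))
feats_3d = np.random.randn(200, 3)
stage_pts = points_in_stroke(stroke, feats_2d, feats_3d)
if len(stage_pts) >= 3:
    origin, normal = fit_stage_plane(stage_pts)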
Fig. 2. Stroke-based scene editing in AR Diorama [1]
Once the model creation is done, the user is able to translate, rotate and duplicate the model. Translation is performed by simply drawing a path from a model to the destination. Rotation is performed by drawing an arc whose center is on a model. Figure 2 shows an example of a translation operation.
2.2 Texture-Based Reconstruction
In the AR Diorama system, a texture-based reconstruction approach has been used [1]. Natural feature points in the object region are first extracted and tracked using the open-source PTAM (parallel tracking and mapping) library [7]. A polygon mesh is created from the 3D positions of the feature points by 2D Delaunay triangulation calculated from the corresponding camera position. When the next polygon mesh is created from a different viewpoint, the two are merged into a single polygon mesh. At this time, a texture-based surface visibility test is conducted and false surfaces are removed: if the similarity between the appearance of a surface in the new mesh, seen from its viewpoint, and the appearance of the corresponding surface in the current mesh, rendered from the same viewpoint using the transformation matrix between the two viewpoints, is lower than a threshold, that surface is considered false and removed. This approach is easy to implement; however, the reconstruction accuracy is not satisfactory, mainly due to the limitations of 2D Delaunay triangulation. Examples of reconstructed models can be found in the middle column of Figure 3. To improve the model accuracy, we have implemented two different in-situ 3D reconstruction techniques inspired by recent related work, which we report in the next sections.
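The following short Python sketch illustrates the two ingredients of this texture-based approach, a 2D Delaunay triangulation of projected feature points and an appearance-similarity test; the normalized cross-correlation measure and the threshold value are illustrative assumptions, and the rendering/warping of surface appearance is abstracted away:

# Sketch of the texture-based visibility test: a surface is kept only if its appearance
# in the new keyframe matches the appearance rendered from the current mesh.
import numpy as np
from scipy.spatial import Delaunay

def triangulate_in_image(points_2d):
    """2D Delaunay triangulation of projected feature points (vertex indices)."""
    return Delaunay(points_2d).simplices

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized grey-level patches."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def keep_surface(new_patch, rendered_patch, threshold=0.7):
    """Surfaces whose appearance similarity falls below the threshold are removed."""
    return ncc(new_patch, rendered_patch) >= threshold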
3 Feature-Based Reconstruction
The first 3D reconstruction technique we have newly implemented is a feature-based one inspired by ProFORMA, proposed by Pan et al. [4]. In the following, its implementation details and some reconstruction results are described in order.
3.1 Implementation
Natural feature points in the scene are first extracted and tracked using the open-source PTAM (parallel tracking and mapping) library [7]. PTAM's internal outlier count variable is used to exclude unreliable feature points.
A Delaunay tetrahedralisation of the feature points is obtained using CGAL library. At this stage, a surface mesh of the obtained polygon is a convex hull of the feature points, and false tetrahedrons that do not exist in the target object need to be removed. While tracking, each triangle surface is examined and its corresponding tetrahedron will be removed if any feature point that should be behind the surface is visible. This carving process is expressed by the following equations:

P_{exist}(T_i \mid v) = \prod_{v} \bigl(1 - Intersect(T_i, R_{j,k})\bigr)   (1)

Intersect(T_i, R_{j,k}) = \begin{cases} 1 & \text{if } R_{j,k} \text{ intersects } T_i \\ 0 & \text{otherwise} \end{cases}   (2)
T_i denotes the ith triangle in the model, j denotes a keyframe id used for reconstruction, k denotes a feature point id, R_{j,k} denotes the ray in the jth keyframe from the camera position to the kth feature point, and v denotes all combinations of (j, k) for which the kth feature point is visible in the jth keyframe. However, this test can wrongly remove tetrahedrons that do exist in the target object, due to noise in the feature point positions. To cope with this problem, we have implemented the probabilistic carving algorithm found in ProFORMA [4]. After carving, texture from the keyframes, which are stored automatically during tracking, is mapped onto the polygon surfaces. A keyframe is added when the camera pose is different from all camera poses associated with the existing keyframes. As the camera moves around the object, a textured polygon model that approximates the target object is acquired. In ProFORMA, feature points on the target object are easily identified because the camera is fixed. In our system, a user can move the camera freely, so segmentation of the target region from the background is not trivial. As a solution, the user roughly draws around the object of concern on screen at the beginning of model creation, to specify the region to reconstruct.
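A minimal Python sketch of the deterministic core of this carving test (Eqs. (1)-(2)) is given below: a ray is cast from each keyframe camera centre to each visible feature point, and a triangle is discarded if any such ray passes through it. The Möller-Trumbore intersection routine and the data layout are our own choices, and the probabilistic weighting used in ProFORMA [4] is omitted:

# Sketch of Eqs. (1)-(2): triangle T_i is kept only if no ray R_{j,k}
# (keyframe camera centre -> visible feature point) passes strictly through it.
import numpy as np

def ray_hits_triangle(origin, target, tri, eps=1e-9):
    """True if the segment origin->target intersects the triangle strictly between its endpoints."""
    direction = target - origin
    v0, v1, v2 = tri
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1.dot(p)
    if abs(det) < eps:
        return False                      # segment parallel to the triangle plane
    inv = 1.0 / det
    s = origin - v0
    u = s.dot(p) * inv
    if u < 0 or u > 1:
        return False
    q = np.cross(s, e1)
    v = direction.dot(q) * inv
    if v < 0 or u + v > 1:
        return False
    t = e2.dot(q) * inv
    return eps < t < 1.0 - eps            # triangle lies between camera and feature point

def carve(triangles, rays):
    """triangles: list of (3,3) vertex arrays; rays: list of (camera_centre, feature_point) pairs.
    Returns indices of triangles that survive the visibility test of Eq. (1)."""
    return [i for i, tri in enumerate(triangles)
            if not any(ray_hits_triangle(c, f, tri) for c, f in rays)]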
3.2 Results
Two convex objects (a Rubik's Cube and an aluminum can) and a concave object (an L-shaped snack box) were reconstructed by the implemented feature-based reconstruction technique and compared against those reconstructed by the previous technique [1]. A desktop PC (AMD Athlon 64 X2 Dual Core 3800+, 4GB RAM) and a handheld camera (Point Grey Research, Flea3, 648×488@60fps) were used in the system. Figure 3 shows the results for the Rubik's Cube and the aluminum can. Virtual models reconstructed by the previous technique have many cracks and texture discontinuities compared with the new technique. Figure 4 shows the results for the L-shaped snack box. The reconstructed object's shape approximates that of the target object better after carving; however, some tetrahedrons wrongly remain, probably due to insufficient parameter tuning for the probabilistic carving. Another conceivable reason is tracking accuracy. In our
(a) Rubic’s Cube. (left) original, (middle) old technique, (right) new technique
(b) Aluminum can. (left) original, (middle) old technique, (right) new technique Fig. 3. Results for convex objects
(a) Snack box. (left) original, (middle, right) before carving
(b) Snack box. after carving Fig. 4. Results for a concave object
system, the position accuracy of the feature points relies on the PTAM library, whereas a dedicated, robust drift-free tracking method is used in ProFORMA.
4 Region-Based Reconstruction
A feature-based approach relies on texture on the object surface and is thus not appropriate for texture-less and/or curved objects. The second technique is a
silhouette-based approach inspired by the interactive modeling method proposed by Bastian et al. [5]. In the following, its implementation details and some reconstruction results are described in order.
4.1 Implementation
Natural feature points in the scene are first extracted and tracked, again using the PTAM library. Then the user draws a stroke on screen to specify the target object to reconstruct. The stroke is used to build a set of foreground and background color distributions in the form of a Gaussian Mixture Model, and the image is segmented into the two types of pixels using graph-cuts [8] (initial segmentation). After initial segmentation, the target object region is automatically extracted and tracked (dynamic segmentation), again using graph-cuts. In dynamic segmentation, a distance field computed from the binarized foreground image of the previous frame is used for robust estimation. In addition, stroke-input-based interaction techniques, the Inclusion brush and the Exclusion brush, are provided to manually correct the silhouette. After a silhouette of the target object is extracted, a 3D model approximating the target object is progressively reconstructed by silhouette carving. In silhouette carving, a voxel space is initially set around the object, and the 3D volume approximating the target object is iteratively carved by testing the projection of each voxel against the silhouette. This process is expressed by the following equation, where v_t^i denotes the ith voxel in frame t (v_0^i = 1.0), P^t denotes a projection matrix, and W(\cdot) denotes the transformation from world coordinates to camera coordinates:

v_t^i = v_{t-1}^i \, f\bigl(I_t^{\alpha}(P^t W(v^i))\bigr)   (3)
Normally a voxel remains empty once it has been removed. To cope with PTAM's unstable camera pose estimation, a voting scheme is introduced: more votes than a threshold are required to finally remove a voxel. From the remaining voxel set, a polygon mesh is created using a Marching Cubes algorithm. Each surface of the polygon mesh is then textured from the keyframe with the smallest angle between its camera pose and the surface normal.
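The following Python sketch outlines silhouette carving with this voting scheme: voxel centres are projected into the current frame, voxels falling on the background collect votes, and a voxel is carved only after its vote count exceeds a threshold. The matrix conventions, grid layout and threshold are illustrative assumptions, not the system's actual interfaces:

# Sketch of Eq. (3) with voting: a voxel is carved only after it has been seen
# outside the object silhouette in more than vote_threshold frames.
import numpy as np

class SilhouetteCarver:
    def __init__(self, voxel_centres, vote_threshold=3):
        self.centres = voxel_centres                       # (N, 3) world coordinates
        self.alive = np.ones(len(voxel_centres), bool)
        self.votes = np.zeros(len(voxel_centres), int)
        self.vote_threshold = vote_threshold

    def carve(self, silhouette, P, W):
        """silhouette: binary foreground mask (H, W); P: 3x4 projection; W: 4x4 world->camera."""
        h, w = silhouette.shape
        hom = np.c_[self.centres, np.ones(len(self.centres))]   # (N, 4) homogeneous points
        img = P @ (W @ hom.T)                                    # (3, N) image coordinates
        u = (img[0] / img[2]).round().astype(int)
        v = (img[1] / img[2]).round().astype(int)
        in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (img[2] > 0)
        outside = in_image.copy()
        outside[in_image] = silhouette[v[in_image], u[in_image]] == 0
        self.votes[self.alive & outside] += 1                    # vote to remove this voxel
        self.alive &= self.votes <= self.vote_threshold
        return self.alive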
4.2 Results
Using the same hardware devices as the feature-based reconstruction, it takes about 2.5 seconds from image capturing to rendering the updated textured object. However, the rendering and interaction performance is kept around 10 frames per second, thanks to a CPU-based, yet multi-threaded implementation. In the following, results of main steps of reconstruction as well as a few final reconstructed models are shown. Figure 5 shows a segmentation result in a frame, a binarized image of the target object region, and the corresponding distance field. Foreground probability decreases rapidly near the silhouette. Figure 6 and Figure 7 show an example
(a) Segmentation result in the previous frame
(b) Binarized image
(c) Distance field
Fig. 5. Probability distribution in dynamic segmentation
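For illustration, the two ingredients shown in Fig. 5 can be sketched in a few lines of Python: Gaussian Mixture colour models fitted to pixels collected under the user's stroke (whose negative log-likelihoods would serve as graph-cut data terms) and the signed distance field of the previous frame's binarized mask. The component count and library choices are assumptions, and the graph-cut optimisation itself [8] is not shown:

# Sketch of the colour models and the distance field used in dynamic segmentation.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.ndimage import distance_transform_edt

def fit_colour_models(fg_pixels, bg_pixels, n_components=5):
    """fg/bg_pixels: (N, 3) RGB samples collected under the user's stroke."""
    fg = GaussianMixture(n_components).fit(fg_pixels)
    bg = GaussianMixture(n_components).fit(bg_pixels)
    return fg, bg

def unary_costs(image, fg, bg):
    """Per-pixel negative log-likelihoods, usable as graph-cut data terms."""
    pix = image.reshape(-1, 3).astype(float)
    cost_fg = -fg.score_samples(pix).reshape(image.shape[:2])
    cost_bg = -bg.score_samples(pix).reshape(image.shape[:2])
    return cost_fg, cost_bg

def foreground_distance_field(prev_mask):
    """Signed distance (pixels) to the previous frame's silhouette, as in Fig. 5(c).
    prev_mask is a 0/1 foreground mask; positive values lie inside the object."""
    return distance_transform_edt(prev_mask) - distance_transform_edt(1 - prev_mask)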
(a) before
(b) Inclusion brush in use
(c) after
Fig. 6. Inclusion brush
(a) before
(b) Exclusion brush in use
(c) after
Fig. 7. Exclusion brush
usage of the Inclusion and Exclusion brushes, respectively. The user is able to interactively add an area to, or remove an area from, the foreground region. Figure 8 shows a series of voxel data generated by silhouette carving in temporal order (left to right). As the number of keyframes increases, the volume shape is refined to approximate the target object. Voxel color indicates the texture id. Figure 9 shows a reconstructed plushie (c) and some of the keyframes used (a, b). Textures are mapped onto the model correctly, though some discontinuities appear. This is mainly due to brightness differences between textures mapped onto adjacent surfaces. Figure 10 shows a reconstructed paper palace (c) and some of the keyframes used (a, b). In this case, a concave part is not reconstructed well, as indicated by the green circle in Figure 10(c). This is a typical limitation of a simple silhouette carving
(a) Keyframes
(b) Voxel data (color indicates texture id) Fig. 8. Silhouette carving
(a) Keyframe 1
(b) Keyframe 2
(c) Result
Fig. 9. Reconstruction of a plushie
(a) Keyframe 1
(b) Keyframe 2
(c) Result
Fig. 10. Reconstruction of a paper palace
algorithm. To tackle this, we will need to introduce photometric constraints or combine with a feature-based approach.
(a) Keyframe 1
(b) Result
Fig. 11. Reconstruction of an apple
Figure 11 shows a reconstructed apple (b) and a keyframe used (a). In this case no feature points were found in the foreground region, so neither our previous technique nor the new feature-based technique worked. A region-based technique is suitable for reconstructing such a feature-less object as long as its color distribution differs from that of the background.
5 Conclusion
In this study, we have introduced the implementations and results of two 3D reconstruction techniques for our AR Diorama system, inspired by recent advances in this field [4,5]. The implemented feature-based technique has been shown to produce a better reconstructed model than our previous technique. Advantages of feature-based approaches over region-based approaches include that they can reconstruct concave objects, objects whose color distribution is similar to that of the background, and potentially non-rigid objects, and that users need not shoot the object from many directions. However, with our current implementation, some non-existent surfaces sometimes remain, probably due to insufficient parameter tuning and inaccurate feature tracking. The region-based technique has also been shown to produce a better reconstructed model than our previous technique. Advantages of region-based approaches include that they can reconstruct feature-less objects such as plastic toys and fruits. As long as the color distribution of the target object differs from that of the background, region-based approaches will succeed in reconstruction. However, they cannot handle concave objects well by their nature. In the future, we will continue to improve the reconstruction quality by combining feature-based and region-based approaches, extend the stroke-based interaction techniques [9,10,11,12], and develop an easy-to-use, multi-purpose AR Diorama system.
References 1. Tateishi, T., Mashita, T., Kiyokawa, K., Takemura, H.: A 3D Reconstruction System using a Single Camera and Pen-Input for AR Content Authoring. In: Proc. of Human Interface Symposium 2009, vol. 0173 (2009) (in Japanese)
2. Lee, G.A., Nelles, C., Billinghurst, M., Kim, G.J.: Immersive Authoring of Tangible Augmented Reality Applications. In: Proc. of the 3rd IEEE International Symposium on Mixed and Augmented Reality, pp. 172–181 (2004) 3. Fudono, K., Sato, T., Yokoya, N.: Interactive 3-D Modeling System with Capturing Support Interface Using a Hand-held Video Camera. Transaction of the Virtual Reality Society of Japan 10(4), 599–608 (2005) (in Japanese) 4. Pan, Q., Reitmayr, G., Drummond, T.: ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition. In: Proc. of the 20th British Machine Vision Conference (2009) 5. Bastian, J., Ward, B., Hill, R., Hengel, A., Dick, A.: Interactive Modelling for AR Applications. In: Proc. of the 9th IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 199–205 (2010) 6. Hengel, A., Dick, A., Thorm¨ ahlen, T., Ward, B., Torr, P.H.S.: VideoTrace: Rapid Interactive Scene Modelling from Video. ACM Transactions on Graphics 26(3), Article 86 (2007) 7. Klein, G., Murray, D.: Parallel Tracking and Mapping for Small AR Workspaces. In: Proc. of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 1–10 (2007) 8. Boykov, Y., Kolmogorov, V.: An Experimental Comparison of Min-cut/max-flow Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9), 1124–1137 (2004) 9. Thorne, M., Burke, D., van de Panne, M.: Motion Doodles: An Interface for Sketching Character Motion. ACM Transactions on Graphics 23(3), 424–431 (2004) 10. Cohen, J.M., Hughes, J.F., Zeleznik, R.C.: Harold: a world made of drawings. In: Proc. of the 1st International Symposium on Non-photorealistic Animation and Rendering, pp. 83–90 (2000) 11. Bergig, O., Hagbi, N., El-Sana, J., Billinghurst, M.: In-place 3D Sketching for Authoring and Augmenting Mechanical Systems. In: Proc. of the 8th IEEE International Symposium on Mixed and Augmented Reality, pp. 87–94 (2009) 12. Popovic, J., Seitz, S.M., Erdmann, M., Popovic, Z., Witkin, A.: Interactive Manipulation of Rigid Body Simulations. In: Proc. of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 209–217 (2000)
Design Criteria for AR-Based Training of Maintenance and Assembly Tasks Sabine Webel, Ulrich Bockholt, and Jens Keil Fraunhofer Institute for Computer Graphics Research IGD Fraunhoferstr. 5, 64283 Darmstadt, Germany {sabine.webel,ulrich.bockholt,jens.keil}@igd.fraunhofer.de
Abstract. As the complexity of maintenance tasks can be enormous, the efficient training of technicians in performing those tasks becomes increasingly important. Maintenance training is a classical application field of Augmented Reality explored by different research groups. Mostly technical aspects (e.g. tracking, 3D augmentations) have been the focus of this research field. In our paper we present results of interdisciplinary research based on the fusion of cognitive science, psychology and computer science. We focus on analyzing how AR-based training of maintenance skills can be improved by also addressing the necessary cognitive skills. Our aim is to find criteria for the design of AR-based maintenance training systems. A preliminary evaluation of the proposed design strategies has been conducted by expert trainers from industry. Keywords: Augmented Reality, training, skill acquisition, training system, industrial applications.
1 Introduction As the complexity of maintenance and assembly tasks can be enormous, training technicians to acquire the skills necessary to perform those tasks efficiently is challenging. Good guidance of the user through the training task is one of the key features for improving the efficiency of training. Traditional training programs are often expensive in terms of effort and costs, and rather inefficient, since the training is highly theoretical. Due to the complexity of maintenance tasks, it is not enough to teach the execution of these tasks; the underlying skills must be trained. Speed, efficiency, and transferability of training are three major demands which skill training systems should meet. In order to train maintenance skills, the trainee's practical performance of the training tasks is vitally important. From previous research it can be derived that Augmented Reality (AR) is a powerful technology to support training, in particular in the context of industrial service procedures. Instructions on how to assemble/disassemble a machine can be linked directly to the machines to be operated. Various approaches exist in which the trainee is guided step-by-step through the maintenance task. Mostly technical aspects (tracking, visualization etc.) have been the focus of this research field. Furthermore, those systems function rather as guiding systems than as training systems. A potential danger of Augmented Reality applications is that users become dependent
on Augmented Reality features, and as a result they might not be able to perform the task, when those features are not available or when the technology fails. That is to say, an AR-based training system must clearly differ from an AR-based guiding system; it must really train the user instead of only guiding him through the task. This can be only achieved by involving cognitive aspects in the training. Industrial maintenance and assembly can be considered as a collection of complex tasks. In most cases, these tasks involve the knowledge of specific procedures and techniques for each machine. Each technique and procedure requires cognitive memory and knowledge of the way the task should be performed as well as fine motor "knowledge" about the precise movements and forces that should be applied. Hence, the skill, which is responsible for a fast and robust acquisition of maintenance procedures, is a complex skill. In this context, procedural skills can be considered as the most important skill in industrial maintenance tasks. Procedural skills are the ability to follow repeated a set of actions step-by-step in order to achieve a specified goal. It is based on getting a good representation of a task organization: What appropriate actions should be done, when to do them and how to do them. Within a cooperation of engineering and perceptual scientists we explored the training of industrial maintenance. Here we focused on training of procedural skills. By analyzing the use of Augmented Reality technologies for enhancing the training of procedural skills, we aim for finding design criteria for developing efficient AR-based maintenance training systems. Therefore, a sample training application has been developed. We present preliminary results of the evaluation conducted by maintenance trainers from industry.
2 Related Work As the complexity of maintenance and assembly procedures can be enormous, the training of operators to perform those tasks efficiently has been in focus of many research groups. Numerous studies presented the potential of Augmented Reality based training systems and its use in guidance applications for maintenance tasks. One of the first approaches is using Augmented Reality for a photocopier maintenance task [1]. The visualization is realized using wireframe graphics and a monochrome monoscopic HMD. The tracking of objects and the user's head is provided by ultrasonic trackers. The main objective is to extend an existing two dimensional automated instruction generation system to an augmented environment. Hence, only simple graphics are superimposed instead of complicated 3D models and animations. Reiners et al. [2] introduce an Augmented Reality demonstrator for training a doorlock assembly task. The system uses CAD data directly taken from the construction/production database as well as 3D-animation and instruction data prepared within a Virtual Prototyping planning session, to facilitate the integration of the system into existing infrastructures. For the tracking they designed an optical tracking system using low cost passive markers. A Head Mounted Display functions as display device. Schwald et al. describe an AR system for training and assistance in the industrial maintenance context [3], which guides the user step-by-step through training and
maintenance tasks. Magnetic and infrared optical tracking techniques are combined to obtain a fast evaluation of the user's position in the whole set-up and a correct projection for the overlays of virtual information in the user's view. The user is equipped with a lightweight helmet, which integrates an optical see-through HMD, a microphone, headphones, and a 3D-positioning sensor. The headphones offer the user the possibility to get audio information on the procedures to achieve. Via the microphone the user can easily interact with the system by using speech recognition. The 3D-positioning sensor is used to determine the position of the objects of interest in 3D space in relation to the user’s position. That way, 3D augmentations are directly superimposed with their real counterparts, whereby the parts of interest are highlighted. Besides, also information about how to interact with the counterparts can be visualized. The paper discusses the usage of the system, the user equipment, the tracking and the display of virtual information. In [4] a spatial AR system for industrial CNC-machines, that provides real-time 3D visual feedback by using a transparent holographic element instead of using user worn equipment (like e.g. HMD). Thus, the system can simultaneously provide bright imagery and clear visibility of the tool and work piece. To improve the user's understanding of the machine operations, visualizations from process data are overlaid over the tools and work pieces, while the user can still see the real machinery in the workspace, and also information on occluded tools is provided. The system, consisting of software and hardware, requires minimal modifications to the existing machine. The projectors need only to be calibrated once in a manual calibration process. An Augmented Reality application for training and assisting in maintaining equipment is presented in [5]. Overlaid textual annotations, frames and pointing arrows provide information about machine parts of interest. That way, the user's understanding of the basic structure of the maintenance task and object is improved. A key component of the system is a binocular video see-through HMD, that the user is wearing. The tracking of the position and orientation of equipment is implemented using ARToolKit [6]. The work of Franklin [7] focuses on the application of Augmented Reality in the training domain. The test-bed is realized in the context of Forward Air Controller training. Using the system, the Forward Air Controller (trainee) can hear and visualize a synthetic aircraft and he can communicate with the simulated pilot via voice. Thus, the trainee can guide the pilot onto the correct target. The system can provide synthetic air asset stimulus and can support the generation of synthetic ground based entities. Positions and behavior of these entities can be adapted to the needs of the scenario. The author concluded that the impact of Augmented Reality for training depends on the specific requirements of the end user and in particular on the realism of the stimulation required. According to the author, this is influenced by the means of the required stimulation, the criticality on how the synthetic stimulation is used, the dynamism and complexity of the training environment and the availability of a common synthetic environment.
3 Training of Procedural Skills As mentioned before, procedural skills are the ability to follow repeated set of actions step-by-step in order to achieve a specified goal and reflect the operator’s ability to obtain a good representation of task organization. This skill is needed in the performance of complex tasks as well as simple tasks. Procedural skills are based on two main components: procedural knowledge and procedural memory. Procedural knowledge enables a person to reproduce trained behavior. It is defined as the knowledge about how and when (i.e. in which order) to execute a sequence of procedures required to accomplish a particular task [8]. Procedural knowledge is stored in the procedural memory, which enables persons to preserve the learned connection between stimuli and responses and to response adaptively to the environment [8]. Generally speaking, procedural skills develop gradually over several sessions of practice (e.g. [9]) and are based on getting a good internal representation of a task organization. Therefore, the training of procedural skills should address the development of a good internal representation of the task and the execution of the single steps in the right order in early training phases. 3.1 Enhancement of Mental Model Building It has been explored that the performance of a learner of a procedural skill becomes more accurate, faster, and more flexible when he is provided with elaborated knowledge (e.g. [10],[11]). This means that the learner’s performance increases when how-it-works knowledge (“context procedures”) is provided in addition to the howto-do-it knowledge (“list procedures”) (e.g. [10]). According to Taatgen et al., when elaborated knowledge is given, the learner is able to extract representations of the system and the task, which are closer to his internal representation, and as a result performance improved [10]. This internal, psychological representation of the device to interact with can be defined as mental model [12]. In order to support the trainee’s mental model building process, the features of the task which are most important for developing a good internal representation must be presented to the trainee. It has been suggested, that "the mental model of a device is formed largely by interpreting its perceived actions and its visible structure" [13]. The mental model building is mainly influenced by two factors: the actions of the system (i.e. the task and the involved device) its visible structure. Transferring this into the context of procedural skill training, two aspects seem to be important for supporting the building of a good mental model: One is providing an abstract representation of the system, what constructs a better understanding of how it works. The other one is providing the visual representation of the system, which will strengthen the internal visual image. It has been found, that people think of assemblies as a hierarchy of parts, where parts are grouped by different functions (e.g. the legs of a chair) [11]. Hence, the hypothesis is that the displayed sub-part of the assembly task should include both the condition of the device before the current step (or rather the logical group of steps to which the current step belongs) and the condition after. This hypothesis is based on the work of Taatgen et al. [10], in which it is shown that
instructions which state pre- and post-conditions yield better performance than instructions which do not. Reviewing this it can be concluded, that the user’s mental model building process can be improved by using visualization elements providing context information.
4 Design Strategies It has been shown, that guided experience is good for learning, but an active exploration of the task has to be assured as well (e.g. [14],[15]). A too strong guidance of the trainee during training impedes an active task exploration and harms the learning process. Active exploration naturally occurs when transferring the information about the task during training is accompanied with some difficulties, forcing the trainee to independently explore the task. If such difficulties are reduced (e.g. by showing the user in detail how to solve the problem), active exploration may not take place. Strong visual guidance tools impede active exploration, because they guide the trainee in specific actions and thus inhibit the trainee’s active exploratory responses [16]. This can be illustrated using the example of a car driver guided by a route guidance system: this driver typically has less orientation than driver who is exploring the way with the help of maps and street signs. Also reproducing the way, when he has to drive it again, is more difficult for the driver who used the route guidance system. From all this it can be concluded, that the training system should include visual elements that allow for reducing the level of provided information. Furthermore, it should contain elements, which guide the trainee through the training task by improving the trainee’s comprehension and internalization of the task, while active exploration is not inhibited. 4.1 Adaptive Visual Aids An important issue when designing AR-based training systems is how much information should be visualized in the different training phases. A basic understanding of how much information the trainee needs during learning can be obtained by observing studying people. Examining the learning behavior of a student studying procedural processes using textbooks or written notations, the following characteristics can be observed: First of all, for each step the student marks a couple of words, a sentence or an excerpt in the running text and writes annotations at the side margin. He studies the process by going repeatedly through this learning material. In the first cycles, the student reads the marked text and the accordant annotations to catch information about the single steps and to put them in order. With the increasing number of performed studying cycles the information that he needs to decide and reproduce the single steps of the procedure decreases. When he starts studying he needs more detailed information about the single steps, because the learning of the single steps is in focus. With the growing development of an understanding of the single steps, the learning of how the steps fit together (i.e. of the procedure) comes increasingly to the fore.
Fig. 1. Adaptive Visual Aid in the training application: a pulsing yellow circle (pointer) highlights the area of interest; the detailed instruction is given on the plane (content object)
Transferring this observation into the context of training, the mapping of the visualized information level to the different training phases can be hypothesized as follows: In early phases, a clear and detailed instruction about the current step should be provided in order to train the trainee in understanding and performing the single steps. This can be realized by using adaptive visual aids (AVA) consisting of overlaid 3D objects (pointer) and/or multimedia instructions (content) that is displayed on user demand (see Fig. 1). Alternatively, the pointer can act as object/area highlight while the content provides the detailed multimedia instruction. During the training the level of presented information should be gradually reduced (e.g. only 3D animation, then only area highlight with some buzzwords or a picture, then only area highlight, etc.). Hence, both AVA pointer and AVA content object can provide a variable amount of information. The pointer consists of at least one virtual object overlaid on the camera image (like traditional Augmented Reality overlays). Hence, it presents also the spatial component of the information. The pointer object can contain for example complete 3D animations, 3D models, or highlighting geometries (e.g. pulsing circle). The AVA content object consists of a view-aligned virtual 2D plane and different multimedia data visualized on that plane. Thus, it can provide multimedia information that is clearly recognizable for the user. The data displayed on the plane can contain text, images, videos and 3D scenes rendered in a 2D image on the plane, or any combination of those elements. That is, it can contain detailed instructions (e.g. a text description and a video showing an expert performing the task) or just a hint (e.g. a picture of the tool needed to perform the task). 4.2 Structure and Progress Information Since providing abstract, structural information about the task can improve the trainee’s mental model building process, and hence the acquisition of procedural skills (see chapter 3.1), visual elements displaying information about the structure of the training task should be included in the training system. Not only the structure of
the task, but also the relation between the current status and the structure is important. That is, the position of the current state in the whole structure should be visualized as well. Thus, the trainee gets an overview of the training task and can arrange the current step in the structure of the task and use this information to refine his internal representation of the task. One possibility to visualize structural information is the use of progress-bars. Progress-bars provide an abstract overview of the trainee’s current status in relation to the whole task (see Fig. 2).
Fig. 2. An extended progressbar showing the user’s progress inside the task and inside the mental groups (each part of the bar corresponds to a mental group of steps)
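A minimal sketch of the task representation that could drive such an extended progress-bar is given below; the class and field names are our own, and the example steps are purely illustrative:

# Sketch of the task structure behind the extended progress bar (Fig. 2).
# Each mental group is a logical unit of steps; names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class MentalGroup:
    objective: str                       # e.g. "Remove the cover of the valve"
    steps: List[str]

@dataclass
class TrainingTask:
    groups: List[MentalGroup]

    def progress(self, current_step: int):
        """Returns (overall, group_index, within_group), each progress value in [0, 1]."""
        total = sum(len(g.steps) for g in self.groups)
        done = 0
        for gi, g in enumerate(self.groups):
            if current_step < done + len(g.steps):
                return current_step / total, gi, (current_step - done) / len(g.steps)
            done += len(g.steps)
        return 1.0, len(self.groups) - 1, 1.0

task = TrainingTask([
    MentalGroup("Remove the cover of the valve", ["loosen screws", "lift cover"]),
    MentalGroup("Exchange the seal", ["remove old seal", "insert new seal", "grease seal"]),
])
overall, group, within = task.progress(3)    # 4th of 5 steps -> inside the second group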
4.3 Device Display As mentioned in chapter 3.1, the presentation of context information, such as logical units of sub-tasks, and the display of the device to maintain can support the trainee’s mental model building. Moreover, the presentation of only relevant sub-parts of the device and the visualization of the pre- and post-conditions can further enhance the development of a good internal representation. Based on these findings, the use of a Device Display is suggested. The Device Display is a visual element that provides information about successive steps, or rather sub-tasks, belonging to a logical group. That is, it provides information about a good mental model of the task. This can support the user in developing his internal representation of the task. The provided information includes also the condition of the device before the current step and afterwards. Thus, using the Device Display, the user can recognize a sub-goal of the task he has to perform. This can help him to understand "what" he has to do, and hence to deduce the next step to perform. In fact, the presentation of sub-goals actually forces the trainee to deduce the next step without using a more direct visual guidance. The visualization of the Device Display is similar to the visualization of the AVA content (see Fig. 3, left). It consists of a view-aligned 2D plane and multimedia objects rendered on the top of this plane, which can be faded in/out on user demand. The objects displayed on the plane provide information about the grouped sub-tasks (i.e. mental group) and the condition of the device before and after the mental group. A text describes the objective of the mental group in a few words. For example, if the mental group comprises all steps for removing a machine cover, the text "Remove the cover of the valve" is displayed. Additionally, either a video of an expert’s performance of the grouped sub-tasks, or a 3D animation presenting the sub-tasks including the device conditions is shown in the Device Display. Thus, for each mental group the best representation can be chosen. Also a progress-bar is displayed, that shows the user’s progress inside the mental group. That way, supplemental information about the structure of the task, or rather of the mental model, is presented, what can further support the user’s mental model building process.
Fig. 3. Left: a “Device Display” at the left side of the window shows a video about a mental group of steps; Right: a vibrotactile bracelet developed by DLR (German Aerospace Center)
4.4 Haptic (Vibrotactile) Hints The potential of vibrotactile feedback for spatial guidance and attention direction has been demonstrated in various works (e.g. [17]). Usually a lot of visual information has to be processed in complex working scenarios. In contrast, the tactile channel is less overloaded. Furthermore, vibrotactile feedback is a quite intuitive feedback, as the stimuli are directly mapped to body coordinates. Since it provides a soft guidance that "channels" the user to the designated target instead of directly manipulating his movements, it does not prevent the active exploration of the task. Thus, the mental model building process can be supported. Vibrotactile hints can be given by using simple devices like the vibrotactile bracelet shown in Fig. 3 (right). The bracelet developed by DLR is equipped with six vibration actuators which are placed at equal distance from each other inside the bracelet and hence also around the user‘s arm. The intensity of each actuator can be controlled individually. That way, various sensations can be generated, such as sensations indicating rotational or translational movements. Such vibrotactile feedback should be used to give the trainee additional motion hints during the task training, such as rotational or translational movement cues, and to guide the trainee to specific targets. For example, if the trainee needs to rotate his arm for performing a sub-task, the rotational direction (cw or ccw) may be difficult to recognize in a video showing an expert performing the sub-task. Receiving the same information using a vibrotactile bracelet, the trainee can easier identify the rotational direction. Also translational movements can be conveyed. Apart from that, vibrotactile feedback can also be used for presenting error feedback, such as communicating whether the right action is performed (e.g. the right tool is grasped). This can prevent the user from performing errors at an early stage. In addition, vibrotactile hints can be used to provide slight instructions by directing the trainee’s attention to a body part.
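As an illustration, a rotational cue on a six-actuator bracelet could be rendered by pulsing the actuators one after another around the wrist, clockwise or counter-clockwise. The sketch below assumes a hypothetical driver call set_intensity(index, value); it is not the DLR bracelet's actual API:

# Sketch: rendering a rotational cue (cw/ccw) on a six-actuator vibrotactile bracelet.
# bracelet.set_intensity(index, value) is a hypothetical driver call.
import time

NUM_ACTUATORS = 6

def rotation_cue(bracelet, clockwise=True, cycles=2, pulse_s=0.12, intensity=0.8):
    order = list(range(NUM_ACTUATORS))
    if not clockwise:
        order.reverse()
    for _ in range(cycles):
        for idx in order:
            bracelet.set_intensity(idx, intensity)   # pulse one actuator...
            time.sleep(pulse_s)
            bracelet.set_intensity(idx, 0.0)         # ...then switch it off

class PrintBracelet:
    """Stand-in driver used only for testing the pattern."""
    def set_intensity(self, idx, value):
        print(f"actuator {idx}: {value:.1f}")

rotation_cue(PrintBracelet(), clockwise=False)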
5 Preliminary Tests and Conclusion A preliminary evaluation has been conducted by four expert trainers from the food packaging industry (Sidel1). The training task is the assembly of a valve. The implemented training application consists of 32 steps, showing the sub-tasks which are necessary to assemble the valve. Haptic hints indicating rotational and translational movements of the user's right wrist have been implemented and provided using the vibrotactile bracelet described above. The trainers performed the training task using the realized AR training platform. Afterwards they filled out a questionnaire about the usability and functionality of the training system and the design strategies. Table 1 shows an extract of this questionnaire.

Table 1. Extract of the preliminary evaluation questionnaire

Question | SCALE | T1 | T2 | T3 | T4 | AVG
The information provided by the platform via displayed information was enough to understand the task. | 1-7 | 4 | 5 | 6 | 6 | 5,25
The visualization of the different operations was enough for learning the task? | 1-7 | 5 | 5 | 6 | 6 | 5,5
Is there any critical information of the task missing? | N/A | no | no | no | no | -
Please rate the general visualization utilities: spatial information, step information, captions, etc.? | 1-10 | 7 | 8 | 8 | 6 | 7,25
Please rate the overview strategy? | 1-10 | 10 | 7 | 8 | 8 | 8,25
Please rate the spatial pointer strategy? (AVA pointer) | 1-10 | 6 | 4 | 8 | 7 | 6,25
Please rate the content aids display strategy? (AVA content, Device Display) | 1-10 | 7 | 8 | 10 | 7 | 8
Please rate the context aids strategy? (progress bars) | 1-10 | 6 | 7 | 6 | 8 | 6,75
Please rate the haptic hints strategy? | 1-10 | 6 | 3 | 2 | 8 | 4,75
Please rate the playback/trainer-trainee based strategy? | 1-10 | 6 | 8 | 10 | 8 | 8
From the functionality point of view, how do you rate the platform in overall? | 1-10 | 6 | 6 | 9 | 8 | 7,25
What percentage of the task do you consider that you have learnt? | % | 90% | 70% | 80% | 10% | 62,50%
What grade would you give to the AR platform as learning system? | 1-10 | 8 | 7 | 8 | 5 | 7
We conclude from this that the proposed design strategies, namely the use of Adaptive Visual Aids (AVAs), the provision of structure and progress information, the visualization of a Device Display and the integration of haptic hints, have great potential for improving the training of maintenance and assembly skills. The perception of the implemented haptic hints indicating movements turned out to be potentially valuable, but we have to refine the realization of the hints (i.e. the controlling of the vibration stimuli) in order to produce clear indications of the movements the trainee has to perform.
1 Sidel is one of the world's leading suppliers of solutions for packaging liquid foods (http://www.sidel.com/).
In our future work the training platform will be optimized according to the results of the preliminary tests (i.e. improvement of the haptic hints, provision of error feedback) and evaluated by technicians working at Sidel.
References 1. Feiner, S., Macintyre, B., Seligmann, D.: Knowledge-based Augmented Reality. Commun. ACM 36(7), 53–62 (1993) 2. Reiners, D., Stricker, D., Klinker, G., Müller, S.: Augmented Reality for construction tasks: Doorlock assembly. In: Proc. IEEE and ACM IWAR 1998: 1st Int. Workshop on Augmented Reality, pp. 31–46 (1998) 3. Schwald, B., Laval, B.D., Sa, T.O., Guynemer, R.: An Augmented Reality system for training and assistance to maintenance in the industrial context. In: 11th Int. Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, pp. 425–432 (2003) 4. Olwal, A., Gustafsson, J., Lindfors, C.: Spatial Augmented Reality on industrial CNCmachines. In: SPIE Conference Series (2008) 5. Ke, C., Kang, B., Chen, D., Li, X.: An Augmented Reality-Based Application for Equipment Maintenance. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 836–841. Springer, Heidelberg (2005) 6. Kato, H., Billinghurst, M.: Marker tracking and HMD calibration for a video-based Augmented Reality conferencing system. In: Proc. IEEE and ACM IWAR 1999, pp. 85– 94. IEEE Computer Society, Los Alamitos (1999) 7. Franklin, M.: The lessons learned in the application of Augmented Reality. In: Virtual Media for Military Applications, pp. 30/1–30/8 (2006) 8. Tulving, E.: How many memory systems are there? American Psychologist 40(4), 385– 398 (1985) 9. Gupta, P., Cohen, N.J.: Theoretical and computational analysis of skill learning, repetition priming, and procedural memory. Psychological Review 109(2), 401–448 (2002) 10. Taatgen, N.A., Huss, D., Dickinson, D., Anderson, J.R.: The acquisition of robust and flexible cognitive skills. Journal of Experimental Psychology: General 137(3), 548–565 (2008) 11. Agrawala, M., Phan, D., Heiser, J., Haymaker, J., Klingner, J., Hanrahan, P., Tversky, B.: Designing effective step-by-step assembly instructions. ACM Trans. Graph 22(3), 828– 837 (2003) 12. Cañas, J.J., Antolí, A., Quesada, J.F.: The role of working memory on measuring mental models of physical systems. Psicológica 22, 25–42 (2001) 13. Norman, D.: Some observations on mental models. In: Mental Models, pp. 7–14. Lawrence Erlbaum Associates, Mahwah (1983) 14. Mayer, R.E.: Multimedia Learning. Cambridge University Press, Cambridge (2001) 15. Wickens, C.D.: Multiple Resources and Mental Workload. Human Factors 50(3), 449–455 (2008) 16. Gavish, N., Yechiam, E.: The disadvantageous but appealing use of visual guidance in procedural skills training. In: Proc. AHFE (2010) 17. Weber, B., Schätzle, S., Hulin, T., Preusche, C., Deml, B.: Evaluation of a vibrotactile feedback device for spatial guidance. In: IEEE- World Haptics Conference (2011)
Object Selection in Virtual Environments Performance, Usability and Interaction with Spatial Abilities Andreas Baier1, David Wittmann2, and Martin Ende2 1
University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany 2 Cassidian Air Systems, Rechlinerstraße 1, 85077 Manching, Germany
[email protected]
Abstract. We investigate the influence of users’ spatial orientation and space relations ability on performance with six different interaction methods for object selection in virtual environments. Three interaction methods are operated with a mouse, three with a data glove. Results show that mouse based interaction methods perform better compared to data glove based methods. Usability ratings reinforce these findings. However, performance with the mouse based methods appears to be independent from users’ spatial abilities, whereas data glove based methods are not. Keywords: Object selection, interaction method, virtual environment, input device, performance, usability, spatial ability.
1 Introduction In order to interact with virtual environments, the user must have the possibility to carry out object selection tasks using convenient interaction methods [2]. These interaction methods require particular user actions, which in turn necessitate particular user abilities. Knowledge about these actions and abilities makes it easier to decide for or against a particular interaction method. In this study six interaction methods based on two input devices, mouse and data glove, are developed and evaluated. The data glove is typically associated with virtual environments and, as a six-degrees-of-freedom input device, it allows for a variety of different interaction techniques such as gesture recognition or direct object selection. However, for a wide variety of gestures a motion tracking system is mandatory. Furthermore, the glove usually does not fit every person well, and system calibration must be conducted. The fact that the user has to wear the device can be seen as another drawback. The mouse, in comparison, has been designed for traditional 2D desktop applications, but with appropriate mappings it also works in 3D applications. It requires a less complicated hardware setup, does not have to be worn, and is suitable for basically every user. Since it is one of the most widely used devices, a high degree of user familiarization can be assumed and, because it is placed on a 2D surface, less physical effort is to be expected. In addition, as its motion is only associated with three degrees of freedom, the assumption can be made that spatial abilities may have a lower impact
on task performance. In order to verify these assumptions, an assessment is carried out. Evaluation criteria are performance, usability [3] and interrelation between users’ spatial abilities and performance. Spatial abilities are measured with the Spatial Orientation Test [4] and the Space Relations Subtest [1].
2 Experimental Setup and Procedure 2.1 Participants The study has been conducted with 22 male and two female subjects with an average age of 25 years. 2.2 Hardware In order to facilitate three-dimensional perception a stereoscopic rear projection screen (246 cm screen diagonal) was used in combination with polarized glasses to present the object selection task scenery. The participants were seated in front of the projection screen with a distance of 125 cm between head and screen. The chair was equipped with fixed supports for both arms. The left support was fitted with a keyboard, the right one served as a board for the mouse and as the starting point for object selection with the data glove. The glove required an optical tracking system realized by six cameras equipped with infrared filters and appropriate light sources. It carried eight LEDs itself, one each at the tip of thumb, index and middle finger and five on the back of the hand. 2.3 Software and Procedures The basic object selection task scenery was the same for all interaction methods and participants. Nine bullets were presented (Fig. 1), eight blue and one magenta bullet, which was the target object.
Fig. 1. Exemplary object selection task scenery (target object in magenta)
The selection process was always two-staged and required a nomination and a confirmation of the target bullet. Nomination of a bullet changed its color from blue or magenta to white, and confirmation changed it to green in case of a correct selection, or else to red. Task difficulty was varied by the two factors bullet size (1.4 and 0.6 cm diameter) and object distance (25 and 65 cm from the user's hand), resulting in four difficulty levels: large and near (A), large and far (B), small and near (C) and small
and far (D). By means of a training session all participants were made familiar with the interaction methods before the trials began and all participants accomplished the full set of object selection tasks. Thus, each participant conducted four trials with each interaction method leading to 24 trials in total. The presentation sequence as well as the levels of task difficulty was counterbalanced in order to avoid sequence effects. The selection of the target object was carried out as fast and precise as possible. The participants started the evaluation of an interaction method manually by pressing the F8-button on the keyboard. This also started the measurement of the selection time and the registration of errors. The measurement stopped when the target object was correctly selected. Selection of a false object caused the registration of an error. The measurement of time and registration of further errors was continued until the correct selection was made. The six interaction methods differed in terms of the nomination and confirmation processes, leading to three groups. Group 1 - Data Glove Based Interaction Methods Direct: Nomination required the user to point at the object with the tip of the index finger. In the virtual scenery the fingertips were indicated by grey colored pellets, the index finger was additionally marked with a white cone (Fig. 3a). For confirmation the fingertips of both thumb and middle finger had to be brought together. Ray: A virtual ray originated at the tip of the index finger. The ray had to be pointed onto an object in order to accomplish its nomination. Confirmation was conducted as with Direct. Group 2 - Mouse Based Interaction Methods Plane: Nomination of an object was achieved by horizontal adjustment of the grey colored selection cross (Fig. 2a) with the mouse and allocation of the vertical plane using the mouse wheel. Confirmation of the nomination was carried out by pressing the F8-button on the keyboard. Cylinder: The cursor was modeled as a two-dimensional circle on the underlying surface. It had to be positioned right under an object (Fig. 2b). When the edge of the circle passed through the center of an object projection on the underlying surface, the cursor shape changed from a circle to a semitransparent cylinder (Fig. 2c), nominating the accordant bullet. Confirmation was conducted as with the Plane.
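The 2x2 difficulty design and the per-trial measurement described above (timer started on pressing F8, errors counted until the correct bullet is confirmed) can be summarized by the following Python sketch; the event handling and logging structure are our own illustration, not the software used in the study:

# Sketch of the 2x2 difficulty conditions and per-trial logging (time from F8 start
# to correct confirmation, plus error count). Event handling is abstracted away.
import itertools, time

SIZES = {"large": 1.4, "small": 0.6}         # bullet diameter in cm
DISTANCES = {"near": 25, "far": 65}          # distance from the user's hand in cm
CONDITIONS = dict(zip("ABCD", itertools.product(["large", "small"], ["near", "far"])))
# A = large/near, B = large/far, C = small/near, D = small/far

class Trial:
    def __init__(self, method, condition):
        self.method, self.condition = method, condition
        self.errors = 0
        self.t0 = None

    def start(self):                  # F8 pressed: start time measurement
        self.t0 = time.perf_counter()

    def confirm(self, selected, target):
        if selected != target:
            self.errors += 1          # false selection: register error, keep measuring
            return None
        return time.perf_counter() - self.t0, self.errors   # selection time, error count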
Fig. 2. Plane (2a), Cylinder without (2b) and with nomination (2c)
Group 3 – List Based Interaction Methods A common feature of these interaction methods was the use of a selection list (Fig. 3). Object selection did not happen by nomination of an object itself, but by nomination of its representation from a selection list. Each row represented one bullet, displayed by a circular symbol of the appropriate color. The top row presented the left bullet, the bottom row the right one. The lists were the same for both methods. The sizes of the bullets in the scenery as well as their distances from the user varied as in the other interaction methods.
Fig. 3. Hand (3a) and Mouse Operated List (3b)
Hand Operated List: Object nomination required the user to point at the accordant row with the tip of the index finger. The grey row indicated the current position in the list (Fig. 3a), and the color of the corresponding bullet changed to white accordingly. In order to confirm the nominated row, and thereby select the associated bullet, the F8 key on the keyboard had to be pressed.

Mouse Operated List: Object nomination was carried out by scrolling through the list (Fig. 3b) using the mouse wheel; confirmation was conducted as with the Hand Operated List.

2.4 Usability Measurement

After the evaluation of each interaction method, all participants were presented with a standardized usability questionnaire according to ISO 9241-9:2002 [3].

2.5 Spatial Ability Measurement

In order to determine the participants' space relations ability (the ability to think in three dimensions), the Space Relations Subtest of the Differential Aptitude Test was applied [1]. For the measurement of spatial orientation ability (the ability to take over different perspectives and to orient oneself in space), the Spatial Orientation Test was adopted [4].
3 Results

3.1 Performance Measurement

Both selection time and error rate of the hand based methods Direct and Ray increase considerably with rising task difficulty. In the case of Direct, the decrease in performance between low and high task difficulty amounts to about 100 % (3.2 s), and in the case of Ray to about 180 % (8.6 s). Direct shows superior performance compared to Ray regarding both selection time and error rate: the difference in selection time amounts to 2.8 seconds on average, with a difference in error rate of 0.5 errors per selection. The selection times of the mouse based methods Plane and Cylinder rise by about 50 % and 90 % respectively with increasing task difficulty, whereas the error rates are very low and exhibit no noticeable difference between the two interaction methods. Concerning selection time, Cylinder performs about 2.3 seconds faster across all difficulty levels. Both Hand and Mouse Operated List show comparably low error rates with no significant difference between the two methods. The gain in selection time with the Mouse compared to the Hand Operated List averages about 30 % (0.8 s). A comparison of all interaction methods shows that the best performance is achieved by the Hand and Mouse Operated List. Cylinder and Plane show moderate performance; with regard to error rate, their performance is comparable to the list based methods. The data glove based methods show relatively long selection times and high error rates. Ranking the methods from best to worst performance yields: 1. Mouse Operated List, 2. Hand Operated List, 3. Cylinder, 4. Plane, 5. Direct and 6. Ray. Overall, mouse based methods show better performance than data glove based methods. Table 1 summarizes the results of the performance measurements.

Table 1. Results of the performance measurements: selection time and error rate with standard deviations (SD) in parentheses. A-D indicate the levels of difficulty (A is the easiest, D the most difficult task condition).
            | Selection time (SD) [s]                         | Error rate (SD) [count]
Difficulty  | A         | B         | C         | D           | A         | B         | C         | D
Direct      | 3.2 (0.4) | 3.6 (0.4) | 6.5 (1.2) | 6.4 (0.7)   | 0.3 (0.1) | 0.2 (0.1) | 0.9 (0.3) | 0.8 (0.2)
Ray         | 4.7 (0.6) | 6.1 (1.0) | 7.1 (1.1) | 13.3 (2.2)  | 0.7 (0.2) | 0.9 (0.2) | 0.7 (0.1) | 1.7 (0.3)
List Hand   | 2.4 (0.3) across A-D                            | 0.1 (0.1) across A-D
List Mouse  | 1.6 (0.1) across A-D                            | 0.0 across A-D
Plane       | 4.6 (0.2) | 5.0 (0.4) | 5.9 (0.3) | 6.8 (0.5)   | 0.1 (0.1) | 0.0       | 0.1 (0.1) | 0.1 (0.1)
Cylinder    | 2.4 (0.2) | 2.8 (0.3) | 3.3 (0.2) | 4.6 (0.7)   | 0.0       | 0.0       | 0.0       | 0.0
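As a quick arithmetic check of the percentages quoted above, the relative increase in selection time from the easiest (A) to the hardest (D) condition can be recomputed from Table 1 (a sketch; the values are simply copied from the table).

```python
easiest_vs_hardest = {"Direct": (3.2, 6.4), "Ray": (4.7, 13.3),
                      "Plane": (4.6, 6.8), "Cylinder": (2.4, 4.6)}
for method, (t_a, t_d) in easiest_vs_hardest.items():
    print(f"{method}: +{t_d - t_a:.1f} s ({100 * (t_d - t_a) / t_a:.0f} %)")
# Direct: +3.2 s (100 %); Ray: +8.6 s (183 %); Plane: +2.2 s (48 %); Cylinder: +2.2 s (92 %)
```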
3.2 Usability Measurement

The evaluation comprised the following items: expenditure of energy required to accomplish the task (a), constancy (b), required effort while carrying out the task (c), accuracy (d), speed (e), overall contentment (f) and utilization (g).
Table 2. Results of the usability evaluation. The table shows mean values (1 is worst, 7 is best); standard deviations in parentheses.

Item  | List Mouse | List Hand | Cylinder  | Plane     | Direct    | Ray
a     | 6.7 (0.5)  | 6.4 (0.9) | 6.0 (0.6) | 6.0 (1.1) | 5.8 (1.0) | 5.8 (1.3)
b     | 6.8 (0.4)  | 6.4 (0.8) | 6.0 (1.1) | 6.0 (0.8) | 5.4 (1.2) | 4.6 (1.4)
c     | 6.7 (0.6)  | 6.3 (0.7) | 6.0 (1.1) | 5.7 (1.2) | 5.0 (1.2) | 4.2 (1.5)
d     | 6.7 (0.7)  | 6.0 (1.1) | 5.5 (1.3) | 5.7 (1.2) | 4.5 (1.6) | 3.2 (1.5)
e     | 6.7 (0.5)  | 6.3 (0.8) | 5.6 (1.3) | 5.7 (1.0) | 5.2 (1.4) | 4.3 (1.3)
f     | 6.6 (0.6)  | 6.3 (0.8) | 5.6 (1.3) | 4.7 (1.1) | 5.0 (1.6) | 3.9 (1.6)
g     | 6.5 (0.8)  | 6.2 (0.9) | 6.0 (1.1) | 5.2 (1.2) | 5.5 (1.3) | 4.3 (1.3)
Total | 6.7 (0.6)  | 6.3 (0.9) | 5.8 (1.1) | 5.6 (1.1) | 5.2 (1.3) | 4.3 (1.4)
The overall rating indicates a precedence of the Mouse and Hand Operated List. Cylinder and Plane achieve a moderate rating, and Direct and Ray feature the lowest values; this order complies with the performance order. The average evaluations of accuracy (d) and speed (e) reflect the actual performance quite well, except for Plane and Cylinder: both received comparable usability judgments for accuracy and speed, whereas their actual performance regarding selection time differs appreciably. Concerning overall contentment (f) and utilization (g), Direct ranks higher than Plane (Table 2).

3.3 Spatial Ability Measurement

The examination of the influence of space relations and spatial orientation ability on performance shows a significant correlation between spatial orientation ability and both selection time and error rate with the data glove based interaction methods (r = -.18, p < .01 and r = -.16, p < .05 respectively). For the mouse and list based interaction methods there is no significant correlation. Furthermore, the results reveal a significant correlation between spatial orientation ability and selection time with the Hand Operated List (r = -.21, p < .05); in the case of the Mouse Operated List, there is no significant correlation. To identify the actual differences in performance between users with high and low spatial orientation ability, two groups were formed by median split (Table 3). With the data glove based interaction methods (Group 1), users with high spatial orientation ability perform on average 18 % (1.0 s) faster and make 15 % (0.11) fewer errors than users with low spatial orientation ability. Under high task difficulty, the difference in selection time amounts to 50 % (8.0 s vs. 12.0 s) and the error rate difference is 7 % (1.19 vs. 1.27). Under low task difficulty, no difference in selection time was found, whereas the error rate difference amounts to 64 % (0.39 vs. 0.64). An analysis of the performance differences within the list based interaction methods (Group 3) shows that users with high spatial
orientation ability need on average 14 % (0.3 s) less time to select the target object with the Hand Operated List than users with low spatial orientation ability. For the Mouse Operated List, no differences were found.

Table 3. Results of the space relations and spatial orientation tests, and the high and low spatial orientation ability group values after the median split.

                                        | M     | SD   | Range   | N
Space Relations Test Score (0-100)      | 85.4  | 8.6  | 66-98   | 24
Spatial Orientation Test Score (0-180)  | 162   | 16.7 | 96-174  | 24
Spatial Orientation Ability Group
  Low                                   | 151.8 | 20.6 | 96-166  | 12
  High                                  | 170.2 | 2.0  | 168-174 | 12
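A sketch of the analysis in Sect. 3.3 using numpy/scipy: the original analysis software is not stated, Pearson's r is used here only for illustration (the paper does not name the coefficient), and the arrays below are synthetic placeholders rather than the study's measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                             # synthetic placeholder data
spatial = rng.normal(162, 17, size=24)                     # spatial orientation scores, one per participant
sel_time = 8.0 - 0.02 * spatial + rng.normal(0, 0.5, 24)   # mean selection times [s] for one method

r, p = stats.pearsonr(spatial, sel_time)                   # correlation between ability and selection time
high = spatial >= np.median(spatial)                       # median split into high/low ability groups
print(f"r = {r:.2f}, p = {p:.3f}; "
      f"high: {sel_time[high].mean():.1f} s, low: {sel_time[~high].mean():.1f} s")
```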
4 Discussion

4.1 Performance

The results of the performance measures suggest considerable differences between the evaluated interaction methods. The superiority of the list based interaction methods could be due to the unvaried levels of task difficulty. The advantage of the Mouse over the Hand Operated List presumably originates from differences in the nomination process, which requires considerable hand movement with the data glove, whereas using the scroll wheel of the mouse involves no major hand movement. The advantage of the Cylinder over the Plane regarding selection time probably arises from the automated adjustment of the cylinder height. The poorer performance of the data glove based interaction methods in terms of selection time certainly arises from the high error rates associated with these methods, which in turn may originate from the confirmation mode - bringing together the tips of the middle finger and the thumb. This finger movement seems to affect the position and the direction of the index finger, thus increasing the probability of an error. In contrast, with the mouse based interaction methods nomination is carried out with the right hand and confirmation with the left hand. Generally, the higher level of user familiarity with a mouse compared to a data glove tends to amplify the difference in performance.

4.2 Usability

Furthermore, the subjective ratings of expenditure of energy, constancy and required effort, as well as overall contentment and utilization, correlate with the performance of the interaction methods. The superior evaluation of the Plane compared to the Cylinder regarding accuracy and speed may be a consequence of the missing necessity to adjust the height of the cursor while directing it to the target object. Positioning the cursor under the object implies a trial-and-error procedure, which is only finished when the cursor
changes into a cylinder. The superior rating of Direct compared to Plane regarding overall contentment and utilization presumably originates from the relatively simple operation of Direct, in contrast to the complex nomination process of Plane, which requires combined mouse movement and scroll wheel operation.

4.3 Spatial Ability

The correlation between spatial orientation ability and performance with the data glove based interaction methods at first indicates considerable demands on the spatial orientation ability of the user with these methods. The conclusion that these demands arise from the use of the data glove itself is not valid, due to the lack of comparability between the tested methods. However, the significant interrelation between spatial orientation ability and selection time with the Hand but not the Mouse Operated List supports this assumption, because these methods are indeed comparable. Taking into account that the statistical spread of the spatial orientation ability values was relatively small, a stronger influence of spatial orientation ability on performance can be expected for a more heterogeneous user group. It should also be taken into account that the participants were almost exclusively male; the transferability to a wide range of users may therefore be limited. For more universally applicable results, gender effects have to be investigated.

4.4 Design Recommendations

The results of the performance and usability measurements generally suggest mouse based interaction methods for accomplishing object selection tasks in virtual environments. Which mouse based interaction method should ultimately be used depends strongly on the task to be accomplished. The Mouse Operated List is most suitable when the number of objects to be selected or their size is relatively small and when cases of occlusion appear frequently. Its use is suboptimal when a particular object has to be selected from a large number of objects, due to the increasing number of rows within the list and the corresponding rise in search time. Alternatively, the Cylinder can be suggested, although occlusion or superimposed assemblies of objects can also cause problems. To partly avoid these problems, the Plane, which received relatively positive usability ratings, seems suitable as well. Another approach could be the combination of the Cylinder and the Mouse Operated List, which could solve the occlusion and superimposition problem. The independence of mouse based interaction methods from the user's spatial orientation ability argues for these methods, especially in cases where relatively heterogeneous user groups are to be expected and high performance is important. Data glove based interaction methods can generally not be recommended due to their relatively poor performance. If their use is considered, Direct is the method of choice due to its comparatively positive usability rating; especially for easy and very easy object selection tasks this method should be convenient. In order to facilitate task fulfillment, a combination of Direct and Hand Operated List could be implemented, utilizing the list particularly to select small objects. Generally, the adoption of the Ray is not recommended, but it could be considered when the objects to be selected are at a distance that cannot be reached physically. When considering the use of a data glove based interaction method, the
significant influence of the user's spatial orientation ability on performance should be taken into account. Especially in operational areas where high performance and safety have to be ensured, the use of a data glove calls for aptitude tests examining the spatial orientation ability of potential users.

4.5 Outlook

To enhance the interaction, improvements in the design of the individual methods are necessary, and combinations of interaction methods should be developed and assessed. Further analysis of the relationship between spatial orientation ability and performance, with improved or other interaction methods and tasks, should support the selection of a suitable interaction method and provide guidance on which criteria should be evaluated in aptitude assessments. Gender differences should also be examined systematically. Furthermore, the investigation of learning effects appears to be important, considering the novelty of the virtual environment and the analyzed interaction methods, and the high level of familiarity with commonly used interaction devices such as mouse or keyboard.

Acknowledgements. We thank the participants for taking part in the experiment and H. Neujahr, Prof. Dr. A. Zimmer, Dr. P. Sandl and C. Vernaleken for their inspiring comments.
References

1. Bennett, G.K., Seashore, H.G., Wesman, A.G.: Differential Aptitude Tests (Space Relations Subtest). The Psychological Corporation, New York (1990)
2. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Pearson Education, Inc., Boston (2005)
3. Ergonomic requirements for office work with visual display terminals (VDTs) - Part 9: Requirements for non-keyboard input devices (ISO 9241-9:2002)
4. Hegarty, M., Kozhevnikov, M., Waller, D.: Perspective Taking / Spatial Orientation Test. University of California, Santa Barbara (2008), http://www.spatiallearning.org (downloaded April 16, 2010)
Effects of Menu Orientation on Pointing Behavior in Virtual Environments

Nguyen-Thong Dang and Daniel Mestre

Institute of Movement Sciences, CNRS and University of Aix Marseille II, 163 avenue de Luminy, CP 910, 13288 Marseille cedex 9, France
{thong.dang,daniel.mestre}@univmed.fr
Abstract. The present study investigated the effect of menu orientation on user performance in a menu items' selection task in virtual environments. An ISO 9241-9-based multi-tapping task was used to evaluate subjects' performance. We focused on a local interaction task in a mixed reality context where the subject's hand directly interacted with 3D graphical menu items. We evaluated the pointing performance of subjects across three levels of inclination: a vertical menu, a 45°-tilted menu and a horizontal menu. Both quantitative data (movement time, errors) and qualitative data were collected in the evaluation. The results showed that a horizontal orientation of the menu resulted in decreased performance (in terms of movement time and error rate) as compared to the two other conditions. Post-hoc feedback from participants, gathered using a questionnaire, confirmed this difference. This research might contribute to guidelines for the design of 3D menus in a virtual environment.

Keywords: floating menu, menu orientation, local interaction, pointing, evaluation, virtual environments.
1 Introduction

Graphical menus are frequently used for system control, one of the basic tasks in Virtual Environments (VEs). Typically, a command is issued to perform a particular function, to change the mode of interaction, or to change the system state [2]. Research on graphical menus in VEs currently focuses on the design of menu characteristics, among them menu appearance and structure, menu placement, menu invocation and availability, etc. [4]. Various menu systems for VEs have been proposed in the literature; however, a standard graphical menu system for VEs has yet to emerge. Among the different approaches to menu system design, adopting two-dimensional (2D) graphical menus in VEs presents many advantages. This approach, which brings the commonly used menu concept from 2D user interfaces to VEs, might benefit from well-established practices in 2D menu design. However, whereas traditional 2D menus are always constrained to a fixed vertical plane surface, graphical menus in a 3D spatial context, such as VEs, can be positioned and oriented in many different ways. Menu placement in an immersive environment can be world-referenced,
object-referenced, body-referenced, device-referenced, etc. [2][4]. In addition, a menu system can be oriented at different angles with respect to the user's viewpoint. This might also be one of the reasons why the Virtual Reality (VR) and 3D User Interfaces research community agrees on the term "floating menu" to refer to a 3D menu system in VEs. The main issue for menu placement in VEs is to define an optimal combination of menu position and menu orientation that facilitates interaction with menu items. The present paper tackles issues of floating menu placement in VEs, in particular menu orientation. We conducted a study investigating the effect of menu orientation on user performance in a menu item selection task in VEs, focusing on a local interaction task (pointing to menu items). More specifically, we hypothesized that the inclination of a floating menu would affect users' perception of menu items' distance and consequently the pointing action, and thus that differences in menu orientation would lead to differences in pointing time and errors. Determining whether menu orientation has any effect on users' performance might be helpful for developers in choosing the placement of menus in VEs, especially in the context of human scale virtual environments where virtual objects (in our case menu systems) are usually within reach of the user's hand. We did not address "at-a-distance" interaction or distant pointing, in which a ray-casting technique is usually used to point to a distant menu system. In the present study, the subject's hand directly interacted with menu items.
2 Related Work

The benefit of graphical menus for interaction in a virtual environment was first shown in a study conducted by Jacoby and Ellis [10]. Since then, various studies on the design and evaluation of graphical menus in VEs have been proposed, leading to a collection of more than thirty existing menu systems at present [4]. A comprehensive review of those menu systems can be found in a survey conducted by Dachselt and Hübner [4]. However, most previous studies focused on the menu's appearance and structure rather than on the issues of menu placement. The literature contains numerous studies about the arrangement of menu items, which varies from a planar layout (for example, Kim et al. [11], Bowman and Wingrave [3], and Ni et al. [12], to name a few), to a ring layout (e.g., the Spin menu proposed by Gerber and Bechmann [7]), to a 3D layout such as the Command and Control Cube (Grosjean et al. [8]), etc. Only a few studies focused on the issues of menu placement in a virtual environment, and those studies mostly addressed the spatial reference frame of the menu system rather than menu orientation. Since menu orientation is also dependent on the reference frame in certain cases (a handheld menu, for example), it is worth briefly introducing some typical studies regarding the reference frame of menu systems in VEs. In 2000, Kim et al. [11] conducted a study involving three factors: menu presentation (the way menu elements are disposed on menus), input device, and menu reference frame in a Head-Mounted Display (HMD). The two menu reference frames were world-reference (where the menu is fixed at a position in the scene, independent of the user's viewpoint) and viewer-reference (where menu position and
orientation were updated and thus kept unchanged in relation to the user's viewpoint). The menu was placed at a distance and the selection of menu items could be done using a ray-casting technique. However, the study focused more on comparing interaction modalities (gesture versus tracked device) than on the issue of the reference frame itself, and no analysis of menu presentation and reference frame was provided. It is thus difficult to draw any conclusion regarding the effect of different reference frames from this study. Another study, conducted by Bernatchez and Robert [1], compared five spatial frames of reference, namely world-reference and four types of body-reference, in an HMD. The four body-reference configurations were: (1) the menu follows the user in position only; (2) the menu follows the user's position and turns to remain facing the user; (3) the menu is attached to the user's non-dominant hand; and (4) the menu follows the user's gaze direction. The menu was placed within the subject's arm range (i.e., local interaction) and subjects controlled their hand's avatar to interact with menu elements. This study showed that users performed the experimental task (a slider control) best with body-reference frame (2). It is important to note that most previous studies about the placement of floating menus (including the two studies above) have been conducted using an HMD, which is different from a mixed-reality context such as a rear-projected VR system like a CAVE or a workbench. In those VR systems, subjects can see their own body and use it to interact with different elements of the virtual scene. Recently, a study conducted by Das and Borst [5] investigated menu manipulation performance in a rear-projected VR system, with different layouts, different menu placements and a distant pointing context. The study showed that contextual pop-up menus increased performance, as compared to fixed-location menus. Interestingly, in previous studies, the effect of the orientation of the floating menu relative to the user was not taken into consideration. In some studies [5][11], the menu was placed vertically; in other studies [1][13], the floating menu was tilted at a certain angle, without any details about the choice of menu orientation. From the literature, we were not able to find answers to our research problem, which involves both the orientation of the floating menu and local interaction in a mixed reality context in an immersive environment. The experiment conducted in this study was designed to help us understand the effect of the orientation of a floating menu in a mixed reality context where the user's hand directly touches virtual menu items.
3 Evaluation

3.1 Methodology

We adopted the methodology presented in Part 9 of the ISO 9241 standard for non-keyboard input devices [9]. Specifically, we undertook a user study involving two-dimensional serial target selection. Performance was quantified by the throughput index (in bits per second, bps), whose calculation is based on Fitts' law [6] and requires the measurement of the effective index of difficulty (IDe) (cf. Formula (2)) and the average movement time (MT) (cf. Formula (1)).
\[ \mathrm{Throughput}\;(TP) = \frac{ID_e}{MT} \tag{1} \]
Movement time is the mean trial duration over a series of target selection tasks. The effective index of difficulty is calculated from the effective target width (We) and the distance (D) to the selected target, where SDx is the standard deviation of the over-/under-shoot projected onto the task axis for a given condition.
\[ ID_e = \log_2\!\left(\frac{D}{W_e} + 1\right), \quad \text{where } W_e = 4.133 \times SD_x \tag{2} \]
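A minimal sketch of the throughput computation from formulas (1) and (2), assuming per-condition lists of trial movement times and of endpoint deviations projected onto the task axis (all variable names are illustrative, not taken from the original analysis code).

```python
import math
import statistics

def throughput(movement_times_s, task_axis_deviations_m, target_distance_m):
    """Effective throughput in bps for one condition, following formulas (1) and (2)."""
    sd_x = statistics.stdev(task_axis_deviations_m)   # SDx: over-/under-shoot along the task axis
    w_e = 4.133 * sd_x                                # effective target width We
    id_e = math.log2(target_distance_m / w_e + 1)     # effective index of difficulty [bits]
    return id_e / statistics.mean(movement_times_s)   # TP = IDe / MT
```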
3.2 Participants

Seven subjects (aged from 22 to 39 years, all right-handed) participated in the evaluation, after being tested for normal vision and correct stereoscopic perception. Six had little to no experience with 3D stereo vision in a VR system.

3.3 Procedure

The interpupillary distance of each subject was measured at the beginning of the experiment. A sheet with written instructions was then provided to the subject. Subjects were allowed to ask questions and request additional explanation only before the beginning of the test. After that, a calibration of the device tracking the 3D positions of the user's fingers was carried out for each subject. Then, every subject was invited to stand in the centre of the CAVE system. Training trials were prepared in order to let subjects become acquainted with the 3D scenes and the task in each condition. Afterwards, the real task began. After the experiment, each subject was requested to answer some questions on a seven-point Likert scale. The questionnaire aimed at gathering subjective information regarding ease of the experimental task, enjoyment, effectiveness, and frustration in relation to each experimental condition. Overall, each experimental session (including the calibration phase) lasted approximately one hour.

3.4 Apparatus and Task

Subjects were presented with 9 circular targets, arranged in a circle on a virtual planar surface, projected in a 4-sided CAVE-like setup at the Mediterranean Virtual Reality Center (CRVM)1. The floating menu was positioned at the centre of the CAVE. The height of the virtual surface was adjusted according to the subject's height to avoid arm fatigue. Subjects were free to choose their position so that they felt comfortable with the pointing task.
1 www.realite-virtuelle.univmed.fr
Fig. 1. Experimental task
Figure 1 illustrates the experimental target selection task. Subjects wore the A.R.T. Fingertracking device and used their index fingertip to touch the target (cf. Figure 1(b)). The order of presentation of the 9 targets was predefined as in Figure 1(a). Targets were highlighted in red (except target 1, which was highlighted in black, allowing the subject to rest) one at a time; subjects were asked to point to the highlighted target as quickly and accurately as possible using their index fingertip. Making a selection (whether a hit or a miss) ended the current trial. Stereoscopic viewing was obtained using Infitec® technology. Real-time tracking of the subject's viewpoint and fingers was obtained using an ART® system. Virtools® software was used to build and control the virtual scenarios, for experimental control and data recording.

3.5 Experimental Design

Four factors were taken into account in the study:
• Target width (W): 0.024 m, 0.036 m
• Target distance (D): 0.12 m, 0.24 m, 0.4 m
• Inclination of the floating menu: vertical (0°), 45°, horizontal (90°) (cf. Fig. 2)
• Block: 1, 2, 3, 4, 5, 6
Fig. 2. Three levels of inclination of the floating menu
The combination of the two target widths and the three target distances defined six IDs (Indices of Difficulty), as follows: 2.12 (D=0.12, W=0.036), 2.58 (D=0.12, W=0.024), 2.94 (D=0.24, W=0.036), 3.46 (D=0.24, W=0.024), 3.60 (D=0.4, W=0.036) and 4.14 (D=0.4, W=0.024). In total, there were 6048 recorded trials (7 subjects × 6 IDs × 3 inclinations × 6 blocks × 8 trials per ID). The dependent variables were movement time (s), error rate (percent), and throughput (bps). Results were analyzed with repeated-measures ANOVAs.
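The six nominal IDs above follow directly from ID = log2(D/W + 1); a quick check (a sketch, with the widths and distances taken from the list of factors above):

```python
import itertools
import math

widths = (0.024, 0.036)              # target widths W [m]
distances = (0.12, 0.24, 0.4)        # target distances D [m]
ids = sorted(math.log2(d / w + 1) for w, d in itertools.product(widths, distances))
print([f"{v:.2f}" for v in ids])     # ['2.12', '2.58', '2.94', '3.46', '3.60', '4.14']
```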
4 Results and Discussion

4.1 Blocks

Movement times for the six trial blocks were, respectively, 653 ms (Standard Error (SE) = 71), 620 ms (SE = 71), 627 ms (SE = 68), 627 ms (SE = 73), 620 ms (SE = 65) and 604 ms (SE = 62). The differences across the six blocks were not significant [F(5, 642) = 1.090, p = 0.6364]. Error rates for the six blocks were, respectively, 9.93% (SE = 5.12%), 7.6% (SE = 4.32%), 8.4% (SE = 4.32%), 7.31% (SE = 4.21%), 7.6% (SE = 4.45%) and 7.6% (SE = 4.32%). The differences in error rate were also not significant [F(5, 642) = 0.992, p = 0.422]. Hence, there was no learning effect across blocks, indicating that subjects easily adapted to the experimental task and the input device.
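The block analysis above can in principle be reproduced with a repeated-measures ANOVA on a long-format table; a sketch using statsmodels' AnovaRM with synthetic placeholder data (note that the degrees of freedom reported in the paper suggest a trial-level model rather than per-subject block means, so this is only illustrative).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)                         # synthetic placeholder data
df = pd.DataFrame({
    "subject": np.repeat(np.arange(7), 6),             # 7 subjects
    "block": np.tile(np.arange(1, 7), 7),              # 6 blocks per subject
    "movement_time": rng.normal(0.62, 0.07, size=42),  # per-subject block means [s]
})
print(AnovaRM(data=df, depvar="movement_time", subject="subject", within=["block"]).fit())
```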
Fig. 3. Movement time and error rate across the six blocks of trials; vertical bars show standard errors
4.2 Movement Time and Error Rate

Average movement time was 632 ms (SE = 75), 599 ms (SE = 52) and 644 ms (SE = 75) for the 0°, 45° and 90° inclinations of the floating menu, respectively. The difference was significant (F(2, 642) = 4.673, p = 0.01). Post-hoc comparisons using the Tukey test revealed significant differences in movement time between the 45° and 90° conditions (p < 0.01); there was no difference between the 45° and 0° conditions. Average error rates for the three levels of inclination (0°, 45° and 90°) were, respectively, 6.2% (SE = 3.56%), 6.12% (SE = 3.57%) and 11.86% (SE = 5.59%). The difference was significant [F(2, 642) = 22.039, p < 0.001]. Post-hoc
comparisons (Tukey test) revealed a significant difference in error rate between the 45° and 90° conditions (p < 0.001) and between the 0° and 90° conditions (p < 0.001). There was no difference between the 45° and 0° conditions (p = 0.997).
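A sketch of the Tukey post-hoc comparison between inclination conditions, using statsmodels' pairwise_tukeyhsd on synthetic placeholder data (illustrative only; the original data and analysis software are not available here).

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)                                    # synthetic placeholder data
mt = np.concatenate([rng.normal(m, 0.08, 100) for m in (0.632, 0.599, 0.644)])
inclination = np.repeat(["0 deg", "45 deg", "90 deg"], 100)
print(pairwise_tukeyhsd(mt, inclination))                         # pairwise differences, adjusted p-values
```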
Fig. 4. Movement time and error rate (vertical bars show standard errors) corresponding to the three levels of inclination of the floating menu
4.3 Throughput

The difference in throughput, which incorporates both speed and accuracy, was also significant (F(2, 642) = 6.807, p = 0.001). The average throughput was 5.46 bps (SE = 0.62), 5.48 bps (SE = 0.52) and 5.01 bps (SE = 0.64) for the 0°, 45° and 90° conditions, respectively. Post-hoc comparisons using the Tukey test revealed significant differences in throughput between the 45° and 90° conditions (p = 0.003) and between the 0° and 90° conditions (p = 0.005). There was no difference between the 45° and 0° conditions (p = 0.991).
Fig. 5. Average throughput value (vertical bars show standard errors) as a function of the three levels of inclination of the floating menu
4.4 Qualitative Results

As stated before, after the experimental session subjects were asked to fill in a short questionnaire addressing the following aspects: ease of the task, enjoyment, frustration and effectiveness, with respect to the three levels of inclination of the floating menu. The values reported in the histograms (cf. Fig. 6) are medians. Except for frustration, where a high score indicated a high level of frustration, high scores on the other items corresponded to positive feedback. Regarding ease of the task, the median value was 5 (1st quartile = 5; 3rd quartile = 6) for the 0° condition, 6 (5; 6) for the 45° condition and 3 (3; 5) for the 90° condition. Median enjoyment values for the 0°, 45° and 90° conditions were, respectively, 5 (5; 6), 5 (5; 6) and 3 (2; 3). Regarding frustration, median values were 2 (2; 3), 2 (2; 3) and 5 (5; 5) for the 0°, 45° and 90° conditions, respectively. Finally, as for the effectiveness of the different inclinations in supporting the pointing task, the median value was 5 (3; 5) for the 0° condition, 6 (4; 7) for the 45° condition and 3 (2; 4) for the 90° condition.
Fig. 6. Results from the questionnaire (on a 0-7 Likert scale)
Additional analyses were performed on the questionnaire data using the Friedman test. There was a statistically significant difference in subjects' feedback regarding ease of the task [χ2(2) = 6.348, p = 0.042], enjoyment [χ2(2) = 11.385, p = 0.003], level of frustration [χ2(2) = 10.300, p = 0.006] and effectiveness [χ2(2) = 8.400, p = 0.015]. Post-hoc comparisons using the Wilcoxon signed-rank test revealed significant differences between the 45° and 90° conditions in subjects' feedback regarding ease of the task [Z = -2.132, p = 0.033], enjoyment [Z = -2.414, p = 0.016], frustration [Z = -2.220, p = 0.026] and effectiveness [Z = -2.070, p = 0.042]. There were also significant differences between the 0° and 90° conditions regarding enjoyment [Z = -2.410, p = 0.016], frustration [Z = -2.041, p = 0.041] and effectiveness [Z = -2.032, p = 0.038]. Overall, the 0° and
45° conditions received positive scores, while the 90° condition received negative feedback from subjects. This result is in line with the quantitative results regarding subjects' performance reported above.
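The questionnaire analysis maps onto standard scipy routines; a sketch assuming one array of ratings per inclination condition (the ratings below are made-up placeholders, not the study's answers).

```python
from scipy.stats import friedmanchisquare, wilcoxon

ease_0  = [5, 5, 6, 5, 6, 5, 4]    # hypothetical "ease of the task" ratings, one per subject
ease_45 = [6, 6, 5, 6, 7, 6, 5]
ease_90 = [3, 4, 3, 5, 3, 3, 4]

print(friedmanchisquare(ease_0, ease_45, ease_90))   # omnibus test across the three conditions
print(wilcoxon(ease_45, ease_90))                    # post-hoc pairwise comparison (45° vs. 90°)
```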
5 Conclusion

The focus of this evaluation was the effect of menu orientation in a local pointing task. The vertical menu plane (inclination of 0°) and the 45°-tilted menu plane resulted in better performance (shorter pointing time and lower error rate) than the horizontal floating menu (i.e., an inclination of 90°). Even though menu items in all three inclination conditions were within reach of the subjects' hand, inclination seemed to affect pointing to menu items. We suggest that a horizontal menu orientation should be avoided, since this configuration potentially leads to difficulties in judging the position of menu items and subsequently in pointing to those targets.

Acknowledgement. The authors wish to thank Jean-Marie Pergandi, Pierre Mallet, Vincent Perrot at CRVM and all the participants of the evaluation. This work was carried out in the framework of the VIRTU'ART project, sponsored by the Pole PEGASE, funded by the PACA region and the French DGCIS.
References

1. Bernatchez, M., Robert, J.-M.: Impact of Spatial Reference Frames on Human Performance in Virtual Reality User Interfaces. Journal of Multimedia 3(5), 19–32 (2008)
2. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Addison-Wesley, Reading (2004)
3. Bowman, D.A., Wingrave, C.A.: Design and evaluation of menu systems for immersive virtual environments. In: Proceedings of IEEE Virtual Reality 2001, pp. 149–156 (2001)
4. Dachselt, R., Hübner, A.: Three-dimensional menus: A survey and taxonomy. Comput. Graph. 31(1), 53–65 (2007)
5. Das, K., Borst, C.W.: An Evaluation of Menu Properties and Pointing Techniques in a Projection-Based VR Environment. In: IEEE Symposium on 3D User Interfaces (3DUI), pp. 47–50 (2010)
6. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. J. Exp. Psychology 47, 381–391 (1954)
7. Gerber, D., Bechmann, D.: The Spin Menu: A Menu System for Virtual Environments. In: Proceedings of the 2005 IEEE Conference on Virtual Reality (VR 2005), pp. 271–272. IEEE Computer Society, Washington, DC (2005)
8. Grosjean, J., Burkhardt, J., Coquillart, S., Richard, P.: Evaluation of the Command and Control Cube. In: IEEE International Conference on Multimodal Interfaces, pp. 473–478 (2002)
9. ISO: Ergonomic requirements for office work with visual display terminals (VDTs) – Part 9: Requirements for non-keyboard input devices. International Organization for Standardization (2000)
10. Jacoby, R., Ellis, S.: Using Virtual Menus in a Virtual Environment. In: Proceedings of SPIE: Visual Data Interpretation, pp. 39–48 (1992)
11. Kim, N., Kim, G.J., Park, C.-M., Lee, I., Lim, S.H.: Multimodal Menu Presentation and Selection in Immersive Virtual Environments. In: Proceedings of the IEEE Virtual Reality 2000 Conference (VR 2000). IEEE Computer Society, Washington, DC (2000)
12. Ni, T., McMahan, R.P., Bowman, D.A.: rapMenu: Remote Menu Selection Using Freehand Gestural Input. In: 3DUI 2008: IEEE Symposium on 3D User Interfaces, pp. 55–58 (2008)
13. Wloka, M.M., Greenfield, E.: The Virtual Tricorder: A Uniform Interface for Virtual Reality. In: UIST 1995: Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, pp. 39–40. ACM Press, New York (1995)
Some Evidences of the Impact of Environment's Design Features in Routes Selection in Virtual Environments

Emília Duarte1, Elisângela Vilar2, Francisco Rebelo2, Júlia Teles3, and Ana Almeida1

1 UNIDCOM/IADE – Superior School of Design, Av. D. Carlos I, no. 4, 1200-649 Lisbon, Portugal
2 Ergonomics Laboratory and 3 Mathematics Unit, FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada, Dafundo, Portugal
[email protected], {elivilar,frebelo,jteles}@fmh.utl.pt, [email protected]
Abstract. This paper reports results from a research project investigating users' navigation in a Virtual Environment (VE), using immersive Virtual Reality. The experiment was conducted to study the extent to which certain features of the environment (i.e., colors, windows, furniture, signage, corridors' width) may affect the way users select paths within a VE. Thirty university students participated in this study. They were requested to traverse a VE, as fast as possible and without pausing, until they reached the end. During the travel they had to make choices regarding the paths. The results confirmed that the window, corridors' width, and exit sign factors are route predictors to the extent that they influence path selection. The remaining factors did not significantly influence the decisions. These findings may have implications for the design of environments to enhance wayfinding.

Keywords: Virtual Reality; Wayfinding; paths selection; environmental features.
1 Introduction

This paper reports results from a research project investigating users' navigation in a Virtual Environment (VE), using immersive Virtual Reality (VR). The experiment was conducted to study the extent to which certain features of the environment (i.e., colors, windows, furniture and corridors' width) and the presence of signage (i.e., an Exit sign) may affect the way users select paths within a VE. In general, studies in the area of wayfinding have two main focuses: internal information (cognitive representation) and external information (environmental features), conceptualized by Norman [1] as "knowledge in the head" and "knowledge in the world", both being essential for people's daily functioning. When interacting with a building for the first time, people rely on external information (knowledge in the world), which can complement their internal information (knowledge in the head) in order for them to be successful in their orientation and navigation through the new environment. Based on these concepts, Conroy [2]
suggests that for wayfinding the external information is presented in many forms and at different levels of cognition. At a lower level of awareness, this information can be considered implicit in the overall configuration and structure of the environment. In contrast, at a higher level of awareness, the external knowledge is explicit in the form of, for instance, signage. Route selection is a fundamental stage of the wayfinding process, together with orientation and the recognition of the destination [3]. However, few studies have examined the environmental features (external information) that may influence the decision about which path to follow, and those that exist are usually based on signage [4, 5]. Wayfinding studies have shown the importance of several cues in the navigational process, mainly in unfamiliar places, such as landmarks, area differentiation (e.g., color, textures), and the overall floor plan [6, 7]. Furthermore, any factor that can cognitively facilitate wayfinding should be considered. As suggested by previous studies [2, 8, 9], VR offers several potential advantages over traditional means (e.g., observation, paper/pencil tests) of assessing people's wayfinding behavior; for example, problems related to the manipulation and control of variables, data collection, and ecological validity can be overcome. Thus, this study was designed to examine the influence of several features of the environment, at a lower level of awareness, on path selection, in order to determine whether such variables can be considered route predictors. Participants were asked to traverse a novel route containing several decision points at which they had to select a path from two alternatives.
2 Method

2.1 Sample

Thirty university students, 14 females and 16 males, aged 19 to 36 years (mean age = 23.30, SD = 4.48), participated in this within-subjects study, and each one made 36 trials (6 choices per factor). They had no previous experience with navigation in VEs. Participants had normal or corrected vision and no color vision deficiencies, and they reported no physical or mental conditions that would prevent them from participating in a VR simulation.

2.2 Apparatus

During both the training and testing stages, participants were seated and viewed the VE at a resolution of 800 × 600 pixels, at 32 bits, with a FOV of 30°H, 18°V and 35°D, through a Head-Mounted Display (HMD) from Sony®, model PLM-S700E. The participants' viewpoint was egocentric. Participants were free to look around the VE, since a magnetic motion tracker from Ascension-Tech®, model Flock of Birds®, monitored their head movements. A joystick from Thrustmaster® was used as the navigation device. The speed of movement gradually increased from stopped to an average walking pace (1.2 m/s), up to a maximum speed of around 2.5 m/s. Wireless headphones from Sony®, model MDR-RF800RK, allowed participants to listen to instrumental ambient music (i.e., elevator music). The simulation ran on a Windows® graphics workstation equipped with an NVIDIA® Quadro FX4600 graphics card. An external
monitor was used to display the same image of the VE that was being displayed to the participants. Hence, the researcher could watch the interaction, inside and outside the VE, and at the same time take notes and manage the operation of the ErgoVR system. The ErgoVR system [10], developed at the Ergonomics Laboratory of the FMH-Technical University of Lisbon, allowed not only the display of the VE but also the automatic collection of data such as the duration of the simulation and the distance and path taken by the participants.

2.3 Virtual Environment

The VE comprises a sequence of 36 modules, each composed of two parallel corridors measuring 6 m long by 2 m wide (with the exception of a special narrow corridor, which was 1.2 m wide), which started and ended in distribution halls (junctions). The initial hall contained the decision point where the participants needed to make a choice between travelling left or right. A short corridor, measuring 2.5 × 2 m, connects the modules. Thus, the layout of the VE takes the shape of a chain of 36 "O"-shaped modules linked by short connection corridors. Fig. 1 depicts a section of the VE.
Fig. 1. A section of the VE’s floor plan, showing the 1st and 2nd decision points and the alternative corridors
The interior surfaces of the VE were textured and illuminated to resemble a common interior passageway. The modules were numbered to help participants locate themselves at any time during the simulation, as well as to keep them informed about their progression along the VE, although they were not aware of the total number of modules. The outside image displayed in the window, as well as the exit sign, were bitmaps; the chairs, which existed in the module with furniture, were 3D objects. The base structure of the VE was designed using AutoCAD® 2009 and then imported into 3D Studio Max® 2009 (both from Autodesk, Inc.). The VE was then exported, using a free plug-in called OgreMax (v. 1.6.23), to be used by ErgoVR.

2.4 Experiment Design

The experiment was divided into two stages: training followed by testing. The procedures for each stage are described in the next section.
For the testing stage, the study used a within-subjects design comprising 36 trials (one per junction), which resulted from the combination of six factors presented six times each, with their two alternative categories positioned interchangeably on the left/right corridors, in a sequence defined according to a Latin Square. The factors and their alternative categories were: (1) color: yellow/blue; (2) window: no-window/window; (3) width: narrow (1.20 m)/large (2.00 m) corridor; (4) furniture: no-chairs/chairs; (6) signage: no-exit sign/exit sign; plus a neutral condition (5) of empty corridors, equally dimensioned and white colored. Fig. 2 shows screenshots of the corridors, taken at the junctions/decision points, for each factor with its alternative categories.
Fig. 2. Screenshots of the corridors, taken at the decision point, for each factor (1 - color; 2 - corridor’s width; 3 - window; 4 - furniture; 5 - neutral; 6 - signage)
With the exception of the neutral condition (5), the factors were hypothesized to have an impact on people's wayfinding behavior and, therefore, to be considered predictors of route selection. Signage was added to the environmental factors since its influence on navigation/wayfinding is well known; it would thus provide a ceiling measure against which to compare the effect of the other factors.

2.5 Procedure

All participants were tested individually, and the entire experiment lasted approximately 20 minutes. Before starting the training stage, a brief explanation of the study and an introduction to the equipment were given to the participants. The Ishihara Test [11] was used to detect color vision deficiencies. Next, participants were asked to sign an Informed Consent Form and to complete a demographic questionnaire, which asked for information regarding age, gender, the participants' use of computers and prior experience with VR.
At the training stage, participants familiarized themselves with the VR equipment for 3-5 minutes by exploring a VE designed only for training purposes. The goal of this stage was to get the participants acquainted with the setup and to make a preliminary check for any initial indications of simulator sickness. The training VE consisted of two rooms of 65 m2 (5 × 13 m), without windows, connected by a door and containing a number of obstacles (e.g., narrow corridors, pillars, etc.) requiring some skill to be circumvented. There was no time pressure during the training stage. Subjects were encouraged to explore the VE freely and to make several turns around the pillar as smoothly and accurately as possible. At the testing stage, verbal instructions were given to the participants requesting them to travel along the VE until reaching the end, as fast as possible and without pausing. Turning back was not allowed at any point of the way, but participants could request to stop the procedure at any time. During the testing stage, which required about 10 minutes, subjects took 36 decisions (trials) regarding the paths (left or right), in the same order for all participants. On reaching the end of the route, the message "End. Thank you" was displayed on the wall. Finally, after the testing stage, subjects were asked to fill in a post-hoc questionnaire to obtain subjective feedback and the participants' preferences regarding the factors (the questionnaire data are not included in this paper).
3 Results

3.1 Decision-Making at Junctions

The ErgoVR software automatically registered the paths taken by the participants. The main dependent variables were the decisions made regarding the factors at junctions, which were dichotomous variables taking the value 1 or 0 depending on the choices made by the participants regarding the factors' categories. All 30 participants took 36 decisions, thus providing six values for each factor. Considering all participants, the percentages of choices for each category, by factor, are shown in Table 1.

Table 1. The percentages of choices made by the participants in 36 trials (6 per factor)
Color:            Yellow 58.3%  | Blue 41.7%
Window:           Window 85.0%  | No-window 15.0%
Corridor's width: Narrow 26.1%  | Large 73.9%
Furniture:        Chair 62.2%   | No-chair 37.8%
Signage:          Sign 83.9%    | No-sign 16.1%
Neutral:          Left 59.4%    | Right 40.6%
Binomial tests with Bonferroni adjustments were used to evaluate whether the above-mentioned factors significantly affected route selection and, therefore, could be considered predictors of route selection. Significant adjusted values (Adj. Sig.) are marked with an asterisk. The outputs of the Binomial tests with Bonferroni adjustments, for each factor, are presented in Table 2 through Table 7.
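A sketch of one such test, assuming scipy's binomtest (available in recent scipy versions) and a plain Bonferroni correction over the six trials per factor; the example counts are taken from Table 3 (window, choice 2).

```python
from scipy.stats import binomtest

def choice_test(n_category, n_total, n_tests=6):
    """Two-sided binomial test against p = 0.5, with Bonferroni adjustment over n_tests."""
    p = binomtest(n_category, n_total, p=0.5).pvalue
    return p, min(1.0, p * n_tests)

print(choice_test(22, 30))   # ~ (0.016, 0.097), close to the Sig./Adj. Sig. values in Table 3
```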
Table 2. The outputs of the Binomial test for the factor color

Choice | Category      | N       | Observed proportion | Test proportion | Sig.  | Adj. Sig.
1      | yellow | blue | 11 | 19 | 0.37 | 0.63         | 0.5             | 0.200 | 1
2      | yellow | blue | 22 | 8  | 0.73 | 0.27         | 0.5             | 0.016 | 0.096
3      | yellow | blue | 22 | 8  | 0.73 | 0.27         | 0.5             | 0.016 | 0.096
4      | yellow | blue | 13 | 17 | 0.43 | 0.57         | 0.5             | 0.585 | 1
5      | yellow | blue | 17 | 13 | 0.57 | 0.43         | 0.5             | 0.585 | 1
6      | yellow | blue | 20 | 10 | 0.67 | 0.33         | 0.5             | 0.099 | 0.594
Table 3. The outputs of the Binomial test for the factor window

Choice | Category           | N       | Observed proportion | Test proportion | Sig.   | Adj. Sig.
1      | window | no window | 30 | 0  | 1.00 | 0.00         | 0.5             | <0.001 | <0.001*
2      | window | no window | 22 | 8  | 0.73 | 0.27         | 0.5             | 0.016  | 0.096
3      | window | no window | 21 | 9  | 0.70 | 0.30         | 0.5             | 0.043  | 0.258
4      | window | no window | 29 | 1  | 0.97 | 0.03         | 0.5             | <0.001 | <0.001*
5      | window | no window | 27 | 3  | 0.90 | 0.10         | 0.5             | <0.001 | <0.001*
6      | window | no window | 24 | 6  | 0.80 | 0.20         | 0.5             | 0.001  | 0.006*
Table 4. The outputs of the Binomial test for the factor corridor's width

Choice | Category       | N       | Observed proportion | Test proportion | Sig.   | Adj. Sig.
1      | narrow | large | 10 | 20 | 0.33 | 0.67         | 0.5             | 0.099  | 0.594
2      | narrow | large | 4 | 26  | 0.13 | 0.87         | 0.5             | <0.001 | <0.001*
3      | narrow | large | 7 | 23  | 0.23 | 0.77         | 0.5             | 0.005  | 0.030*
4      | narrow | large | 7 | 23  | 0.23 | 0.77         | 0.5             | 0.005  | 0.030*
5      | narrow | large | 11 | 19 | 0.37 | 0.63         | 0.5             | 0.200  | 1
6      | narrow | large | 8 | 22  | 0.27 | 0.73         | 0.5             | 0.016  | 0.096
Table 5. The outputs of the Binomial test for the factor furniture

Choice | Category         | N       | Observed proportion | Test proportion | Sig.  | Adj. Sig.
1      | chair | no chair | 23 | 7  | 0.77 | 0.23         | 0.5             | 0.005 | 0.003*
2      | chair | no chair | 16 | 14 | 0.53 | 0.47         | 0.5             | 0.856 | 1
3      | chair | no chair | 19 | 11 | 0.63 | 0.37         | 0.5             | 0.200 | 1
4      | chair | no chair | 22 | 8  | 0.73 | 0.27         | 0.5             | 0.016 | 0.096
5      | chair | no chair | 18 | 12 | 0.60 | 0.40         | 0.5             | 0.362 | 1
6      | chair | no chair | 14 | 16 | 0.47 | 0.53         | 0.5             | 0.856 | 1
Table 6. The outputs of the Binomial test for the factor signage

Choice | Category       | N       | Observed proportion | Test proportion | Sig.   | Adj. Sig.
1      | sign | no sign | 26 | 4  | 0.87 | 0.13         | 0.5             | <0.001 | <0.001*
2      | sign | no sign | 25 | 5  | 0.83 | 0.17         | 0.5             | <0.001 | <0.001*
3      | sign | no sign | 22 | 8  | 0.73 | 0.27         | 0.5             | 0.016  | 0.096
4      | sign | no sign | 30 | 0  | 1.00 | 0.00         | 0.5             | <0.001 | <0.001*
5      | sign | no sign | 23 | 7  | 0.77 | 0.23         | 0.5             | 0.005  | 0.030*
6      | sign | no sign | 25 | 5  | 0.83 | 0.17         | 0.5             | <0.001 | <0.001*
Table 7. The outputs of the Binomial test for the neutral condition (left vs. right)

Choice | Category     | N       | Observed proportion | Test proportion | Sig.  | Adj. Sig.
1      | left | right | 15 | 15 | 0.50 | 0.50         | 0.5             | 1.0   | 1
2      | left | right | 17 | 13 | 0.57 | 0.43         | 0.5             | 0.585 | 1
3      | left | right | 21 | 9  | 0.70 | 0.30         | 0.5             | 0.043 | 0.258
4      | left | right | 18 | 12 | 0.60 | 0.40         | 0.5             | 0.362 | 1
5      | left | right | 20 | 10 | 0.67 | 0.33         | 0.5             | 0.099 | 0.594
6      | left | right | 16 | 14 | 0.53 | 0.47         | 0.5             | 0.856 | 1
3.2 Consistency of Decision-Making at Junctions

The participants' behavior for each factor, with regard to the choices made during the six trials, was classified into one of three mutually exclusive categories of consistency: inconsistent (when 2 or more choices differ from the remaining ones), curious (when only 1 choice differs) and consistent (when all choices are equal). Fig. 3 shows the number of participants in each behavioral category, by factor.
Fig. 3. The participants' behavior relating to the choices made during the travel
For each factor, the options made by those classified as consistent were analyzed to find their object of loyalty, i.e., the category they always selected (see Table 8).
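A sketch of the consistency classification described above for one factor's six choices (a hypothetical helper, not part of the ErgoVR software):

```python
def classify_consistency(choices):
    """Classify a participant's six choices for one factor."""
    if len(set(choices)) == 1:
        return "consistent"                        # all six choices identical
    minority = min(choices.count(c) for c in set(choices))
    return "curious" if minority == 1 else "inconsistent"

print(classify_consistency(["window"] * 6))                    # consistent
print(classify_consistency(["window"] * 5 + ["no-window"]))    # curious
print(classify_consistency(["window", "no-window"] * 3))       # inconsistent
```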
Table 8. The participants' object of loyalty (consistent participants only)

Color:     Blue 80.0%     | Yellow 20.0%
Window:    Window 100.0%  | No-window 0.0%
Width:     Narrow 8.3%    | Large 91.7%
Furniture: Chair 90.0%    | No-chair 10.0%
Signage:   Sign 100.0%    | No-sign 0.0%
Neutral:   Right 66.7%    | Left 33.3%
A Cochran test was used to test whether the probability of the participants being consistent in their choices is equal across all factors. For this test, two behavioral categories were computed: consistent and inconsistent, with the latter including the curious behavior. The results suggest that the probability of participants being consistent is not equal for all factors (Q(2) = 6.186, p = 0.006, N = 30). Pairwise comparisons reveal that this difference is due to the window and color factors (p-value = 0.020).

3.3 Left or Right Bias

Since some participants were not consistent in their choices regarding the factors' categories, it was hypothesized that such inconsistency could be due to a left/right bias (i.e., a preference for the left or right side). Thus, data from the participants classified as inconsistent were analyzed in order to check whether they were, instead, consistent regarding the side. For the neutral factor, all participants were taken into consideration, independently of their behavioral classification. Results revealed, from a group of 78 cases of inconsistent behavior regarding the factor choices, a total of 13 cases (16.6%) of consistency relating to side choices. Of these cases, only 3 always selected the left corridor (Color: n = 5, all right; Window: n = 3, all right; Width: n = 1, left; Furniture: n = 4, all right; Signage: n = 0; Neutral: n = 6, left = 2, right = 4). The number of curious cases relating to side was 18 (Color: n = 4, all right; Window: n = 0; Width: n = 4, left = 3, right = 1; Furniture: n = 4, all right; Signage: n = 4, left = 2, right = 2; Neutral: n = 4, all right).
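Cochran's Q can be computed directly from the participants × factors matrix of consistent (1) / inconsistent (0) classifications; a minimal sketch with the chi-square p-value from scipy (the input below is a random placeholder matrix, not the study's data).

```python
import numpy as np
from scipy.stats import chi2

def cochrans_q(x):
    """Cochran's Q test for k related binary samples; x has shape (subjects, conditions)."""
    x = np.asarray(x)
    k = x.shape[1]
    col = x.sum(axis=0)                 # per-condition totals
    row = x.sum(axis=1)                 # per-subject totals
    q = (k - 1) * (k * (col ** 2).sum() - col.sum() ** 2) / (k * col.sum() - (row ** 2).sum())
    return q, chi2.sf(q, k - 1)

rng = np.random.default_rng(3)          # placeholder 30 participants x 6 factors matrix
print(cochrans_q(rng.integers(0, 2, size=(30, 6))))
```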
4 Conclusions

The objective of this study was to determine the effect of factors such as color, window, width, furniture and signage on path selection in a VE and, therefore, to determine whether such factors could be considered predictors of route selection. Participants were requested to traverse a VE and to take 36 decisions during the travel. They were confronted, six times per factor, with two alternative and parallel corridors containing diverse, sometimes opposite (e.g., window vs. no window), categories of the mentioned factors. Examination of the statistics reveals that the majority of the participants preferred large corridors, painted yellow, with a window, with chairs and, finally, corridors marked with an exit sign. Additionally, when facing corridors that are
similar (neutral), the participants mostly preferred the left side. However, despite the differences found, the results of the Binomial tests with Bonferroni adjustments suggest that, considering that at least three significant differences in six trials could identify a factor as a route predictor, the window, width and signage factors can be considered route predictors. We also highlight that a window seems to be almost as important a factor as the exit sign. The results did not reveal the color factor as a route predictor; furthermore, the outcomes for this factor were similar to those obtained for the neutral condition. Two main reasons may explain this outcome. First, it might be due to the fact that two colors were in dispute; if participants had to choose between an achromatic corridor (i.e., white or gray) and a colored one, the results might be different. Second, the small difference may be due to the particular colors chosen. It is also clear that the participants were not always consistent in their preference for a factor's category. The Cochran test suggests that the percentage of consistent individuals is not equal for all factors; the difference was between the window and color factors, with the window showing the highest percentage of consistency and color the lowest. Signage, which was expected to be the most influential factor, was ranked below the window with regard to consistency. This outcome reinforces the importance of the window as a route predictor, in contrast to the irrelevance of the color factor. With regard to the left/right bias, in the neutral condition the majority of the participants selected the left corridor, but the differences were not statistically significant; however, those participants who were not consistent in the choices made for the factors revealed a preference for the right side. Further analysis could give more insight into the route decisions, such as the time taken to make the decision at each decision-making point and participants' handedness. Additionally, results from the participants' reports, gathered with the post-hoc questionnaire, may help to explain the strategies they adopted regarding the choices for a specific factor and/or for completing the route. VR, with the setup used, has proved to be an adequate tool for this kind of study. Nevertheless, other VR display techniques, such as the use of stereoscopic devices with a larger field of view, could benefit this kind of research. The findings of this study may have implications for the design of environments to enhance wayfinding.
Evaluating Human-Robot Interaction during a Manipulation Experiment Conducted in Immersive Virtual Reality Mihai Duguleana, Florin Grigorie Barbuceanu, and Gheorghe Mogan Transylvania University of Brasov, Product Design and Robotics Department, Bulevardul Eroilor, nr. 29, Brasov, Romania {mihai.duguleana,florin.barbuceanu,mogan}@unitbv.ro
Abstract. This paper presents the main highlights of a Human-Robot Interaction (HRI) study conducted during a manipulation experiment performed in a Cave Automatic Virtual Environment (CAVE). Our aim is to assess whether using immersive Virtual Reality (VR) for testing material-handling scenarios that involve collaboration between robots and humans is a practical alternative to similar real-life applications. We focus on measuring variables identified as conclusive for the purpose of this study (such as the percentage of tasks successfully completed, the average time to complete a task, the relative distance and motion estimates, presence, and relative contact errors) during different manipulation scenarios. We present the experimental setup, the HRI questionnaire and the results analysis. We conclude by listing further research issues. Keywords: human-robot interaction, immersive virtual reality, CAVE, presence, manipulation.
1 Introduction
One of the most important goals of robotics researchers is achieving natural interaction between humans and robots. To attain this, a great deal of effort is spent on designing and constructing experiments involving both robots and human operators. Using real physical structures in everyday experiments implies time-consuming testing activities. Considering that VR has lately emerged as a prolific approach to solving several issues scientists encounter during their work, some recent robotics studies propose the usage of immersive VR as a viable alternative to classic experiments. But what are the advantages gained by using simulated robots instead of real ones? Modeling robot tasks in VR solves hardware troubleshooting problems. From a programmer's point of view, when implementing, for instance, a new feature for a real robot, a lot of time is spent on setting up collateral systems, time that can be saved using simulation software [1]. VR solves uniqueness problems. Most research laboratories have one or two study platforms which have to be shared between several researchers. Using simulation eliminates the problem of concurrent use [2]. Simulation lowers the
entry barrier for young scientists and improves the education process [3]. Using only a personal computer, inexperienced robot researchers can develop complex applications in which they can program the physical constraints of virtual objects, obtaining results close to reality [4]. When developing an experiment that involves both robots and humans, besides solving such practical problems, one must also handle aspects concerning HRI. HRI has been defined as the process of understanding and shaping the interaction between humans and robots. According to some recent studies [5], HRI has five primary attributes: level of autonomy, nature of information exchange, the structure of the human and robot teams involved in the interaction, the human/robot training process, and the task shaping process. Assessing HRI translates into a methodical "measurement" of these five attributes [6]. As humans use their hands all the time, the transfer of objects between humans and robots is one of the fundamental forms of HRI that integrates these attributes. Thus, analyzing a manipulation scenario is a straightforward way to assess fundamental HRI particularities [7]. Some scientists have identified a set of generic HRI metrics (intervention response time, judgment of motion, situation awareness, and others) and a set of specialized HRI metrics for manipulation experiments (degree of mental computation, contact errors, and others) [8]. In this paper, we focus on conducting a manipulation experiment within a CAVE immersive VR environment. As it is not yet clear whether using virtual robots has the same impact on HRI as using real equipment, we propose assessing the difference between using real robots in real physical scenarios and using virtual robots in virtual scenarios, based on previous work in presence measurement [9, 10]. In order to validate the experiment setup, a questionnaire targeting both types of HRI metrics was designed and applied to several subjects. The variables identified as conclusive for the purpose of this study are: the percentage of tasks successfully completed, the average time to complete a task, the average time to complete all tasks, the relative distance and motion estimates for the VR tests, and the relative contact errors. In order to measure presence, some of these variables are also assessed during tests with the real robot.
2 Contribution to Technological Innovation
Although there have been several attempts to clearly determine the nature of HRI within VR, most of the literature focuses on scenarios built upon non-immersive simulation software. Most work in presence measurement uses non-immersive VR as the comparison term. Furthermore, most studies focus on measuring the "sociable" side of robots as seen by the human operator, rather than measuring the performance attained through direct collaboration between humans and robots, as in the case of a hand-to-hand manipulation scenario [9, 10]. Although the presented manipulation experiment is intended to be a proof-of-concept (as several equipment, administrative, and implementation issues need to be solved before the results presented in this paper can be used), the computed questionnaire data shows that the proposed approach is suitable to be extended to generic research in HRI and robotic testing.
2.1 Designing the Virtual Robot and Working Environment
Nowadays, scientists have at their disposal several pieces of software that can help them achieve a satisfying simulation of their scenarios. Starting from modeling and ending with the simulation itself, one can use CAD programs such as SolidWorks or CATIA, low-level file formats built for 3D applications such as VRML/X3D or COLLADA for animating the designs, or more focused robot simulators such as Player Project, Webots, or Microsoft Robotics Developer Studio. For our experiment, we modeled the PowerCube robot arm using CATIA. The resulting CAD model was exported as meshes into the C++/OGRE (see Fig. 1) and XVR programming environments, as these offer the stereo vision capabilities needed to include the arm model in the CAVE.
Fig. 1. PowerCube arm exported from CATIA in C++/OGRE
The collision detection of the arm with itself and with objects from the virtual world is handled by MATLAB, within the arm-controlling algorithm. In the real world, working environments have irregularly shaped obstacles, which may vary in size and location with respect to the arm position. For simplicity, we have defined three classes of obstacles which may be added to the virtual workspace:
- Spheres: (x, y, z, sphere radius).
- Parallelepipeds (boxes): (x, y, z, length, width, height).
- Cylinders: (x, y, z, radius, height).
Collision detection is implemented using a sphere-covering technique (see Fig. 2). The representation of the world is a set of radii and centers that models each object as a set of spheres. During arm movement, the world is checked to verify that no collisions occur between spheres, i.e., that the distance between any two spheres belonging to different sets is greater than the sum of their radii. The clustering into spheres is done using the k-means clustering algorithm [11]; the number of clusters is chosen based on the resolution of the world.
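The sphere-covering step and the pairwise collision test can be sketched as follows. This is an illustrative Python reconstruction, not the authors' MATLAB implementation; the point sampling, cluster counts, and function names are assumptions.

```python
# Illustrative sketch of sphere covering and collision checking: each object is
# sampled into a point cloud, clustered with k-means, and replaced by one
# bounding sphere per cluster.
import numpy as np
from scipy.cluster.vq import kmeans2

def cover_with_spheres(points, n_clusters):
    """Return (centers, radii) covering a point cloud with n_clusters spheres."""
    centers, labels = kmeans2(points, n_clusters, minit="points", seed=1)
    radii = np.array([
        np.linalg.norm(points[labels == k] - centers[k], axis=1).max()
        if np.any(labels == k) else 0.0
        for k in range(n_clusters)
    ])
    return centers, radii

def spheres_collide(c_a, r_a, c_b, r_b):
    """True if any sphere of set A intersects any sphere of set B."""
    for ca, ra in zip(c_a, r_a):
        d = np.linalg.norm(c_b - ca, axis=1)   # distances to all B centers
        if np.any(d < ra + r_b):               # closer than the sum of radii
            return True
    return False

# Example: an arm link vs. an obstacle, both given as placeholder point clouds.
arm_pts = np.random.rand(500, 3)
obstacle_pts = np.random.rand(500, 3) + [0.5, 0.0, 0.0]
arm_c, arm_r = cover_with_spheres(arm_pts, 10)
obs_c, obs_r = cover_with_spheres(obstacle_pts, 10)
print("collision:", spheres_collide(arm_c, arm_r, obs_c, obs_r))
```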
Fig. 2. (a) A configuration space with four obstacles; (b) its equivalent after applying the sphere-covering function
2.2 PowerCube Control
Manipulation requires solving the inverse kinematics and planning the motion of the robot arm used in our experiments. In a previous study [12], an arm controller was developed based on a double neural network path planner. Using reinforcement learning, the control system solves the task of obstacle avoidance for rigid manipulators such as the PowerCube. This approach is used for the motion-planning part of both the virtual and the real robotic arm in our study. According to the algorithm's performance specifications, the average time for reaching one target in an obstacle-free space is 13.613 seconds, while the average time for reaching one target in a space with one obstacle (a cylinder bar located at Cartesian coordinates 5; 15; -40) is 21.199 seconds. The proposed control system is built in MATLAB. In order to achieve a standalone application, a link between MATLAB and C++ is needed (see Fig. 3). Unfortunately, creating a shared C++ library using the MATLAB compiler is not a valid solution, as custom neural network directives cannot be deployed. Using the MATLAB Engine to directly call .m files is also not suitable for bigger projects. In the end, a less conservative method was chosen: a TCP/IP server-client communication. The MATLAB sender transmits trajectory details (the angle vectors captured at discrete time steps) to the C++ receiver.
2.3 Experiment Design
The experiment is split into two parts, one handling tests in VR and the other handling tests in the real environment. The VR tests have the following prerequisites:
- A virtual target object (a bottle) is attached to the position returned by the hand trackers (see Fig. 4).
Fig. 3. The correspondence between MATLAB and C++/OGRE programming environments, from XOY and XOZ perspectives
- Human subjects are fitted with optical sensors on their head (for CAVE head tracking) and on their hands (for determining the target position).
- The subjects are asked to perform three tasks consecutively in three different scenarios, briefly described to them before commencing the experiment.
- When the robotic arm reaches the target's Cartesian coordinates (within a small error of 0.5 cm), we assume it automatically grasps the bottle.
- In all tests, the modeled robot arm starts with the angle configuration (0; 90; 0; 90; 0; 0), which corresponds to the position presented in Fig. 3.
Scenario 1. In the first VR scenario, the subject is asked to place the virtual object that is automatically attached to his hand in a designated position. The workspace contains the PowerCube robot arm, an obstacle shaped as a cylinder bar with parameters (5, 15, -40, 3, 80) (see Fig. 3), and the target object, a 0.5 l bottle. The manipulator waits until the subject correctly places the bottle; then, using the MATLAB motion-planning algorithm, which generates an obstacle-free trajectory to the designated position, it moves towards the target. After reaching the bottle, it automatically grasps it and moves it to a second designated position (see Fig. 5).
Fig. 4. Passive optical markers mounted on subject’s hand
Fig. 5. Scenario 1 of the HRI experiment in CAVE
Scenario 2. In the second VR scenario, the subject is asked to freely move the virtual object automatically attached to his hand within the workspace, which is, in this case, obstacle-free. Using the motion-planning algorithm from MATLAB, the manipulator dynamically follows the Cartesian coordinates of the bottle. Subjects are allowed to move the target for a maximum of 30 seconds. After this timeframe, if the robotic arm has not yet reached it, the bottle remains fixed at its last Cartesian coordinates. After reaching the bottle, the arm automatically grasps it and moves it to a designated position.
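For illustration, the grasp trigger and the 30-second follow window can be expressed as in the sketch below. The callback names and the polling loop are hypothetical; in the actual system the arm is driven by the MATLAB planner described in Sect. 2.2.

```python
# Hypothetical sketch of the grasp-trigger logic used in the VR scenarios.
# Threshold and timeout are taken from the text; everything else is assumed.
import time
import numpy as np

GRASP_TOLERANCE_M = 0.005      # 0.5 cm positioning error allowed
FOLLOW_TIMEOUT_S = 30.0        # scenario 2: the target may move for at most 30 s

def follow_and_grasp(get_hand_position, get_gripper_position, grasp):
    """Poll tracker data until the gripper reaches the (possibly moving) bottle."""
    start = time.time()
    target = np.asarray(get_hand_position(), dtype=float)
    while True:
        if time.time() - start < FOLLOW_TIMEOUT_S:
            # While the window is open, the bottle follows the tracked hand.
            target = np.asarray(get_hand_position(), dtype=float)
        # Otherwise the bottle keeps its last Cartesian coordinates.
        gripper = np.asarray(get_gripper_position(), dtype=float)
        if np.linalg.norm(gripper - target) < GRASP_TOLERANCE_M:
            grasp()                     # assumed automatic grasp on contact
            return target
        time.sleep(0.01)                # polling interval (assumed)
```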
Scenario 3. The third VR scenario assumes the arm has the target object in its gripper. The role of the subject is to reach the bottle and then place it at a designated position. When the Cartesian coordinates returned by the passive optical markers placed on the subject's hand reach the bottle within the same small error of 0.5 cm, the virtual bottle is automatically attached to the subject's hand. In order to measure presence, a scenario similar to Scenario 1 was tested in the real environment. Subjects are asked to place a bottle in the arm's gripper. The PowerCube is then controlled to place the bottle in a designated position (see Fig. 6).
Fig. 6. Real-environment scenario; the robot receives a plastic bottle from the subject, which will then be placed on a chair
2.4 HRI Questionnaire Design
The HRI questionnaire was developed to gain information about the subjects' perception when interacting with the real and the virtual PowerCube manipulators. All questions may be answered with a grade from 1 to 10. The proposed HRI questionnaire contains 12 questions divided into 3 parts:
- The biographical information section contains data related to the technological background of the study participants. Using questions from this section, we try to categorize subjects by their bio-attributes (age, sex), the frequency of computer use, and their experience with VR technologies and robotics. Some examples of questions addressed here are: "How often do you use computers?", "How frequent are you interacting with robotic systems?", "How familiarized are you with VR?"
- The specific information section refers to the particularities subjects encounter during the manipulation experiment. In order to measure the relative motion of the
arm, this section includes questions such as "How fast do you think the robot arm was moving?". The relative distance and relative contact errors are measured using questions such as "How exact was the robot arm in picking/grasping/placing the target?". Overall impressions are measured by questions such as "How often did you feel that PowerCube (both virtual and real) was interacting with you?" and "How satisfying do you find the control algorithm of PowerCube?".
- The presence measuring section contains questions that try to assess the difference between using a real PowerCube and using a simulated one. Considering other studies on presence, we settled on measuring two types of presence: presence as perceptual realism and presence as immersion [9]. In order to measure presence as perceptual realism, we asked "How real did the overall VR experience feel, when compared with the real equipment?". Presence as immersion was measured by questions such as "How engaging was the VR interaction?". We also addressed some open-ended questions at the end of this section, such as "What was missing from virtual PowerCube that would make it seem more appropriate to the real PowerCube?".
2.5 Experiment Trials
Twenty-two students, four PhD students, and three persons from the university administrative staff participated as subjects in this experiment. The experiment took an average of 20 minutes per subject, and answering the HRI questionnaire took an average of 5 minutes per subject. The results are centralized in Table 1. Other variables that were measured during our experiment are the percentage of tasks successfully completed – 99.1379%, the average time to complete a scenario – 4 minutes and 8 seconds, and the average time to complete all 4 scenarios – 16 minutes and 32 seconds. The open question received suggestions such as paying better attention to environment details, object textures, and experiment lighting conditions. Some of the subjects inquired about the possibility of integrating haptic devices that could enhance the realism of the simulation.
2.6 Discussion of Results
Overall, the centralized results from the HRI questionnaire allow us to conclude that the robot's presence affects HRI. The result of question 10 (7.7586 on a 1-to-10 scale) shows that using immersive VR is a great way of simulating robotic scenarios. However, as reported in other studies [10], subjects gave the physically present robot more personal space than in VR. Most of our subjects enjoyed interacting with both the real and the virtual robot – on average, our test subjects rated the experience of interacting with the virtual PowerCube at 8.5862 on a 1-to-10 scale. The nature of the arm (fully mechanical, not anthropomorphic) led the subjects to rate question 8 at only 5.1724 on a 1-to-10 scale. However, the arm's control algorithm seems to be fairly satisfying (7.6896), as it is very accurate (9.1379). Its main reported drawback is its low reaching speed (6.3103).
Table 1. Centralized data from HRI questionnaire

Section: Biographical Information
  1. Age? – 21 years: 14; 22 years: 6; 23 years: 2; 25 years: 1; 26 years: 3; 39 years: 1; 40 years: 1; 44 years: 1
  2. Sex? – M: 62%; F: 38%
  3. How often do you use computers? – 7.7931 (1 – never used; 10 – every day)
  4. How frequent are you interacting with robotic systems? – 5.3448 (1 – never interacted; 10 – every day)
  5. How familiarized are you with VR technologies? – 5.9655 (1 – never heard; 10 – very familiarized)

Section: Specific Information
  6. How fast do you think the robot arm was moving? – 6.3103 (1 – very slow; 10 – very fast)
  7. How exact was the robot arm in picking/grasping/placing the target? – 9.1379 (1 – completely inexact; 10 – perfectly accurate)
  8. How often did you feel that PowerCube (both virtual and real) was interacting with you? – 5.1724 (1 – never; 10 – all the time)
  9. How satisfying do you find the control algorithm of PowerCube? – 7.6896 (1 – completely unsatisfying; 10 – completely satisfying)

Section: Presence Measuring
  10. How real did the overall VR experience feel, when compared with the real equipment? – 7.7586 (1 – completely unrealistic; 10 – perfectly real)
  11. How engaging was the VR interaction? – 8.5862 (1 – not engaging; 10 – very engaging)
  12. What was missing from virtual PowerCube that would make it seem more appropriate to the real PowerCube? – (open-ended)
3 Conclusions
Testing real-life manipulation scenarios with the PowerCube (and other robotic manipulators) imposes additional work: solving safety issues, foreseeing and handling possible hardware and software malfunctions, and preparing additional equipment in case of injuries. The proposed virtual solution eliminates all these problems. Although the presented virtual model is close to the real-life robot, there are still some issues that need to be handled. The real robot has wires between each link, wires that have not been integrated into the simulated model. Another issue is the inconsistency between the real environment and the simulated one. Careful measures were taken in order to obtain a good virtual replica of the real setup; however, due to the nature of the measuring process (which is inexact), the simulated arm differs slightly in some dimensions, as does the real working environment. The
simulated working environment had to be modified to include the robot body, chairs and the ground level as obstacles. According to the discussion of results, the information computed from the HRI questionnaire shows that immersive VR is a good alternative to classical robot testing. Acknowledgments. This work was supported by CNCSIS –UEFISCSU, project number PNII – IDEI 775/2008.
References
1. Johns, K., Taylor, T.: Professional Microsoft Robotics Developer Studio. Wrox Press, Indianapolis (2008)
2. Haton, B., Mogan, G.: Enhanced Ergonomics and Virtual Reality Applied to Industrial Robot Programming. Scientific Bulletin of Politehnica University of Timisoara, Timisoara, Romania (2008)
3. Morgan, S.: Programming Microsoft® Robotics Studio. Microsoft Press, Washington (2008)
4. Duguleana, M., Barbuceanu, F.: Designing of Virtual Reality Environments for Mobile Robots Programming. Journal of Solid State Phenomena 166-167, 185–190 (2010)
5. Goodrich, M.A., Schultz, A.C.: Human–Robot Interaction: A Survey. Foundations and Trends in Human–Computer Interaction 1(3), 203–275 (2007)
6. Walters, M.L., et al.: Practical and Methodological Challenges in Designing and Conducting Human-Robot Interaction Studies. In: Proceedings of AISB 2005 Symposium on Robot Companions Hard Problems and Open Challenges in Human-Robot Interaction, pp. 110–120 (2005)
7. Edsinger, A., Kemp, C.: Human-robot interaction for cooperative manipulation: Handing objects to one another. In: Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, ROMAN (2007)
8. Steinfeld, A., et al.: Common Metrics for Human-Robot Interaction. In: Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 33–40 (2006)
9. Lombard, M., et al.: Measuring presence: a literature-based approach to the development of a standardized paper-and-pencil instrument. In: The 3rd International Workshop on Presence, Delft, The Netherlands (2000)
10. Bainbridge, W.A., et al.: The effect of presence on human-robot interaction. In: Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, pp. 701–706 (2008)
11. Vajta, L., Juhasz, T.: The Role of 3D Simulation in the Advanced Robotic Design, Test and Control. Cutting Edge Robotics, pp. 47–60 (2005)
12. Duguleana, M.: Robot Manipulation in a Virtual Industrial Environment. International Master Thesis on Virtual Environments. Scuola Superiore Sant'Anna, Pisa, Italy (2010)
3-D Sound Reproduction System for Immersive Environments Based on the Boundary Surface Control Principle Seigo Enomoto1 , Yusuke Ikeda1 , Shiro Ise2 , and Satoshi Nakamura1 1 Spoken Language Communication Group, National Institute of Information and Communications Technology, 3-5 Hikaridai, Keihanna Science City, 619-0289, Japan 2 Graduate school of engineering, Department of Architecture and architectural engineering, Kyoto University, C1-4-386 Kyotodaigaku-katsura, Nishikyo-ku, Kyoto, 615-8540, Japan {seigo.enomoto,yusuke.ikeda,satoshi.nakamura}@nict.go.jp,
[email protected]
Abstract. We constructed a 3-D sound reproduction system consisting of a 62-channel loudspeaker array and a 70-channel microphone array based on the boundary surface control principle (BoSC). The microphone array can record the 3-D sound field over a volume, and the loudspeaker array can accurately recreate it at another location. Using these systems, we realized immersive acoustic environments similar to cinema or television sound spaces. We also recorded real 3-D acoustic environments, such as an orchestra performance and forest sounds, using the microphone array. The recreated sound fields were evaluated in demonstration experiments. Subjective assessments from 390 subjects confirm that these systems can achieve a high sense of presence in 3-D sound reproduction and provide the listener with deep immersion. Keywords: Boundary surface control principle, Immersive environments, Virtual reality, Stereophony, Surround sound.
1 Introduction
Stereophony is one of the primary factors in improving the sense of immersion in movies or television. Recent years have seen the emergence of surround sound systems with 5.1 or more loudspeaker channels beyond movie theaters and into the home. Surround sound listeners can feel as if they were in the actual places to which they are listening. Traditional surround systems, however, cannot reconstruct the sound wavefronts that radiate in the actual environment. If a system can reconstruct the wavefront, it can recreate a fully immersive, rather than merely surrounding, environment. Since hearing information can be obtained from all directions, the importance of this information is particularly increased in applications where many people in distant places communicate with each other.
The Kirchhoff-Helmholtz integral equation (KHIE) is the theoretical basis of 3-D sound reproduction systems that record and reproduce sound fields, and this sort of reproduction has a long history. In the early 1930s, Fletcher and colleagues reported that ideal sound field recording and reproduction could be achieved by using numerous microphones and loudspeakers with acoustical transparency [3,4]. Steinberg et al. also confirmed in subjective experiments that stereophonic sound could be represented using only three loudspeakers [13]. A three-loudspeaker system cannot reconstruct a wavefront but can serve as a basis for surround sound. Field or 3-D sound reproduction is, however, more attractive. For correctly reconstructing wavefronts, as compared to conventional surround systems, many technologies have been developed and their theoretical bases have been studied. In 1993, Berkhout et al. proposed Wave Field Synthesis (WFS) [1] as a 3-D sound field reproduction system based on the KHIE, and IOSONO [8] is a commercial application based on WFS. The theoretical basis of WFS is, however, the Rayleigh integral equation; the KHIE is not used directly, and an infinite planar boundary is explicitly assumed. However, since a finite-length loudspeaker array is used in practical WFS, it is difficult to reproduce a 3-D sound field without artifacts due to the truncation. Moreover, practical loudspeakers must be placed on the boundary instead of ideal monopole sources. The approximation of an ideal source by means of a loudspeaker causes further artifacts, especially in the higher frequency range, owing to the loudspeaker's own properties. Ambisonics [14] is another approach to 3-D sound reproduction based on the wave equation. Ambisonics-based 3-D sound reproduction systems require higher-order spherical harmonics to accurately reproduce 3-D sound fields, and thus they have to date been difficult to construct. It is also difficult to extend the so-called sweet spot in Ambisonics. In contrast, Ise in 1997 proposed the boundary surface control principle (BoSC) [6,7]. By integrating the KHIE and inverse system theory, the BoSC can accurately reproduce a 3-D sound field surrounded by a closed surface. According to the BoSC, it is not necessary to place ideal monopole and dipole sources on the boundary; therefore, an approximation of such sources is not required either. There is also no restriction on the loudspeaker positions; they may be located anywhere in the region exterior to the boundary. Consequently, in our research we constructed a 3-D sound reproduction system that contains a 70-channel microphone array and a 62-channel loudspeaker array. In this manuscript, we describe the results of a subjective assessment conducted to confirm the "presence" recreated by the BoSC-based 3-D sound field reproduction system.
2 Boundary Surface Control Principle
The BoSC is a theory for reproducing a 3-D sound field, and it is now also applied in the fields of active noise control [10] and steering the directionality of a loudspeaker array [2]. This section describes the BoSC as applied to 3-D sound reproduction. Fig. 1 illustrates 3-D sound field reproduction based on the BoSC.
Fig. 1. 3-D sound field reproduction based on the BoSC. The left figure shows the primary sound field to be recorded e.g., concert hall. The right figure shows the secondary sound field where the recorded primary sound field is reproduced.
The figure on the left shows the primary sound field to be recorded, e.g., a concert hall. On the right is the secondary sound field where the recorded primary sound field is reproduced, i.e., a listening room. The 3-D sound reproduction system based on the KHIE aims to record the sound field in volume V bounded by surface S and reproduce it in V' bounded by S'. Using the KHIE, the complex sound pressures p(s) and p(s'), where s ∈ V and s' ∈ V' are the evaluation points, are given by

$$p(s) = \int_S \left[ -j\omega\rho_0\, G(r|s)\, v_n(r) - p(r)\, \frac{\partial G(r|s)}{\partial n} \right] dS, \qquad (1)$$

$$p(s') = \int_{S'} \left[ -j\omega\rho_0\, G(r'|s')\, v_{n'}(r') - p(r')\, \frac{\partial G(r'|s')}{\partial n'} \right] dS', \qquad (2)$$

where ω is the angular frequency, ρ0 is the density of the medium, and p(r) and v_n(r) are the sound pressure and normal-outward particle velocity on the boundary, respectively. G(r|s) = e^{-j(ω/c)|r-s|} / (4π|r-s|) is the free-space Green's function [11,15]. For notational simplicity, the angular frequency ω is omitted in the equations. The free-space Green's function G(r|s) is explicitly defined by r and s. Therefore, if the shape of boundary S is identical to that of S', the free-space Green's function G(r|s) is also identical to G(r'|s'). Consequently, if p(r') and v_{n'}(r') are equal to p(r) and v_n(r), p(s') is also equal to p(s). To equalize p(r') and v_{n'}(r') with p(r) and v_n(r), respectively, the BoSC system employs the secondary loudspeakers. The output signals of the loudspeakers are determined by convolving the recorded sound signals with the inverse system. The inverse system is computed to equalize the room transfer functions between each loudspeaker and microphone.
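As a concrete illustration of this last step, the sketch below applies a bank of precomputed inverse filters to the recorded microphone signals to obtain the loudspeaker driving signals. This is an assumed time-domain formulation with invented names, not the authors' implementation; the filter design itself is discussed in Sect. 3.4.

```python
# Sketch of BoSC rendering: each loudspeaker signal is the sum over microphone
# channels of the recorded signal convolved with the corresponding inverse
# filter. Shapes and names are illustrative only.
import numpy as np
from scipy.signal import fftconvolve

def render_loudspeaker_signals(recordings, inverse_filters):
    """recordings: (N_mics, T); inverse_filters: (M_speakers, N_mics, L)."""
    n_mics, T = recordings.shape
    m_spk, _, L = inverse_filters.shape
    out = np.zeros((m_spk, T + L - 1))
    for m in range(m_spk):
        for n in range(n_mics):
            out[m] += fftconvolve(recordings[n], inverse_filters[m, n])
    return out

# Example with random placeholders: 70 microphones, 62 loudspeakers.
rec = np.random.randn(70, 48000)          # one second of recording at 48 kHz
h_inv = np.random.randn(62, 70, 1024)     # precomputed inverse filters
drive = render_loudspeaker_signals(rec, h_inv)
```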
3 3-D Sound Reproduction System
3.1 BoSC in Practice: Reproducing the 3-D Sound Field from the Recorded Sound Pressure on the Boundary
The 3-D sound reproduction system based on the KHIE theoretically requires measurement of the particle velocity on the boundary. It is well known that the particle velocity can be estimated from the sound pressure at two closely spaced points across the boundary [11]. In this case, however, since twice the number of recording/control points is required, the computational cost is huge. Therefore we constructed a 3-D sound reproduction system that records and reproduces only the sound pressure on the boundary. For this purpose, the Dirichlet Green's function GD(r|s) can be used [15]. Substituting GD(r|s) into equations (1) and (2), the first term on the right-hand side of these equations can be eliminated. Note that it is difficult to derive the exact value of the Dirichlet Green's function GD(r|s), but this is not required in the BoSC system. The BoSC can assume that GD(r|s) and GD(r'|s') are identical if the shape of boundary S is identical to that of S'. Therefore, if the sound pressure recorded on boundary S is reproduced on boundary S', p(s) is also reproduced in volume V'. Note, however, that 3-D sound fields at the natural frequencies of the closed surface cannot be reproduced under the Dirichlet boundary condition.
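The elimination of the first term can be made explicit. Assuming the usual convention that the Dirichlet Green's function vanishes on the boundary, substituting it into Equation (1) leaves a pressure-only integral; the following form is a reconstruction based on that standard argument, not an equation given explicitly in the paper:

$$p(s) = -\int_S p(r)\, \frac{\partial G_D(r|s)}{\partial n}\, dS .$$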
3.2 Microphone Array for 3-D Sound Field Recording
For the boundaries S and S' depicted in Fig. 1, we presumed that the microphones should be distributed at regular intervals. To construct a microphone array of this shape, we designed it based on the C80 fullerene structure. The constructed array is shown in Fig. 2; its diameter is around 46 cm. Omni-directional microphones (DPA 4060-BM) are installed on the nodes of the fullerene. Ten microphones at the bottom of the fullerene are omitted so that the head of a Head And Torso Simulator (HATS) or of a subject can be inserted (Fig. 2); therefore, there are 70 nodes. Since the maximum and minimum intervals between microphones are around 16 cm and 8 cm, respectively, the system can reproduce a frequency range up to about 2 kHz. The system, however, aims to create immersive environments in which people feel the presence of other people or places. Therefore we did not limit the frequency range to below 2 kHz in the demonstration experiment; we also aimed to evaluate 3-D sound fields containing frequency content above 2 kHz.
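The stated 2 kHz limit is consistent with a half-wavelength spatial-sampling argument, although the exact criterion is not given in the text. Assuming a sound speed c ≈ 343 m/s and an effective microphone spacing d of roughly 8-9 cm,

$$f_{\max} \approx \frac{c}{2d} \approx \frac{343\ \mathrm{m/s}}{2 \times 0.086\ \mathrm{m}} \approx 2\ \mathrm{kHz}.$$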
3.3 Loudspeaker Array and Sound Reproduction Room
As the secondary sound sources depicted in Fig. 1, we designed a loudspeaker array with a dome structure consisting of four wooden layers supported by four wooden columns. Six, 16, 24, and 16 full-range loudspeakers (Fostex FE83E) were installed on the four layers, respectively. We presumed that the height
Fig. 2. BoSC-based 3D sound reproduction system: 70-channel microphone array in which omni-directional microphones are installed on every node. (a) 70-channel mic. array. (b) Omni-directional mic. installed.
of the third layer is almost the same as that of the center of the microphone array. To compensate for the lower frequency response of the full-range loudspeakers, two subwoofer loudspeakers (Fostex FW108N) were installed on each wooden column. However, we employed only the 62 full-range loudspeakers for the design of the inverse system. Though the minimum resonance frequency of a full-range loudspeaker is 127 Hz, the 3-D sound reproduction system we constructed can reproduce sound from 80 Hz upward by using an appropriate inverse system. We therefore employed the subwoofer loudspeakers only for the frequency range below 80 Hz. The loudspeaker array is shown in Fig. 3. It was constructed in a soundproofed room (Yamaha Woodybox: sound insulation level Dr = 30; inside dimensions: 1,203 mm × 1,646 mm × 2,164 mm) to reduce disturbance from background noise. To reduce the reverberation in the soundproofed room, we attached sound-absorbing sponge to each interior wall.
3.4 Design of the Inverse System
In Fig. 1, we presume that X(ω) is the vector of complex pressures recorded on boundary S in the primary sound field, Y(ω) is the vector of signals radiated from the loudspeakers in the secondary sound field, and [G(ω)] is the transfer impedance matrix between each loudspeaker and microphone pair. The vector of complex pressures Z(ω) measured on boundary S' in the secondary sound field then satisfies Equation (3):

$$\mathbf{Z}(\omega) = [G(\omega)]\,\mathbf{Y}(\omega). \qquad (3)$$

Therefore, for Z(ω) to equal X(ω), Equation (4) is required:

$$\mathbf{Y}(\omega) = [G(\omega)]^{+}\,\mathbf{X}(\omega), \qquad (4)$$
Fig. 3. BoSC-based 3D sound reproduction system: 70-channel loudspeaker array in which 62 full-range loudspeakers and eight subwoofer loudspeakers are installed. Only the 62 full-range loudspeakers are used to render a wavefront. (a) Loudspeaker array constructed in the soundproofed room; (b) dome structure of the loudspeaker array
where [·]^+ denotes the pseudo-inverse matrix. In addition,

$$\mathbf{X}(\omega) = [X_1(\omega), \cdots, X_N(\omega)]^T, \quad \mathbf{Y}(\omega) = [Y_1(\omega), \cdots, Y_M(\omega)]^T, \quad \mathbf{Z}(\omega) = [Z_1(\omega), \cdots, Z_N(\omega)]^T,$$

$$[G(\omega)] = \begin{bmatrix} G_{11}(\omega) & \cdots & G_{1M}(\omega) \\ \vdots & \ddots & \vdots \\ G_{N1}(\omega) & \cdots & G_{NM}(\omega) \end{bmatrix}, \qquad (5)$$

where [·]^T denotes the transpose, the number of microphones is N = 70, and the number of loudspeakers is M = 62 in this manuscript. Using the left inverse matrix, [G(ω)]^+ can be given as

$$[G(\omega)]^{+} = \left([G(\omega)]^{\dagger}[G(\omega)] + \beta(\omega)\, I_M\right)^{-1}[G(\omega)]^{\dagger}, \qquad (6)$$
where [·]^† denotes the conjugate transpose, β(ω) is the regularization parameter, and I_M is the identity matrix of order M. An appropriate regularization parameter can reduce the instabilities of ([G(ω)]^†[G(ω)]). We determined the parameters in each octave frequency band heuristically, presuming the center frequencies of the bands to be 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz. The transfer impedance matrix [G(ω)] is measured experimentally in advance by using a 2^17-point swept-sine signal. Therefore, the inverse matrix contains only the inversion of the measured transfer functions and is not the exact inverse under the actual reproduction conditions. To determine [G(ω)]^+ so that it can compensate for fluctuations and time variations of the transfer functions, adaptive signal processing could be employed. Many adaptive signal processing methods for MIMO inverse systems have been proposed and applied in WFS systems to compensate for reverberation in a listening room [9,12,5]. Almost all of these algorithms could be applied to the BoSC system. However, such adaptive algorithms
Fig. 4. Recording of 3-D sound field data: (a) orchestra, with two microphone arrays located in the auditorium and in front of the conductor during a performance of Beethoven's Symphony No. 7; (b) forest sounds, which contain an airplane, a hummingbird, singing insects, footsteps, conversational voices, and so on.
with 62 inputs and 70 outputs require huge computational complexity. We therefore assumed that the instabilities caused by fluctuations and time variations of the transfer impedance matrix can be reduced by using appropriate regularization parameters.
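The per-frequency regularized pseudo-inverse of Equation (6) can be sketched as follows. This is an illustrative numpy reconstruction with assumed array shapes, octave-band mapping, and β values, not the code used by the authors.

```python
# Sketch of the regularized inverse-filter design of Eq. (6): for each frequency
# bin, invert the measured N x M transfer matrix with Tikhonov regularization.
import numpy as np

N_MICS, N_SPK, N_FFT = 70, 62, 8192
FS = 48000

# G[k] is the measured transfer matrix at bin k (placeholder data here).
G = (np.random.randn(N_FFT // 2 + 1, N_MICS, N_SPK)
     + 1j * np.random.randn(N_FFT // 2 + 1, N_MICS, N_SPK))

octave_centers = np.array([125, 250, 500, 1000, 2000, 4000, 8000, 16000])
beta_per_band = np.array([1e-2, 1e-2, 5e-3, 5e-3, 1e-3, 1e-3, 1e-3, 1e-3])  # heuristic guesses

freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)
H = np.zeros((N_FFT // 2 + 1, N_SPK, N_MICS), dtype=complex)   # inverse filters

for k, f in enumerate(freqs):
    band = np.argmin(np.abs(octave_centers - max(f, 1.0)))     # nearest octave band
    beta = beta_per_band[band]
    Gk = G[k]
    # Eq. (6): (G^H G + beta I)^-1 G^H
    H[k] = np.linalg.solve(Gk.conj().T @ Gk + beta * np.eye(N_SPK), Gk.conj().T)

# Time-domain inverse filters, one per loudspeaker/microphone pair.
h_time = np.fft.irfft(H, n=N_FFT, axis=0)
```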
4 Demonstration Experiments
4.1 3-D Sound Field Recording
To demonstrate the performance of the 3-D reproduction system based on the BoSC, we recorded real 3-D sound environments. The orchestra and forest sounds were reproduced in the subjective assessments described in the next section. The recording environments are shown in Fig. 4. The recordings of the 3-D sound sources were carried out with a 48 kHz sampling frequency and a 24-bit quantization depth. In the recording of the orchestra, we employed two microphone arrays: one located in the auditorium and another in front of the conductor. To provide supplementary visual information in the demonstration, we also carried out video recording at the same positions as the 3-D sound field recording.
4.2 Subjective Assessment
The demonstration experiments were conducted to evaluate the performance of the BoSC-based sound reproduction system. The reproduced sound fields are listed in Table 1. Three kinds of nature sound (A) recorded in forests, and an orchestra performance (B and C) were employed in the demonstration. The performance in (B) corresponds to (A). The demonstration was limited to around five minutes. The recorded video was shown on an LCD monitor while the orchestra
Table 1. Reproduced 3-D sound field data (feature sound/location and time in seconds)

  A  Forest sound – airplane (70.0), conversing voices (37.5), footsteps (22.5)
  B  Orchestra – in auditorium (67.0)
  C  Orchestra – in front of conductor (stage) (67.0)
Table 2. Questionnaire entries for the demonstration of the 3-D sound field reproduction

Comprehensive feeling for the reproduced 3-D sound field
  A. Did you feel as if you were in a forest? (1. Very poor, 2. Poor, 3. Average, 4. Good, 5. Very good)
  B. Did you feel as if you were in the auditorium? (1. Very poor, 2. Poor, 3. Average, 4. Good, 5. Very good)
  C. Did you feel as if you were in front of the conductor? (1. Very poor, 2. Poor, 3. Average, 4. Good, 5. Very good)
What was the most impressive sound? (description)

Table 3. Averaged scores of each age and total subject (1. very poor, 2. poor, 3. average, 4. good, and 5. very good)

                    -9 y/o  10's   20's   30's   40's   50 y/o-  total
  A. Forest sound    4.54    4.69   4.48   4.66   4.55   4.10     4.55
  B. Auditorium      4.50    4.53   4.29   4.24   4.36   4.28     4.37
  C. Stage           4.26    4.43   4.59   4.57   4.68   4.24     4.50
performance was being reproduced. A subjective assessment was conducted with the audience of the 3-D sound reproduction system; the questionnaire entries are listed in Table 2. For the orchestra performance (B and C), the subwoofer loudspeakers were employed to reinforce the lower frequency range. Each subwoofer loudspeaker was assigned to radiate sound signals obtained from the C80 microphone array by delay-and-sum processing.
4.3 Experimental Results and Discussions
In the demonstration, we obtained questionnaire responses from 390 subjects. The results of the subjective assessment for each age group are shown in Fig. 5, and the averaged scores are shown in Table 3. Fig. 5 (a) shows that almost all subjects of all ages felt as if they were in nature. This confirms that the BoSC-based sound reproduction system reproduced the 3-D sound field and can create immersive environments. Figs. 5 (b) and (c) also show that many subjects rated the reproduced sound field as "good" or "very good." The system therefore can be said to yield a
[Fig. 5: three bar charts, (a)–(c), showing the number of subjects per rating (very poor to very good) for each age group (-9, 10s, 20s, 30s, 40s, 50-).]
Fig. 5. Experimental results: Subjective assessments for (a) forest sounds, (b) auditorium, (c) stage.
presence similar to that of actually being in the concert hall or on the stage. The scores for B, however, were lower than for A or C, especially among subjects in their 20's to 40's, since mismatches between the reproduced sound field and the video information, due to the monitor's size and position, caused odd sensations. On the other hand, because only the conductor was shown on the monitor, the subjects did not have such sensations in C. In addition, subjects with experience playing instruments gave more "very good" scores for C than for B.
5 Conclusions
We constructed a 3-D sound reproduction system based on the BoSC, consisting of a 70-channel microphone array designed from a C80 fullerene structure, and a loudspeaker array in which 62 full-range loudspeakers and eight subwoofer loudspeakers were installed.
To evaluate the performance of the BoSC-based system, we conducted a subjective evaluation through the demonstration. The assessment results confirm that the system can provide immersive environments and convey the presence of other people or places. However, we did not limit the frequency range of the reproduced sound field, although it is theoretically limited to 2 kHz. The accuracy of the reproduced sound fields should be physically evaluated in future work.
Acknowledgments Part of this study was supported by the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology of Japan, and the Strategic Information and Communications R&D Promotion Program commissioned by the Ministry of Internal Affairs and Communications of Japan.
References
1. Berkhout, A.J., de Vries, D., Vogel, P.: Acoustic control by wave field synthesis. Journal of the Acoustical Society of America 93(5), 2764–2778 (1993)
2. Enomoto, S., Ise, S.: A proposal of the directional speaker system based on the boundary surface control principle. Electronics and Communications in Japan 88(2), 1–9 (2005)
3. Fletcher, H.: Auditory perspective - basic requirement. In: Symposium on Wire Transmission of Symphony Music and its Reproduction in Auditory Perspective, vol. 53, pp. 9–11 (1934)
4. Fletcher, H.: The stereophonic sound film system - general theory. The J. Acoust. Soc. Am. 13(2), 89–99 (1941)
5. Gauthier, P.A., Berry, A.: Adaptive wave field synthesis with independent radiation mode control for active sound field reproduction: Theory. Journal of the Acoustical Society of America 119(5), 2721–2737 (2006)
6. Ise, S.: A principle of sound field control based on the Kirchhoff-Helmholtz integral equation and the inverse system theory. Journal of the Acoustical Society of Japan 53(9), 706–713 (1997) (in Japanese)
7. Ise, S.: A principle of sound field control based on the Kirchhoff-Helmholtz integral equation and the theory of inverse systems. Acustica 85, 78–87 (1999)
8. Lee, N.: Iosono. ACM Computers in Entertainment (CIE) 2(3), 3–3 (2004)
9. Lopez, J., Gonzalez, A., Fuster, L.: Room compensation in wave field synthesis by means of multichannel inversion. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2005)
10. Nakashima, T., Ise, S.: A theoretical study of the discretization of the boundary surface in the boundary surface control principle. Acoustical Science and Technology 27(4), 199–205 (2006)
11. Nelson, P.A., Elliot, S.J.: Active Control of Sound. Academic Press, San Diego (1992)
12. Spors, S., Buchner, H., Rabenstein, R., Herbordt, W.: Active listening room compensation for massive multichannel sound reproduction systems using wave-domain adaptive filtering. Journal of the Acoustical Society of America 122(1), 354–369 (2007)
13. Steinberg, J., Snow, W.: Auditory perspective - physical factors. In: Symposium on Wire Transmission of Symphony Music and its Reproduction in Auditory Perspective, vol. 53, pp. 12–17 (1934)
14. Ward, D.B., Abhayapala, T.D.: Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Transactions on Speech and Audio Processing 9(6), 697–707 (2001)
15. Williams, E.G.: Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, London (1999)
Workspace-Driven, Blended Orbital Viewing in Immersive Environments Scott Frees and David Lancellotti Ramapo College of New Jersey 505 Ramapo Valley Road Mahwah, NJ 07430 {sfrees,dlancell}@ramapo.edu
Abstract. We present several additions to orbital viewing in immersive virtual environments, including a method of blending standard and orbital viewing to allow smoother transitions between modes and more flexibility when working in larger workspaces. Based on pilot studies, we present methods of allowing users to manipulate objects while using orbital viewing in a more natural way. Also presented is an implementation of workspace recognition, where the application automatically detects areas of interest and offers to invoke orbital viewing as the user approaches. Keywords: Immersive Virtual Environments, Context-Sensitive Interaction, 3DUI, interaction techniques.
1 Introduction
One of the key benefits of most immersive virtual environment configurations is the ability to control the viewpoint using natural head motion. When wearing a tracked, head-mounted display, users can control the direction of their gaze within the virtual world by turning their head the same way they do in the real world. This advantage typically holds in other configurations as well. We refer to this viewpoint control as egocentric viewing, as the axis of rotation of the viewpoint is centered atop the user's physical head. There are, however, situations where egocentric view control may not be optimal. In order to view an object from different perspectives, users are normally forced to physically walk around the area. Depending on the hardware configuration (wired trackers, etc.), this can be cumbersome. One alternative is orbital viewing [5], where the user's gaze is fixed towards the area of interest. Head movements and rotations are mapped such that the user's viewpoint orbits around the target location, constantly being reoriented such that the target is in view. Physical head rotations to the left cause the virtual viewpoint to swing out to the right, such that the user is looking at the right side of the object. Looking down has the effect of moving the viewpoint up above, such that the user is looking down at the object. Orbital viewing has its own disadvantages. Firstly, it is quite ineffective if the user wishes to survey the world – it locks the gaze direction such that the user is always looking towards the same 3D location in the world. Switching between orbital and
egocentric viewing can also be disruptive and confusing. Orbital viewing also presents interaction challenges when manipulating objects, as the act of orbiting while manipulating an object can affect the user's ability to control the object's position accurately. This paper presents additions to orbital viewing that allow it to be more effective in general interactive environments. Finally, we describe how orbital viewing can be integrated into an application by linking it to known areas of interest within the virtual world – which we call "hotspots". We present an outline of how we automatically infer the existence of such locations based on user behavior.
2 Related Work Koller et al. [5] first presented orbital viewing as an alternative method for viewing objects, in particular a WIM [7]. Several of orbital viewing’s limitations were addressed in this work, such as the possibility of occlusion, importance of controlling the radius of orbit, and the possibility of increased disorientation when transitioning into / out of orbital viewing. We do not address the occlusion problem in our work, but offer implementations that relate to radius control and disorientation. Many alternatives to egocentric view control have been presented in the literature using a wide range of approaches for both immersive environments [2, 7, 9] and desktop applications [8]. Our work is not aimed at comparing orbital viewing with other alternative techniques however; it is focused on determining and improving orbital viewing’s effectiveness in interactive systems in general. Much of our work is aimed at developing viewpoint control techniques that effectively handle changing workspaces. The user’s workspace can be described as the location and size of the area the user is currently most interested in. The size and location of the workspace can greatly influence an interaction technique’s effectiveness, described in detail in [1]. Our techniques for implicitly recognizing workspace are based in part on the concept of focus and nimbus, presented by Greenhalgh and Benford [3]. In addition, our concept of modeling workspaces as artifacts within the environment in which users can invoke orbital viewing is similar to the idea of landmarks introduced by Pierce and Pausch [6], where landmarks were anchor points for navigation techniques in large-scale workspaces.
3 Identifying Problems with Orbital Viewing This research started as an investigation into whether or not orbital viewing could aid in manipulation tasks requiring the user to view objects from multiple perspectives. In our user studies, participants were presented with a virtual object (as shown in Fig. 1) that fits within a translucent object of the same shape (only slightly larger). The user was asked to position the object to fit within the target as many times as possible within a 3-minute trial. This task was selected because it would require the user to view the object from all directions in order to make the fit.
Fig. 1. User manipulating the object such that it will match the target's (blue translucent object) orientation
We conducted this experiment with 24 participants. Each participant underwent two training trials using egocentric viewing and two training trials with standard orbital viewing – where the orbital viewing configuration was centered on the target object. They then completed two recorded trials with egocentric and orbital viewing in a randomized order. ANOVA analysis of completions per trial showed no statistically significant effect for view control type (p = 0.143). User feedback, however, provided us with some clear areas where orbital viewing could be improved. Survey data also indicated that while orbital viewing did not help performance on the task, many participants found it more comfortable (as opposed to walking around the object). Our first observation was that orbital viewing could make it difficult to control objects precisely. As the head rotates, the user's virtual viewpoint and avatar orbit around a center point. This becomes a problem when the user is holding a virtual object (perhaps with a hand-held stylus). If the physical relationship between the hand and head is preserved, head movement results in hand movement – and thus the virtual object will move. In Section 4.1 we present a modification that temporarily breaks the relationship between the viewpoint and hand to improve manipulation. Another area where orbital viewing caused problems was at the beginning of trials: if the trial was to use orbital viewing, the user was abruptly switched into that mode when timing began. We observed confusion and awkwardness during these transitions. Furthermore, our overall goal is to take advantage of orbital viewing in general virtual environment applications if/when the user's workspace becomes small. If the user is to switch between egocentric and orbital viewing as their workspace size changes, the disorientation during transitions needs to be resolved. In Section 4.2, we discuss our blended view control approach that works to reduce this problem.
4 Implementation of Viewing Techniques The implementation of egocentric viewing is straightforward – a tracking device of some kind provides position and orientation information for the user’s head. The viewpoint, or virtual camera, rotates and moves in a one-to-one correspondence with
the tracker – commonly implemented by making the viewpoint a child object of the tracker representation in the scene graph (perhaps with offsets to account for eye position). An alternative way of describing viewpoint control in 3D graphics is the specification of two 3D points, eye and look. In this scenario, the orientation of the viewpoint or virtual camera is constrained to point at the "look" point. The look and eye points are rigidly attached to each other (at some fixed distance). Rotations of the viewpoint (driven by the tracker) result in translations and rotations of the look point. In effect, in egocentric viewing, the look point orbits the eye point (or viewpoint), as depicted in Fig. 2 (A). In our system, we used a Virtual Research 1280 HMD and a Polhemus FASTRAK tracking system. The application and interaction techniques were implemented using the SVE [4] and CDI toolkits [1]. When orbital viewing is active, the situation reverses – instead of the tracker being mapped to rotations of the eye point, head rotations are mapped to the look point. The look point's position is fixed, and the viewpoint orbits the look point. This is implemented by detaching the viewpoint from the tracker and making it a child of the look point. Rotational information (not position) is mapped from the head tracker to the look point. The result is that head movements cause the viewpoint to orbit around the look point, as shown in Fig. 2 (B).
Fig. 2. (A A) Egocentric viewing; (B) Orbital viewing
In an interactive system, the user must be given a method to control the radius of the orbit. In our implementation, the user points the stylus in the general direction they wish to move and holds the button down. The vector representing the stylus direction is projected onto a vector connecting the eye and look points. If the vector projects away from the look point, the radius is increased, and vice versa.
4.1 Object Manipulation While Using Orbital Viewing
In orbital viewing, head rotation results in viewpoint rotation and translation (the viewpoint orbits the workspace). This presents a question: should the viewpoint and virtual hand move together, as a unit, or should the virtual hand stay fixed while only the viewpoint moves? Both choices could create confusion; however, the former creates real interaction difficulties since the virtual hand may move while the physical hand remains still.
Our implementation allows the hand to orbit with the viewpoint only when the user is not interacting with an object. This means that if the user's hand is within view when rotating the head, the virtual hand appears still, as it moves proportionately with the viewpoint, as depicted in Fig. 3 (A). This configuration is implemented by moving the virtual hand in the scene graph such that it is a child of the viewpoint. Its local position with respect to the viewpoint is recalculated each time the stylus tracker moves, so that it matches the physical offset between the stylus and HMD trackers.
Fig. 3. (A) Virtual hand moves with viewpoint. (B) Virtual hand stays in original (physical) position; viewpoint moves without it.
In our system the user's virtual hand is tracked by a hand-held stylus, which has a single button. The user can translate and rotate an object by intersecting their virtual hand with it and holding the stylus button down. While manipulating an object, we switch to the setup depicted in Fig. 3 (B). When holding an object the virtual hand "sticks" in its original position – the cursor/virtual hand still responds to physical hand movement, but it does not orbit with the viewpoint. This creates an inconsistency between the physical hand/head and the virtual hand/viewpoint. Once the user releases the object, the virtual hand "snaps" back to where it would have been if it had orbited with the viewpoint, which restores the user to a workable state going forward. At first glance this technique may seem confusing, as one would think it would be distracting to have the viewpoint potentially move large distances while the hand stays stationary. In our experience, however, it has proven effective. When users are manipulating objects, their attention is focused on their hand and the object, and they do not expect their virtual hand to move unless the physical hand does. When head rotations cause their viewpoint to orbit the object of interest, the experience mimics the real-world experience of holding an object in hand and "peering" around it to see it from another perspective. During user trials, participants needed little explanation of the technique and adapted to it without difficulty. We suspect preserving the mapping of physical hand movements (or the absence of movement) to the virtual cursor is far more important than preserving the relationship between the viewpoint and virtual hand; however, we would like to investigate this more directly in future trials.
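The "stick"/"snap" behavior can be summarized in a small state machine. The sketch below is an assumption-laden illustration (positions as plain tuples, a single per-frame update function), not the authors' implementation.

class HandController:
    """Sketch of the 'stick'/'snap' hand behavior of Section 4.1 (illustrative)."""

    def __init__(self):
        self.holding_object = False
        self.stuck_world_position = None  # world position frozen while holding

    def update(self, viewpoint_world_pos, hand_offset_from_view):
        """Return the world position at which to draw the virtual hand."""
        # Hand position if it simply orbits with the viewpoint (Fig. 3 (A)).
        orbiting = tuple(v + o for v, o in
                         zip(viewpoint_world_pos, hand_offset_from_view))
        if self.holding_object:
            # "Stick" (Fig. 3 (B)): do not orbit with the viewpoint; in a real
            # system physical hand motion would still be folded into this position.
            if self.stuck_world_position is None:
                self.stuck_world_position = orbiting
            return self.stuck_world_position
        # Not holding: the hand orbits with the view and appears still to the user.
        self.stuck_world_position = orbiting
        return orbiting

    def grab(self):
        self.holding_object = True   # freeze ("stick") at the current position

    def release(self):
        self.holding_object = False  # next update "snaps" back onto the orbit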
4.2 Blended Orbital Viewing
Switching between egocentric and orbital viewing can be disorienting, and often being in complete orbital viewing mode may not be optimal (the user may be over-constrained). We have implemented a mechanism for blending the two viewing techniques together seamlessly, such that the user can experience view control anywhere along the continuum between egocentric and orbital viewing. We describe both techniques by maintaining two 3D points – look and eye – rigidly attached to each other. In egocentric viewing, the head tracker directly rotates the eye (viewpoint), whereas in orbital viewing it maps to the look point. Rather than pick one configuration, we instead keep track of both sets of points – giving us eye_ego and look_ego for egocentric viewing and eye_orb and look_orb for orbital viewing – using four hidden objects inserted into the scene graph. The actual look and eye points (and thus the viewpoint's orientation) are defined by a view control variable, VC. At VC = 0, the user is in full egocentric viewing and the viewpoint is directly mapped to look_ego and eye_ego. When VC = 1, the user is engaged in orbital viewing – the viewpoint is mapped to look_orb and eye_orb. For values between 0 and 1, the eye (viewpoint) and look positions are linear interpolations of their corresponding egocentric and orbital pairs. Once the interpolated look and eye points are known, the actual virtual camera is repositioned and oriented to adhere to the configuration depicted in Fig. 4. A similar interpolation procedure is performed to position the virtual hand and cursor.
Blending on Transitions. The most straightforward use of blended orbital viewing is during transitions between full egocentric and orbital viewing. When the workspace becomes small (either determined explicitly or through observation) and the interface invokes orbital viewing, the abrupt change in viewpoint mapping can be extremely distracting. Worse yet, abruptly transitioning from orbital viewing back into normal egocentric viewing can leave the user looking in a completely different direction than they were a moment before (without actually physically moving their head). To alleviate these effects, we never move the VC value discretely; rather, we gradually adjust it over a period of three seconds. The actual time interval is somewhat arbitrary, but it represents a good compromise derived from several user trials. Shorter transition intervals did little to alleviate disorientation, and significantly longer intervals tended to interfere with the task at hand. We fully expect different users might respond better to different interval values.
Blending Based on Workspace Size. Often a user might be particularly interested in a region of the world containing several objects. While this workspace may be relatively small, it could be large enough that locking into a fixed 3D location at its center using pure orbital viewing would be a hindrance. By selecting a VC value between 0 and 1, the user benefits from orbital viewing (i.e., head movements cause them to orbit the center point, giving them a better perspective) while still being able to survey a larger area (i.e., the look point moves within the workspace). This is depicted in Fig. 5, where the VC value is 0.75.
Fig. 4. Example of blended orbital viewing with VC = 0.5
Fig. 5. With VC = 0.75, when the user rotates the physical head right, most of the rotation results in an orbit around to the right side (due to the orbital component); however, the look point also moves (due to the egocentric component)
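A minimal sketch of the VC interpolation and the gradual transition described above, assuming NumPy arrays for the eye/look points and a per-frame update loop; the function names are illustrative.

import numpy as np

def blended_view(eye_ego, look_ego, eye_orb, look_orb, vc):
    """Interpolate the egocentric and orbital eye/look pairs by VC in [0, 1]."""
    eye = (1.0 - vc) * np.asarray(eye_ego) + vc * np.asarray(eye_orb)
    look = (1.0 - vc) * np.asarray(look_ego) + vc * np.asarray(look_orb)
    return eye, look  # the virtual camera sits at eye and is oriented toward look

def ramp_vc(vc_current, vc_target, dt, transition_seconds=3.0):
    """Move VC gradually toward its target instead of switching discretely."""
    max_step = dt / transition_seconds
    return vc_current + float(np.clip(vc_target - vc_current, -max_step, max_step))

In use, ramp_vc would be called every frame with the elapsed time dt and the interpolated eye/look pair fed to the camera, so a discrete change of vc_target still produces a smooth three-second transition.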
This technique could also be useful when the user gradually becomes more focused on a specific area. In the beginning of the sequence they may be surveying the world, using egocentric viewing. Over time, the user may start to focus on a set of objects, perhaps manipulating them. As they become more focused on a smaller space, the VC can gradually be increased toward orbital viewing.
5 Workspace Recognition
The user's current workspace is part of the contextual information an application can utilize when deciding the interaction technique to deliver. When working within a small, confined area, a viewing technique such as orbital viewing may be beneficial, while when working in a larger area it could be too limiting. To make decisions based on workspace, we require a way to define it quantitatively. Our model defines three size ranges for the workspace: small, medium, and large, supporting different view control techniques for each range. We base view control decisions on the ranges instead of physical dimensions directly in order to allow the thresholds to be varied easily between applications or users. Current workspace can
be determined in a variety of ways – ranging from asking the user to explicitly indicate the volume to inferring it from user activity. We use a combination of explicit and implicit workspace recognition. Potential workspaces in the world – which we refer to as "hotspots" – are identified through observation. Users must explicitly "attach" to a hotspot before orbital viewing is invoked. Workspace is inferred by recording where the user has been "looking" and interacting. We divide the virtual world into a three-dimensional grid, with each location assigned a numeric score. As the user's viewpoint moves, we increment the score of each location within the field of view. Interaction with an object near a location also increases its score. Periodically the system reduces the score of all grid locations by a small amount. Over time, heavily active locations achieve high enough scores to be promoted to "hotspots" – which become permanent artifacts within the world and anchor points for orbital viewing. A hotspot has a precise location. Its size is recorded as "small" if there are no hotspots immediately adjacent to it. If there is a hotspot at an adjacent grid location, its size is set to "medium". Hotspots are implemented such that the required score, the frequency and magnitude of increments, and the granularity of the 3D grid points can be varied easily; we have not yet done formal studies to determine an optimal configuration (if one even exists). In addition to dynamic hotspots, we also support predefined hotspot locations that can be placed within the world. Regardless of the type of hotspot, the user interface associated with it is the same. As a user approaches a hotspot, they are given a textual indication at the top of their HMD's view suggesting that orbital viewing could be used. If they wish to invoke orbital viewing, they simply touch their stylus to their head and click its button. The user is then transitioned from VC = 0 (egocentric viewing) up to the maximum VC associated with the hotspot. Typically, we choose VC = 1 for small (standalone) hotspot locations and VC = 0.75 for hotspots with proximate neighbors (part of medium-sized workspaces). The VC is adjusted gradually as described in Section 4.2. To exit orbital viewing, the user touches their stylus to their head and clicks – which gradually reduces the VC back to 0.
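The implicit scoring scheme could be sketched as follows. The grid resolution, gains, decay rate, and promotion threshold are assumptions – the paper deliberately leaves these as tunable parameters.

import numpy as np

class HotspotGrid:
    """Sketch of the implicit workspace scoring of Section 5 (parameters assumed)."""

    def __init__(self, shape=(32, 32, 32), view_gain=1.0, interact_gain=5.0,
                 decay=0.01, threshold=500.0):
        self.scores = np.zeros(shape)
        self.view_gain = view_gain
        self.interact_gain = interact_gain
        self.decay = decay
        self.threshold = threshold

    def update(self, visible_cells, interacted_cells):
        # Reward grid cells inside the field of view and cells near interaction.
        for cell in visible_cells:          # cell is an (i, j, k) index tuple
            self.scores[cell] += self.view_gain
        for cell in interacted_cells:
            self.scores[cell] += self.interact_gain
        # Periodic decay so that stale regions fade out over time.
        self.scores = np.maximum(self.scores - self.decay, 0.0)

    def hotspots(self):
        # Cells whose score exceeds the threshold become hotspots; a hotspot
        # with a promoted neighbor would belong to a "medium" workspace.
        return [tuple(idx) for idx in np.argwhere(self.scores >= self.threshold)]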
6 Conclusions and Future Work
We have created several features that can be implemented with orbital viewing and that, in our observation, reduce confusion and disorientation while allowing users to manipulate objects in a more natural manner. In follow-up experiments, post-trial user feedback has improved. We believe the system for recognizing workspaces, or hotspots, in the virtual world is a solid stepping stone toward automatically recognizing areas of interest, and that using blended orbital viewing to ease transitions is a valuable addition to the view control interface. Our greatest challenge has been quantifying performance results. The user experiment described in Section 3 was specifically designed to be difficult if users did not try to view the object from various angles. Our expectation was that if orbital viewing really helped the user do this, participants would attain more completions on those trials. Even with our improvements, however, the actual manipulation task is hard enough that it seems to outweigh the effects of the view control method. In subsequent experiments with the same design, view control technique still has not
appeared significant. Reducing difficulty (making the target larger such that it is easier to fit the control object within it) tends to reduce the importance of the viewing technique. We have considered experiments that only require the user to view an object or area from multiple directions (without needing to interact), however we feel this would be a trivial comparison – as obviously rotating one’s head would be faster than walking around to the other side of the object. We are currently investigating alternative methods of evaluating the effect orbital viewing has on general interaction. Acknowledgements. This work was funded by the National Science Foundation, grant number IIS-0914976.
References 1. Frees, S.: Context-Driven Interaction in Immersive Virtual Environments. Virtual Reality 14, 277–290 (2010) 2. Fukatsu, S., Kitamura, Y., Toshihiro, M., Kishino, F.: Intuitive control of “birds eye” overview images for navigation in an enormous virtual environment. In: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 67–76 (1998) 3. Greenhalgh, C., Benford, S.: Massive: A collaborative virtual environment for teleconferencing. ACM Transactions on Computer-Human Interaction 2(3), 239–261 (1995) 4. Kessler, G.D., Bowman, D.A., Hodges, L.F.: The Simple Virtual Environment Library, and Extensible Framework for Building VE Applications. Presence: Teleoperators and Virtual Environments 9(2), 187–208 (2000) 5. Koller, D., Mine, M., Hudson, S.: Head-Tracked Orbital Viewing: An Interaction Technique for Immersive Virtual Environments. In: Proceedings of the ACM Symposium on User Interface Software and Technology, pp. 81–82 (1996) 6. Pierce, J., Pausch, R.: Navigation with Place Representations and Visible Landmarks. In: Proceedings of IEEE Virtual Reality, pp. 173–180 (2004) 7. Stoakley, R., Conway, M., Pausch, R.: Virtual Reality on a WIM: interactive worlds in miniature. In: Proceedings of CHI 1995, pp. 265–272 (1995) 8. Tan, D.S., Robertson, G.G., Czerwinski, M.: Exploring 3D Navigation: Combining Speedcoupled Flying with Orbiting. In: CHI 2001 Conference on Human Factors in Computing Systems, Seattle, WA (2001) 9. Tanriverdi, V., Jacob, R.: Interacting with Eye Movements in Virtual Environments. In: Proceedings of SIGCHI on Human Factors in Computing Systems, pp. 265–272 (2000)
Irradiating Heat in Virtual Environments: Algorithm and Implementation Marco Gaudina, Andrea Brogni, and Darwin Caldwell Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy {marco.gaudina,andrea.brogni,darwin.caldwell}@iit.it
Abstract. Human-computer interactive systems have focused mostly on graphical rendering, haptic feedback, or delivery of auditory information. Human senses are not limited to these channels, and other physical characteristics, such as thermal sensation, are under active research and development. In virtual reality, few algorithms and implementations have been proposed to simulate the thermal characteristics of the environment, even though this physical characteristic can dramatically improve overall realism. Our approach is to establish a preliminary way of modelling an irradiating thermal environment that takes into account the physical characteristics of the heat source. We define an algorithm in which the irradiating heat surface is analysed for its physical characteristics, material, and orientation with respect to a point of interest. To test the algorithm's consistency, experiments were carried out and the results analysed. We implemented the algorithm in a basic virtual reality application using a simple, low-cost thermo-feedback device that allows the user to perceive temperature in the 3D space of the environment. Keywords: Virtual Reality, Thermal Characteristic, Haptic, Physiology.
1 Introduction
In the last decade, human-computer interaction has evolved dramatically due to the rapid expansion of technology, and fields like virtual reality now play an important new role. In the challenge to take the user's experience to a different level of interaction, a virtual environment is a suitable platform for making the user feel more comfortable with the application. In a virtual environment the user can freely move around, perceive object depth via 3D glasses, touch objects using haptic interfaces, and listen to sounds. Many works presented in [8] demonstrate the effort that has been put into achieving a good level of haptic interaction in virtual environments in recent years. Softness, surface, and friction, for example, are aspects that have been analysed so far, and these have helped users perceive themselves as the protagonist of a virtual scene. Despite these improvements, one big topic needs more attention: thermal interaction. When we move around in an environment we feel different thermal characteristics; cutaneous stimuli are generated and our perception is altered, giving the nervous system important information about the surrounding area. In some cases, for example for blind users, thermal characteristics could help where visual information is missing. In [6], temperature changes during contact have been used to assist in identifying and discriminating
objects in the absence of vision. Benali-Khoudja et al. [1] showed a way for the user to feel the thermal characteristics of a touched object. Other researchers [5] focused their attention on heat transfer between a finger and a touched object, considering different materials and blood flux. Human skin can perceive the heat that an object irradiates before touching it, and with this information the user can modify the action he/she is performing. If, for example, we move toward a switched-on lamp, we can perceive the heat the lamp is generating before touching it. Lamps, ovens, engines, electronic devices – in general, every system with a temperature different from absolute zero – generate irradiating heat. In this work we present a novel thermal algorithm to represent irradiating heat sources with respect to a moving interest point. We implemented the algorithm in a virtual reality environment with a simple and low-cost thermal device. Two different experiments were carried out to analyse how a user behaves when reaching toward a hot object and when discovering which is the hot surface of a die with two different side sizes. Finally, we developed a basic virtual application with multiple heat sources of differing surface temperature and material, introducing a basic 3D element representing a lamp.
2 Thermal Modeling
2.1 Irradiating Heat Exchange Algorithm
We are interested in finding the final temperature of an interest point under the influence of different irradiating heat sources inside a virtual environment. The aim of the algorithm is to take into consideration factors such as the physical surface of the object, its surface temperature, and its material. At this stage we make a preliminary assumption: there is no air flux, either natural (since we consider the ambient as closed) or forced, because we do not yet want to consider convection effects. We consider each examined object, like human skin, to be a grey body with a homogeneous surface material, without taking into account the colour of the object itself. To be rigorous we should consider that the heat exchange between two grey bodies is continuous and becomes negligible after a while: the radiation decreases as it bounces between the two bodies with a reflection coefficient that, for a grey body, is less than 1, as explained in [10]. As in acoustic studies, we could treat the case as a transitory stage and study all of the possible energy exchanges step by step; in thermodynamics, however, making a final balance of every exchange does not create a relevant error, because such energy exchanges take place very rapidly. We therefore consider the system at its steady state. In this way we can avoid external influences and concentrate our attention on the irradiating effect of an object as we get closer to it. Starting from the base formula of irradiating heat as described in [2], we express the irradiating heat quantity generated by each heat source as:

q_i = \sigma A_i (T_{skin}^4 - T_i^4)   (1)

where q_i is the irradiated quantity generated by an object, \sigma is the Stefan-Boltzmann constant, A_i the exposed surface of the irradiating object, T_{skin} the initial temperature of
the user's skin, and T_i the surface temperature of the irradiating object. The user's skin will absorb only a part of the irradiated heat, because of its grey-body characteristic. We can draw a parallel with the Coulomb attraction law of two electrical charges and therefore assume that the quantity is inversely proportional to the distance between the desired point and the irradiating surface. We therefore introduce the reflection coefficient and the distance:

q_i = \frac{\sigma A_i (T_{skin}^4 - T_i^4)}{2\pi d^2 \left(\frac{1}{a_{Material}} + \frac{1}{a_{Skin}} - 1\right)}   (2)

where d is the distance between the interest point and the heat source, a_{Material} is the absorption coefficient of the object material, and a_{Skin} is the absorption coefficient of the user's skin; the sum of their reciprocals minus 1 constitutes the reflection coefficient. From equation 2 we can say that each heat source can be summarised as a function of the distance to the interest point, the surface of the object, the surface temperature, and the material of the heat source. The total heat quantity of all contributions on the user's skin is then:

Q = \sum_{i=1}^{n} q_i(d_i, T_i)   (3)

If we consider the heat quantity perceived by the user's skin, we have to introduce the thermal capacity of the skin itself and the mass of the user's finger:

Q = c\,m\,\Delta T   (4)

where c is the thermal capacity of the user's skin and m is the mass of the finger. Writing equation 4 in terms of temperature, we may express the temperature over the skin as:

T_{desired} = \frac{Q}{cm} - T_{skin}   (5)

At every instant, equation 5 gives us the final temperature of a desired point on the user's skin. This allows us to model a virtual environment where objects have irradiating thermal characteristics, which could take the user to a new level of interaction with the surrounding environment and enable more involving experiences.
2.2 Position and Heat Source Orientation
Each heat source starts influencing the desired point at a certain distance and stops contributing outside what we define as its influence area, shown in Figure 1. Another important aspect of heat sources is how the considered irradiating objects are positioned with respect to the desired interest point. Assuming all of the heat sources are diffuse reflectors, like most grey bodies, we are interested in the angle between the interest point and the physically exposed irradiating part of the heat source. For instance, a desired point is influenced by a spherical irradiating heat source wherever the interest point is positioned with respect to the sphere; on the other hand, if we consider a cube and assign a thermal irradiating characteristic to only one face, the interest point will be influenced only under a certain angle between the irradiating side
of the object and the interest point. More precisely, following Lambert's cosine law, we introduce \alpha, the angle between the centre of the interest point and the normal to the irradiating side of the irradiating object. We can therefore include this consideration in equation 2 as follows:

q_i = \begin{cases} \dfrac{\sigma A_i (T_{skin}^4 - T_i^4)}{2\pi d^2 \left(\frac{1}{a_{Material}} + \frac{1}{a_{Skin}} - 1\right)} \cos\alpha, & \text{if } lower_{limit} < \alpha < upper_{limit} \\ 0, & \text{otherwise} \end{cases}   (6)

where lower_{limit} and upper_{limit} represent the maximum irradiating angle range of the surface. As is clearly visible in Figure 1, this range depends on the surface conformation of the irradiating side of the object considered. If, for example, we are behind the irradiating direction, we will not perceive the influence of that particular object. In this way we implement an attenuation of the heat flux relative to the normal of the considered surface. An aspect that we choose not to consider at this stage, to avoid further complications, is the presence of other objects between the irradiating object and the interest point, and the influence heat sources have on one another.
Fig. 1. Interest point receives influences if it is in the range of the irradiating side. Therefore the influence qi is not equal to 0.
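A compact sketch of the model in Sections 2.1–2.2 (equations 2, 3, 5, and 6) is given below. The sign conventions follow the equations as reconstructed above; the single symmetric half-angle limit and the packaging of per-source parameters as dictionaries are simplifying assumptions.

import math

SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def source_contribution(area, t_source, t_skin, distance,
                        a_material, a_skin, alpha, half_angle_limit):
    """Heat quantity q_i from one irradiating surface (Eqs. 2 and 6, as above)."""
    if distance <= 0.0 or abs(alpha) >= half_angle_limit:
        return 0.0  # the interest point lies outside the source's influence cone
    reflection_term = 1.0 / a_material + 1.0 / a_skin - 1.0
    q = SIGMA * area * (t_skin ** 4 - t_source ** 4)
    q /= 2.0 * math.pi * distance ** 2 * reflection_term
    return q * math.cos(alpha)  # Lambert cosine attenuation

def skin_temperature(sources, t_skin, c_skin, finger_mass):
    """Total heat Q (Eq. 3) and resulting skin temperature (Eq. 5, as reconstructed)."""
    q_total = sum(source_contribution(t_skin=t_skin, **s) for s in sources)
    return q_total / (c_skin * finger_mass) - t_skin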
2.3 Materials
Different materials exchange different quantities of energy with the ambient environment. We perceive a temperature difference from an object of one material at a greater distance than from an object of another material, even though the temperature is the same for both. This is due to the different emissive power of each material, which in equation 2 appears through the reflection coefficients. Table 1 shows the coefficients that have been used.

Table 1. List of emissivity coefficients of some materials

Material   Coefficient
Plastic    0.91
Wood       0.91
Iron       0.0014
Copper     0.03
Glass      0.89
After this consideration it is obvious that a material at a certain temperature is perceived less than another material with a significantly higher temperature at the same distance but with a different emissivity characteristic.
2.4 Virtual Environment
To test the irradiating algorithm, we prepared a virtual environment setup. VR projection is obtained with two Christie Mirage S+ 4000 projectors, synchronised with StereoGraphics CrystalEyes active shutter glasses. We use a 4x2 m2 powerwall, and an Intersense IS-900 inertial-ultrasonic motion tracking system to sensorize the area in front of the screen; in this way the user's head is always tracked. Finger tracking is achieved with a set of 12 Optitrack FLEX:100 infrared cameras; since just one passive marker is set on the finger, this is not a critical task. The main 3D application in Figure 2 was developed with VRMedia XVR (http://www.vrmedia.it/), which handles graphics, scene behavior, and input/output data exchange. Devices and software exchange data with the main application through XVR internal modules written in C++.
2.5 Hardware Device
The aim of this work is to find the temperature at an interest point due to an irradiating heat source and give the user a thermal perception. To do this we can use cutaneous thermal devices, which are quite common and have been used for the last twenty years [3]; most of them are based on the Peltier effect to give the user thermal feedback. The problem with heat dissipation is guaranteeing a good cooling-down phase, which means that cumbersome heat sinks need to be used. In [4], Yang et al. developed an interesting device to give surface discrimination and thermal feedback at the same point; the technique used to dissipate heat was liquid cooling, which allows fast temperature variations. We decided to follow a different strategy, developing a simple and low-cost device that is wearable but has a more limited temperature variation than other solutions. The device is composed of a Peltier cell to generate warm/cold sensations. This thermo cell is attached to a small piece of copper that allows an analog thermal sensor to close the loop around the generated temperature. The upper side of the piece of copper is placed in contact with the user's skin to generate the thermal sensation. An LM3S1968 evaluation board by Luminary Micro (http://www.luminarymicro.com) generates PWM signals to control, via a PID controller, the H-bridge connected to the thermo cell, driving the current in both directions. This allows the temperature to be controlled up and down. Two 10x10x10 mm heat sinks and a DC fan give us the possibility to cool down the hot face of the Peltier cell faster. The system in Figure 2 is capable of increasing temperature at 10.5 °C/sec and of cooling down at 4.5 °C/sec. The system can perform temperature variations in the range 15 °C – 75 °C.
3 Experiments and Data Analysis
To understand the algorithm implementation in a more analytical way, we carried out two different experiments to test the proposed algorithm and hardware. Ten
Fig. 2. On the left the testing thermal device. On the right the Virtual Environment setup and heat sources representation.
dexterous participants, with a mean age of 27.6±4 years, were selected, with no known thermal diseases and with low usage of their hands during their normal daily activities. The experimental setup is as previously described: the finger is tracked by the Optitrack cameras and a custom actuator generates the required temperature variation.
3.1 Reaching a High-Temperature Object
In the first experimental session we asked the user to start 1.5 meters away from the screen and slowly move a finger toward a 3D projected sphere, stopping upon feeling thermal pain. The sphere, with a radius of 0.1 m, could have one of three temperatures – 318 K (45 °C), 338 K (65 °C), or 378 K (105 °C) – randomly distributed among the ten participants, each having three trials. Before each trial, the user was asked to wait for 10 seconds while the thermal actuator cooled down, to avoid thermal adaptation. We wanted to analyse whether this irradiating algorithm could be used as an alarm. The first temperature threshold, 318 K (45 °C), is the limit observed for human thermal pain [7]. The other two temperature thresholds were chosen to represent an object that is really hot and that the user should not touch. From the results shown in Figure 3 and Figure 4 we can see that, as expected, the minimal distance between the tracked finger and the center of the sphere is inversely proportional to the object temperature. In the 318 K (45 °C) case most of the users touch the object before feeling any disturbance.
Fig. 3. Graph of the minimal distance between the tracked finger and the hot sphere
Fig. 4. Top view of the sphere object and the tracked position for each temperature threshold
With 338 K (65 °C) and 378 K (105 °C), the user feels the irradiated energy before touching the sphere, and the heat felt is inversely proportional to the distance from the sphere.
3.2 Discovering a Heat Source
The second experiment consists of discovering which face of a 3D die is the irradiating heat source. Every face was enumerated like a real die, as explained in Figure 6. The experiment was divided into two phases to better understand the limits we can encounter when modelling a virtual environment with regard to object dimensions. In the first part of the experiment the die side was 0.2 m, and in the second 0.1 m. For each participant the total number of trials was 6, and the irradiating face had been randomly assigned in advance for each trial. The correctness percentage of the answers differs between the two dice. As expected, it is easier to discover the irradiating heat source with the big die than with the smaller one. In Figure 5 we can observe the correctness percentages for the two dice.
Fig. 5. Charts of the correctness percentage of the dice with two different side sizes with orientation as in Figure 6
Thus, we discovered, analysing the average time spent for each face in Table 2, that for a smaller object the discovering phase is faster. This is probably due to a greater flexibility in the user movements regarding little objects but this corresponds to a lower percentage of correctness.
Table 2. Comparison of the average time (sec) spent by the user over each face of the two dice

                              Face 1   Face 2   Face 3   Face 4   Face 5   Face 6
Big Dice                       10.05    11.92    88.74    83.21    51.9     60.6
Small Dice                     88.7     72.71    119.5    71.16    55.09    80.3
Correct Answer, Big Dice       67.79    76.56    76.07    69.84    41.97    56.09
Correct Answer, Small Dice     81.33    15.82    27.21    62.27    30.27    40.33
Fig. 6. The die orientation with respect to the user's point of view and the enumeration of the faces
The results obtained underline that the irradiating algorithm works as expected. Users avoid touching objects with high temperatures, and they can discover the heat-generating side of a simple cube. This justifies a deeper analysis and a wider, more comprehensive study, because it suggests that thermal feedback could be used in operations such as environment mapping, or to describe a virtual environment in a more realistic way.
4 Testing Application
The implementation in Figure 2 consists of the placement of up to six irradiating spheres with different temperature values in the range 10 °C – 75 °C. Spheres were chosen to keep the system as simple as possible at this stage. The user can freely move around the virtual scene and feel the temperature on the skin of the finger. Table 3 shows the parameters used for the application.

Table 3. List of parameters used in the testing application

Parameter                             Value
Stefan-Boltzmann constant             5.67 × 10^−8 W m^−2 K^−4
Skin thermal capacity                 418.6 J/(kg K)
Finger mass                           0.01 kg
Skin thermal emissivity coefficient   0.85
Sphere radius                         0.05 m
Skin temperature                      310 K
These are commonly used parameters in thermodynamics, and the finger mass is derived from the average weight of a hand [9]. With this kind of object the implementation behaves as intended, producing different temperatures according to the algorithm's control variables: surface temperature, surface exposure, distance, and material. We tried different combinations of these variables and observed that the implemented algorithm works well.
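As a usage illustration only (not the authors' code), the Table 3 constants can be plugged into the sketch from Section 2.2; the distance, angle, and glass coefficient from Table 1 used here are arbitrary example choices.

import math  # uses source_contribution / skin_temperature from the Section 2.2 sketch

sphere = dict(area=4.0 * math.pi * 0.05 ** 2,  # sphere radius from Table 3
              t_source=338.0,                  # a 65 degC sphere, as in Section 3.1
              distance=0.5,                    # arbitrary example distance (m)
              a_material=0.89,                 # glass, from Table 1
              a_skin=0.85,                     # from Table 3
              alpha=0.0,                       # on-axis with the irradiating face
              half_angle_limit=math.pi / 2.0)

q = source_contribution(t_skin=310.0, **sphere)       # single-source heat quantity
t = skin_temperature([sphere], t_skin=310.0,
                     c_skin=418.6, finger_mass=0.01)  # Table 3 constants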
Fig. 7. On the left a real case study of the measurement of temperature values of a real lamp, on the right a virtual representation of a bulb lamp
The algorithm takes into account hot and cold surfaces. If the user gets within the influence area of an object with a temperature higher than his skin temperature, the user will feel an increase in temperature. If, instead, he gets close to a lower-temperature irradiating object, he could feel a temperature decrease. This is due to the heat quantity exchanged between the object and the user's skin; we know, indeed, that heat flows from the hot body to the cold body. To test a real case, we implemented a 40 W filament bulb lamp. Rather than considering the temperature of the filament itself – which we can assume is around 2000 K – we take into account the external temperature of the glass bulb, which we assumed to be around 423 K (150 °C). The expected behaviour was that this would be the same as interacting with a sphere. The virtual representation of the lamp is made with an Autodesk 3D Studio (http://www.autodesk.com) mesh imported inside XVR. The radius value is the same as the previously used sphere, and the material used in this case is glass. In Figure 7, the virtual model is compared with a real 40 W bulb lamp, and the output value of a thermal sensor is compared to the output values of the algorithm.
5 Conclusion
With this work we defined and tested an algorithm to model the thermal characteristics of irradiating objects for virtual environments. Physical characteristics such as the surface and material of the considered irradiating object have been taken into account. Custom electronics were created to test the overall system. Two different concepts – reaching and discovering – were successfully studied using the algorithm. The results show that users do not touch hot objects once they feel a temperature disturbance. They could also identify the correct irradiating face of a die with a high success rate. We then implemented the algorithm in a basic virtual application. Future work will focus on improving the algorithm, implementing other characteristics to better represent the heat exchange between heat sources and an interest point. Natural and forced heat convection are important topics to be implemented in order to create much more realistic environments. Regarding the custom electronics used, they need to be improved in
response speed, current consumption, and physical dimensions, and to be integrated into a multimodal interaction system. The proposed work could be the basis of psychophysical studies about the interaction between humans and their environment, where thermal characteristics could help the user better understand the environment in which he/she is operating.
References 1. Benali-Khoudja, M., Hafez, M., Alexandre, J.M., Benachour, J., Kheddar, A.: Thermal feedback model for virtual reality. In: International Symposium on Micromechatronics and Human Science. IEEE, Los Alamitos (2003) 2. Bonacina, C., Cavallini, A., Mattarolo, L.: Trasmissione del calore. CLEUP, Via delle Fontane 44 r., Genova, Italy (1989) 3. Caldwell, D., Gosney, C.: Enhanced tactile feedback (tele-taction) using a multi-functional sensory system. In: Robotics and Automation Conference. IEEE, Los Alamitos (1993) 4. Yang, G.-H., Ki-Uk Kyung, M.S., Kwon, D.S.: Development of quantitative tactile display device to provide both pin-array-type tactile feedback and thermal feedback. In: Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems. IEEE, Los Alamitos (2007) 5. Guiatni, M., Kheddar, A.: Theoretical and experimental study of a heat transfer model for thermal feedback in virtual environments. In: International Conference on Intelligent Robots and Systems. IEEE, Los Alamitos (2008) 6. Ho, H.-N., Jones, L.: Contribution of thermal cues to material discrimination and localization. Percept Psychophys 68, 118–128 (2006) 7. Kandel, E., Schawrtz, J., Jessell, T.: Principles of Neural Sience. McGraw Hill, New York (2000) 8. Lin, M.C., Otaduy, M.A.: Haptic Rendering, Foundations,Algorithm and Applications. A.K. Peters, Ltd., Wellesly, Massachusetts (2008) 9. Winter, D.A.: Biomechanics and motor control of human movement, 3rd edn. John Wiley and Sons, Chichester (2004) (incorporated) 10. Yunus, C., Boles, M.A.: Thermodynamics: An Engineering Approach Sixth Edition (SI Units). McGraw-Hill Higher Education, New York (2009)
Providing Immersive Virtual Experience with First-Person Perspective Omnidirectional Movies and Three Dimensional Sound Field Kazuaki Kondo1 , Yasuhiro Mukaigawa2, Yusuke Ikeda3 , Seigo Enomoto3 , Shiro Ise4 , Satoshi Nakamura3 , and Yasushi Yagi2 1
Academic Center for Computing and Media studies, Kyoto University Yoshida honmachi, Sakyo-ku, Kyoto, Japan 2 The Institute of Scientific and Industrial Research, Osaka University 8–1 Mihogaoka, Ibaraki-shi, Osaka, Japan 3 Spoken Language Communication Group, National Institute of Information and Communications Technology 3–5 Hikaridai, Keihanna Science City, Japan 4 Graduate school of engineering, Department of Architecture and architectural engineering, Kyoto University Kyotodaigaku-katsura, Nishikyo-ku, Kyoto, Japan
[email protected], {mukaigaw,yagi}@am.sanken.osaka-u.ac.jp, {yusuke.ikeda,seigo.enomoto,satoshi.nakamura}@nict.go.jp,
[email protected]
Abstract. Providing a highly immersive experience to audiences has advanced alongside the growth of video and acoustic media technology. In our proposal, we record and reproduce omnidirectional movies captured from an actor's perspective together with the three-dimensional sound field around him, aiming to reproduce a more impressive feeling of presence. We propose a sequence of techniques to achieve this, including recording equipment, video and acoustic processing, and a presentation system. The effectiveness of and demand for our system have been demonstrated by ordinary people through evaluation experiments. Keywords: First-person Perspective, Omnidirectional Vision, Three Dimensional Sound Reproduction, Boundary Surface Control Principle.
1 Introduction
High-fidelity scene reproduction provides audiences with rich virtual experiences that can be used for sensory simulators and multimedia entertainment. For example, current cinemas employ advanced capturing methods, video processing, audio filtering, and presentation systems to give the audience the immersive feeling of being in the target scene. The most important issue for providing such a feeling is to capture and present target scenes as they were. In this paper, we focus on the following three functions for that purpose.
Preserving observation perspective: Observation perspective can be categorized into third-party perspective and first-person perspective. The former corresponds to objectively capturing a scene, which effectively conveys the structure of the scene and the story line. The latter perspective can be captured by a recording device placed at a character's position, which is good at providing an immersive feeling. Examples include attaching a compact video camera to one's head, or having actors perform as if the video camera were a person.
Preserving wide-range (omnidirectional) visual feeling: A wide-range video boosts realistic feeling. A panoramic video on a wide screen is a typical approach, but it is not always enough, because it does not consider temporal changes and individual differences in the audience's observation direction. We focus on capturing and displaying omnidirectional videos in order to adapt to these situations. Although 3D reconstruction is also an effective approach, here we treat only the omnidirectional property rather than the combination of the two.
Preserving 3D acoustic feeling: Audio realism strongly depends on the directions and distances from which sounds arrive. Thus it is important to reproduce the 3D sound field, including the positions of sound sources. Usual approaches use a stereo or 5.1 ch system, but these provide sufficient reproduction only for a specific position and direction. We focus on relaxing those listening limitations, just as we relax the viewing limitations discussed above.
Although these functions have been individually addressed in conventional approaches, we do not find any total system that covers all of them. In this paper, we design a special recording device, discuss media processing, and develop a presentation system, in order to satisfy the three functions.
2 Recording System
2.1 Wearable Omnidirectional Camera
We here assume three requirements that a video recording device should satisfy:
- It can capture high-resolution and uniform omnidirectional videos.
- Its optical center and the viewpoint of the wearer are at the same position.
- It is easy to wear and allows the wearer to act for a sufficiently long time.
Unfortunately, conventional approaches to capturing outdoor scenes as omnidirectional videos [5,7] do not satisfy all of the above requirements. They did not consider capturing a scene from a character's viewpoint, and only approximate the viewpoint matching with omnidirectional cameras mounted on the head. Furthermore, the need for additional recording equipment and a power supply violates the third requirement. A wearable omnidirectional camera has been proposed for life-log recording [9], but it also has the viewpoint mismatch and, additionally, a low-resolution problem. For these reasons, we proposed a special wearable camera system named FIPPO [10]. FIPPO is constructed from four optical units, each consisting of a handy-type video camera and curved and flat mirrors (Fig. 1(a)). It captures omnidirectional videos from a first-person perspective without any additional equipment or wired supply. The following briefly explains the design of a single optical unit in FIPPO.
We start from an objective projection defined by correspondences between pixels on the image plane and rays running in the scene. Considering uniform resolution of the panoramic scene, whose FOV is [\theta_{min}, \theta_{max}] along the azimuth angle and [\tan\phi_{min}, \tan\phi_{max}] along elevation, respectively, the objective projection is formulated as

V_s(u, v) = \begin{bmatrix} \tan\left(\frac{u}{U}(\theta_{max} - \theta_{min}) + \theta_{min}\right) \\ \frac{v}{V}(\tan\phi_{max} - \tan\phi_{min}) + \tan\phi_{min} \\ 1 \end{bmatrix}   (1)

in the world coordinate system, where (u, v) is the position of a pixel on the image plane, whose size is U × V. It also determines a corresponding camera projection V_c(u, v) = [u, v, -f]^t with focal length f. V_s and V_c should be related by reflection at the target curved mirror; an objective normal vector field N_d(u, v) bisects the angle formed by V_s and V_c. It is obtained by

N_d = \frac{1}{2} N\left[ N[V_s] + N[R V_c] \right]   (2)
with the vector normalizing operator N[x] = \frac{x}{\|x\|} and the external parameters of the camera P = [R\ t]. The mirror shape is formed so that its normal field is equal to N_d. We use the linear algorithm [6], which expresses the mirror shape S(u, v) through the cross products of fourth-degree spline curves. S(u, v) is formulated as

S(u, v) = R V_c(u, v) \sum_{i,j} C_{ij} f_i(u) g_j(v) + t   (3)

where C_{ij} and f_i(u), g_j(v) are control points on the spline curves and fourth-degree spline bases, respectively. We obtain an optimal shape by solving the linear equations in C_{ij} obtained by stacking \frac{\partial S}{\partial u} \cdot N_d = \frac{\partial S}{\partial v} \cdot N_d = 0, because the optimal shape should be perpendicular to the desired normal vector field N_d. Although this algorithm certainly minimizes errors on the normal vector field, it tends to form a bumpy surface, so we apply a smoothing procedure to the shape it produces. The obtained mirror approximates the objective projection, so we check the degree of the approximation. It is evaluated by sufficiency (how much it covers the required FOV), redundancy (how much it covers outside the FOV), and uniformity (how uniformly it distributes the image). If the approximation is sufficient, the design advances to the next step. If not, we adjust the camera parameters to reduce projection errors and return to Eq. (2). The aberrations of the designed optics also need to be checked, because the mirror design algorithm does not consider image focusing. The amount of aberration can be estimated with a spot diagram, which is a spread image of a target object on the image plane. We can construct spot diagrams by tracing the rays that go through an aperture of the lens unit. If the aberrations appear to prevent image focusing, we adjust the camera parameters to reduce the aberration and return to the first step. The design process continues until the aberrations are acceptably small.
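The first design step – evaluating the objective projection of Eq. (1) and the desired normal field of Eq. (2) over the pixel grid – could be sketched as follows; R, f, and the FOV bounds stand in for the camera parameters, and the function names are illustrative.

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def desired_normal_field(U, V, f, theta_min, theta_max,
                         tan_phi_min, tan_phi_max, R=np.eye(3)):
    u, v = np.meshgrid(np.arange(U), np.arange(V), indexing="ij")
    # Objective projection V_s(u, v) of Eq. (1): rays of a uniform panorama.
    Vs = np.stack([np.tan(u / U * (theta_max - theta_min) + theta_min),
                   v / V * (tan_phi_max - tan_phi_min) + tan_phi_min,
                   np.ones((U, V))], axis=-1)
    # Camera projection V_c(u, v) = [u, v, -f]^t.
    Vc = np.stack([u.astype(float), v.astype(float), -f * np.ones((U, V))], axis=-1)
    # Desired normal field of Eq. (2): bisector of V_s and the rotated V_c.
    return 0.5 * normalize(normalize(Vs) + normalize(Vc @ R.T))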
Fig. 1. Overview of the recording system. (a) FIPPO. (b) Microphone array and a recorder.
2.2 Microphone Array
Methods that have been used for recording and reproducing a first-person perspective sound field include the head and torso simulator (HATS) and recording with microphones worn on the listener's ears. However, with these methods it is impossible to freely move the listener's head, because the sound signal is reproduced at only two points around the ears. In this paper, we use a sound reproduction system based on the boundary surface control (BoSC) principle, so that it gives the listener an experience of the sound field from a first-person perspective together with omnidirectional movies. The original BoSC system [8] has a 70-channel microphone array, which makes it difficult to record the first-person perspective sound field while allowing free body movements. Therefore, we simplified the system by reducing the number of channels. The recording system has eight omnidirectional microphones installed horizontally around the head of the wearer. It is recommended that the height of the microphones be at the same level as the person's ears; the microphones are installed slightly above the top of the head in order to keep them away from the mirrors of FIPPO (Fig. 1(b)). The system is small enough to allow the person to move freely while wearing it. One of the factors contributing to the small system size is that the signal is recorded on a handheld PC through a small bus-powered USB A/D converter.
3 Media Processing for Making Contents
3.1 Movie Image Processing for Omnidirectional Panorama
Correcting Image Warping. Images captured by FIPPO still have some geometric warps, despite the uniform projection being configured as the objective. Calibrating the distorted projections produced by the entire optical system, including the curved mirrors, allows the images to be corrected. The calibrations were conducted with a particular scene construction in order to relate image pixels to rays in the world. FIPPO, placed in front of a wide flat-panel monitor, captures coded patterns that give correspondences between each pixel on the image plane and each 2D position on the monitor. Measurements for
Fig. 2. A result of image unwarping (panoramic). (a)-(d) Input images for each direction: left, front, right, and back. (e) Unwarped and mosaiced image.
planes at several depths are necessary for the pixel-ray correspondence. When measurements are taken at two depths separated by a known distance d, the pixel-ray correspondences can be formulated as

ray(u, v) = \frac{p_1 - p_2}{d} = \frac{1}{d} \begin{bmatrix} x_1(u, v) - x_2(u, v) \\ y_1(u, v) - y_2(u, v) \\ d \end{bmatrix}   (4)

where ray(u, v) = [r_x, r_y, r_z]^t and p_i = [x_i, y_i]^t denote a ray in the world corresponding to a point (u, v) on the image plane and a 2D position on the monitor plane, respectively. Figure 2 shows an example of correcting image warping based on the calibration results. Eq. (4) says that the directions of rays are determined, but not their positions. Thus, note that the calibration does not work well for near scenes, because FIPPO is designed to be approximated as a single-viewpoint optical system.
Correcting Color Space. It is also necessary to correct chromatic differences that are mainly attributable to individual differences in the cameras. We solved this problem by transforming color spaces under the assumption of an affine transformation between them, which sufficiently approximates the relationship between the color spaces produced by the same model of cameras used in FIPPO. The affine transform is given by

\begin{bmatrix} R_m \\ G_m \\ B_m \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} R_n \\ G_n \\ B_n \\ 1 \end{bmatrix}   (5)

where [R_k, G_k, B_k]^t and p_{ij} are the RGB colors of the same object on the k-th camera and the coefficients of the affine transformation, respectively. Since Eq. (5) gives three linear equations for one color correspondence, at least four color correspondences are necessary to determine the twelve unknowns p_{ij}. Figure 3 shows the
Fig. 3. Chromatic correction. (a) Color checkers captured by different cameras. (b) Mosaiced images without the correction. (c) Mosaiced images with the correction.
results of the chromatic correction. The images in the figure show a neighborhood of image mosaics. The vertical line at the horizontal center corresponds to the border between two contiguous images. Blue components that were relatively strong assume natural coloring after the correction.
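Both calibration steps admit short sketches, assuming the coded patterns have already been decoded into per-pixel monitor positions; the unit z-component in pixel_rays and the least-squares fit are assumptions consistent with Eqs. (4) and (5), not the authors' exact code.

import numpy as np

def pixel_rays(p1, p2, d):
    """Per-pixel ray directions from two-plane measurements (Eq. 4).
    p1, p2: (H, W, 2) decoded monitor positions at two depths separated by d."""
    delta = (p1 - p2) / d
    dz = np.ones(delta.shape[:2] + (1,))   # unit z-component (assumed plane geometry)
    return np.concatenate([delta, dz], axis=-1)  # directions only, not positions

def fit_color_affine(src_rgb, dst_rgb):
    """Least-squares fit of the 3x4 affine color transform of Eq. (5).
    src_rgb, dst_rgb: (K, 3) matching colors, K >= 4 as noted in the text."""
    A = np.hstack([src_rgb, np.ones((src_rgb.shape[0], 1))])  # (K, 4)
    P, *_ = np.linalg.lstsq(A, dst_rgb, rcond=None)
    return P.T  # (3, 4) coefficient matrix [p_ij]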
3.2 Reconstructing 3D Sound Field
Boundary Surface Control Principle. It follows from the Kirchhoff-Helmholtz integral equation that controlling the sound pressures and sound pressure gradients on the boundary of a region amounts to controlling the sound pressures inside the boundary. The boundary surface control principle removes the problem of ideal sound sources and the restriction to a free sound field by using the Kirchhoff-Helmholtz integral equation and a multi-channel inverse system [2]. When applied in a 3D sound reproduction system, microphones are set at arbitrarily chosen points within the 3D sound field, and by reproducing the sound pressures recorded at those points in a different location, it becomes possible to accurately reproduce the sound field of the area enclosed by the microphones. In this it differs from common transaural and binaural systems: in a BoSC system, a listener can freely move his body while listening to a sound field that is consistent with the original one.
Design Method of Inverse System. Here the loudspeakers controlling sound pressures and the points that are the targets of sound pressure control are referred to as "secondary sound sources" and "control points", respectively. The numbers of secondary sources and control points are denoted by M and N, respectively. The frequency transfer characteristic between the ith sound source and the jth control point is denoted by G_{ji}(\omega). The recorded signal in the primary sound field, the output signal from a sound source, and the measured signal at a control point are denoted by X_j(\omega), Y_i(\omega), and Z_j(\omega), respectively. The relationship between the inputs and outputs of the sound reproduction system is as follows:

Z(\omega) = [G(\omega)] Y(\omega) = [G(\omega)][H(\omega)] X(\omega)   (6)

where X(\omega) = [X_1(\omega), \cdots, X_N(\omega)]^T, Y(\omega) = [Y_1(\omega), \cdots, Y_M(\omega)]^T, Z(\omega) = [Z_1(\omega), \cdots, Z_N(\omega)]^T,
[G(\omega)] = \begin{bmatrix} G_{11}(\omega) & \cdots & G_{1M}(\omega) \\ \vdots & \ddots & \vdots \\ G_{N1}(\omega) & \cdots & G_{NM}(\omega) \end{bmatrix} \quad \text{and} \quad [H(\omega)] = \begin{bmatrix} H_{11}(\omega) & \cdots & H_{1N}(\omega) \\ \vdots & \ddots & \vdots \\ H_{M1}(\omega) & \cdots & H_{MN}(\omega) \end{bmatrix}.

The purpose of inverse filter design in the sound reproduction system is to find the inverse filter [H(\omega)] of [G(\omega)]. When a small error included in X(\omega) or a variation of the system transfer function [G] largely affects the value of Z(\omega), the inverse filter [H(\omega)] becomes unstable. We therefore designed the inverse filter using a regularization whose parameter can be changed continuously to ease the instability.
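A per-frequency regularized (Tikhonov-style) inverse-filter design in the spirit of this section could look like the following; the regularization parameter beta and the array layout are assumptions.

import numpy as np

def design_inverse_filter(G, beta=1e-3):
    """G: (num_bins, N, M) transfer functions from M loudspeakers to N control
    points; returns H: (num_bins, M, N) such that G @ H approximates identity."""
    num_bins, N, M = G.shape
    H = np.zeros((num_bins, M, N), dtype=complex)
    for k in range(num_bins):
        Gk = G[k]
        # Regularized pseudo-inverse: (G^H G + beta I)^-1 G^H.
        H[k] = np.linalg.solve(Gk.conj().T @ Gk + beta * np.eye(M), Gk.conj().T)
    return H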
4 Presentation System: Omnidirectional Theater
Omnidirectional movies should be displayed all around the viewers with a wide FOV in order to provide an immersive feeling. Researchers have proposed omnidirectional display systems for such situations; these are categorized into personal-use equipment [3] and dome- or room-type systems for multiple persons [1]. We developed the latter type of omnidirectional theater to emphasize that multiple audience members share the same feeling. The theater consists of four projectors and four 3 m × 2 m flat screens standing like the walls of a square room (Fig. 4). Eight loudspeakers surrounding a listener reproduce a recorded sound field based on the BoSC principle. Two loudspeakers are set behind each screen, which is perforated to pass sound. We measured the impulse responses between each loudspeaker and a microphone array set inside the theater with the same alignment as the microphone array used for recording, and calculated an inverse filter with a length of 4096 points. In order to simplify the calculation of the inverse system, acoustic panels and carpets are installed on the ceiling and floor of the theater, respectively. The sound field inside the microphone array is reproduced by driving the loudspeakers with the recorded signal convolved with the calculated inverse filter. It is expected that the sound field of a larger region is reproduced so that it is the same as the original sound field [4]. It is also expected that the sound field is reproduced more accurately because the inverse filter compensates not only an
Fig. 4. Omnidirectional theater. (a) Omnidirectional video display system. (b) 3D sound reproduction system. (c) Interior of the theater.
Table 1. Contents of the questionnaire

Questions about realistic sensation (five-grade scales):
A. Did you get an immersive feeling, as if you were in the scene? (1. Not at all 2. Not much 3. As usual 4. Fairly 5. Much)
B. How much reality did you feel compared with a single front movie? (1. Not at all 2. Not much 3. As usual 4. Fairly 5. Much)
C. How was the image quality? (1. Bad 2. Not good 3. Normal 4. Good 5. Great)
D. How much reality did you feel compared with stereo sound? (1. Not at all 2. Not much 3. As usual 4. Fairly 5. Much)
E. How was the audio quality? (1. Bad 2. Not good 3. Normal 4. Good 5. Great)
Questions with free-form spaces:
F. What additional features are necessary for the current system?
attenuation caused by the sound screen but also the acoustic characteristics of a theater.
5 Experiment
5.1 Configurations
Results and Discussions
We can see that most subjects felt highly realistic sensations from the result shown in Fig. 5(a), which demonstrates the effectiveness of first-person perspective omnidirectional movies. Unfortunately, image quality got a low score. One
Fig. 5. (a) Results of the five-grade scale questions: mean scores per question ranged from 2.49 to 3.97, where 1 and 5 denote the lowest and highest scores, respectively. (b) Representative answers written in the free-form spaces:
- Quality of the media: improve image quality (resolution and contrast).
- Video quaking: some felt slightly sick, nauseous, or dizzy; film without head swinging.
- Reproduction of spatial sense: desire for a correct sense of depth; recognizing shapes of objects and scenes.
- Construction of the screen: a cylindrical screen, a dome, or eight flat screens; an additional overhead screen.
One reason is the optical construction of FIPPO. Since rays from the scene are reflected multiple times before being projected onto the image planes, light is lost at each reflection, resulting in low-quality images. The mirrors used in the prototype FIPPO are coated with a material of low reflectance; this problem can be solved by using a highly reflective material. The other reason is reduced image contrast at the display stage, caused by factors such as the output contrast of the projectors and inter-reflections between the screens. The representative opinions written in the free-form spaces are listed in Fig. 5(b). Video quaking, that is, image shake and blur caused by rapid ego-motion, was pointed out as a problem that should be solved. Some subjects reported feeling nauseous or dizzy. In a way, our system faithfully reproduces the first-person perspective, including head swing, but such motion is unsuitable for display to static viewers. Recording the head pose with a gyro sensor, or ego-motion estimation algorithms that exploit the horizontal cyclic property of omnidirectional movies, will help with video stabilization. Some subjects said that a cylindrical screen should be used instead of the four flat screens, which give an incorrect sense of depth. A more fundamental approach is needed to provide the spatial sense not considered in the proposed method: omnidirectional scenes must be spatially reconstructed, which requires special capturing equipment, and a three-dimensional display all around the viewers is also a challenging issue.
6 Conclusion
In this paper, we proposed a virtual experience system that provides a highly realistic feeling to audiences with omnidirectional videos and 3D sounds captured from a first-person perspective. Content is captured by specially designed wearable equipment consisting of catadioptric imaging systems and a microphone array. Audio and visual media processing for providing a highly realistic feeling, and a presentation system, were also discussed. The system's performance has been evaluated through the experiences of more than 1,000 ordinary people.
Along with the expected results of highly realistic and immersive feeling, we identified several problems, such as video quaking, media quality, and the sense of depth given by the video. These problems are now being addressed by other proposals, and combining them will provide more attractive virtual experiences.
Acknowledgment. This work was supported by the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology. The authors wish to thank SANYO Electric Co. Ltd. for providing the specially modified portable video cameras.
References
1. Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE. In: Proc. of Int. Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH 1993), pp. 135-142 (1993)
2. Ise, S.: A principle of active control of sound based on the Kirchhoff-Helmholtz integral equation and the inverse system theory. The Journal of the Acoustical Society of Japan 53(9), 706-713 (1997)
3. Hashimoto, W., Iwata, H.: Ensphered vision: Spherical immersive display using convex mirror. Trans. of the Virtual Reality Society of Japan 4(3), 479-486 (1999)
4. Kaminuma, A., Ise, S., Shikano, K.: Sound reproduction-system design considering head movement. Trans. of the Virtual Reality Society of Japan 5(3), 957-964 (2000) (in Japanese)
5. Yamazawa, K., Takemura, H., Yokoya, N.: Telepresence system with an omnidirectional HD camera. In: Proc. of Fifth Asian Conference on Computer Vision (ACCV 2002), vol. II, pp. 533-538 (2002)
6. Swaminathan, R., Nayar, S.K., Grossberg, M.D.: Designing Mirrors for catadioptric systems that minimize image error. In: Proc. of IEEE Workshop on Omnidirectional Vision, OMNIVIS (2004)
7. Ikeda, S., Sato, T., Kanbara, M., Yokoya, N.: Immersive telepresence system with a locomotion interface using high-resolution omnidirectional videos. In: Proc. of IAPR Conf. on Machine Vision Applications (MVA), pp. 602-605 (2005)
8. Enomoto, S., Ikeda, Y., Ise, S., Nakamura, S.: Three-dimensional sound field reproduction and recording system based on the boundary surface control principle. In: The 14th Int. Conf. on Auditory Display, pp. o16 (2008)
9. Azuma, H., Mukaigawa, Y., Yagi, Y.: Spatio-Temporal Lifelog Using a Wearable Compound Omnidirectional Sensor. In: Proc. of the Eighth Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras, OMNIVIS 2008 (2008)
10. Kondo, K., Mukaigawa, Y., Yagi, Y.: Wearable Imaging System for Capturing Omnidirectional Movies from a First-person Perspective. In: Proc. of The 16th ACM Symposium on Virtual Reality Software and Technology, VRST 2009 (2009)
Intercepting Virtual Ball in Immersive Virtual Environment

Massimiliano Valente, Davide Sobrero, Andrea Brogni, and Darwin Caldwell

Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy
{massimiliano.valente,davide.sobrero,andrea.brogni, darwin.caldwell}@iit.it

Abstract. Catching a flying ball is a difficult task that requires the sensory systems to calculate the precise trajectory of the ball in order to predict its movement, and the motor systems to drive the hand to the right place at the right time. In this paper we analyze human performance in an interception task performed in an immersive virtual environment, and the possible improvement of performance obtained by adding feedback. Virtual balls were launched from a distance of 11 m with 12 trajectories. The volunteers were equipped only with shutter glasses and one marker on the back of the hand, to avoid any constriction of natural movements. We ran the experiment in a natural scene, either without feedback or with acoustic feedback reporting a correct intercept. Analysis of performance shows a significant increase in successful trials in the feedback condition. The results are better than those of similar experiments described in the literature, but performance is still lower than in the real world.

Keywords: Virtual Reality, Ecological Validity, Interceptive Action.
1 Introduction

In real life, interaction with different objects is driven by a set of sensory stimuli, first of all visual and haptic ones: if we lose some of them, our performance decreases. When we want to interact with a moving object, we need to know its physical characteristics and its displacement in space so that our movements toward the target are well directed. In particular, catching a flying ball is a difficult task that requires our sensory system to calculate the precise trajectory of the ball to predict its movement, and our motor system to drive the hand to the right place at the right time. This set of characteristics makes the task suitable for a good evaluation of human performance in terms of precision, immersion, and adaptation in virtual reality. The aim of our study is to design a more natural way of interaction and, thus, to evaluate whether performance increases and the goal-directed movement becomes more accurate, and overall to evaluate human reactions in a virtual environment. In our experiment, we expect that the natural way of reaching for a flying ball produces better performance than the same task performed with hand-held devices. Thus, besides the performance results, we have also evaluated the characteristics of the hand movement, to verify the similarity of the trajectory with respect to the same task in a real environment.
The remainder of the paper is organized as follows: in Sec. 2 we outline relevant related work in the literature; in Sec. 3 we describe the design of the experiment; in Sec. 4 we present the results obtained from the subjects performing the experiment; and in Sec. 5 we summarize the most relevant results and discuss possible future work.
2 Related Works

Spatial perception is an important variable in our task; we must consider the potential causes of differences in perception between the real world and the virtual environment, such as graphic characteristics and differences between natural vision and perception in a virtual environment: for more details, see Murgia and Sharkey [10]. In particular, distance perception is influenced by these perceptive differences. Several previous works report that an object's distance from the observer is underestimated more in virtual environments than in real ones [1,14]. The literature on reaching for balls in the physical world is divided into three research areas: the outfielder problem, estimation of reaching, and catching fly balls. The outfielder problem regards the movement toward the position where the ball will fall [5,9]; the outfielder test was translated by Fink et al. into a virtual environment with the use of an HMD [7]. Experiments on estimation of reaching concern the indication of ball positions before or after the ball's passage [12]. Catching-fly-ball studies analyze trajectory perception in binocular and monocular conditions [8], performance at several velocities [13], and performance with several trajectory angles [4]. The literature on catching or intercepting balls in immersive virtual reality addresses whole-body movement analysis of sports performance, both with HMDs [7] and in CAVEs [2,3,6]. Zaal and Michaels [15] studied the judging of trajectories and intercepting movements in an interception task in a CAVE; they also analyzed interception performance, but their results show that the volunteers intercepted only 15% of the balls. They designed their experiment without any environment and captured hand movements with a wand that the volunteers held. They used only throws directed at the subjects, without different angles of approach.
3 The Experiment

We planned our experiment starting from a typical training task for baseball players: catching a ball thrown from an automatic system. We designed a simple virtual environment where the subject can perform a similar repetitive task.

3.1 System Setup

We carried out our experiment in an immersive virtual environment system: the room was equipped with two Christie Mirage S+3K stereographic projectors that allow a wide scene to be visualized on a screen 4 m long and 2 m high. The system is integrated with the IS900 from Intersense, a 6-DOF system for wide-area tracking; we used it to track the position of the subject's head.
We also used the Optitrack FLEX:V100r2 motion capture system, with infrared cameras, to track the subject's hand via a passive marker fixed on the back of the right hand. The graphical aspect of the experiment was implemented using the XVR framework for developing virtual reality applications (www.vrmedia.it). In addition, we used a physics simulation engine (PhysX, www.nvidia.com) to calculate the ball trajectory in real time as realistically as possible.

3.2 Procedure and Design

The virtual environment was composed of green grass with a wooden board 11 m away from the observer. On the board we placed three black points, one every 2 m, to indicate the different sources of the throws. A picture of the starting setup of the experiment is shown in Fig. 1.
Fig. 1. The initial setup: the ball is in front of the subject
At the beginning of the experiment, the volunteer stood in front of the screen with his feet on a blue line placed one meter away from the screen. A virtual softball, 10 cm in diameter, was placed between the screen and the subject, at the subject's chest height, to make the actual size of the ball clear. During the experimental sessions, the subjects were asked to try to intercept the virtual softball with the right hand. In every trial, the ball was thrown with a constant initial speed of about 10 m/s, randomly from one of the three black points. The ball could have three different elevations from the floor (high 38.5°, central 37°, and low 35.5°) and four different angles with respect to the subject (azimuth), as explained in Table 1. We designed the trajectories through these parameters to obtain four classes of arrival area: L-balls for balls arriving on the left of the subject, C-balls for balls arriving in
Table 1. Different azimuth angles for the trajectories

Azimuth         L-balls   C-balls   R-balls   RR-balls
Left hole         9.3°     10.3°     11.3°     12.3°
Central hole     −1.0°      0.0°      1.0°      2.0°
Right hole      −11.3°    −10.3°     −9.3°     −8.3°
front of the subject, R-balls for balls arriving on the right, and RR-balls for those arriving on the extreme right but still reachable by the subject's hand. In Fig. 2 we show the idealized target positions. With some volunteers we also tried trajectories arriving at the extreme left, but these throws turned out to be hard to intercept because the balls were out of range of the right hand.
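As an illustration of how such trajectories can be parameterized, the sketch below computes a simple drag-free ballistic path for one elevation/azimuth pair from Table 1. This is not the authors' implementation (the experiment used the PhysX engine); the coordinate convention, source height, and integration step are assumptions.

```matlab
% Illustrative ballistic trajectory for one throw (drag neglected).
% Observer at the origin; the throw source is on the board 11 m away.
elev = 37.0;  azim = 1.0;  v0 = 10;     % central elevation, C-ball azimuth, speed [m/s]
g    = 9.81;  dt = 0.01;                % assumed integration step [s]
src  = [0, 11, 1.0];                    % assumed source position [x, depth, height] in m
dir0 = [sind(azim)*cosd(elev), -cosd(azim)*cosd(elev), sind(elev)];
vel  = v0 * dir0;  pos = src;  traj = pos;
while pos(3) > 0 && pos(2) > 0          % until the ball hits the floor or reaches the subject
    vel(3) = vel(3) - g*dt;             % gravity acts on the vertical component only
    pos    = pos + vel*dt;
    traj(end+1, :) = pos;               %#ok<AGROW>
end
plot3(traj(:,1), traj(:,2), traj(:,3)), grid on
```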
Fig. 2. Idealized target positions are indicated by balls. The arrow shows the hand in the start position and the marker on the back of the hand.
The subject was asked to stand in a fixed position in front of the screen, feet on the blue line and right hand on the navel, before the ball started, and to move the right hand to intercept the ball as quickly as possible once the ball started to move. To reach balls far from the body, the subject could move the body but could not move the feet. The experiment was made up of five sessions: one training session with ten throws and four experimental sessions of ninety throws each, for a total of 370 trials per subject.
3.3 Participants

We performed the experiment with twenty volunteers (thirteen men and seven women) between 25 and 32 years of age, with heights between 1.65 m and 1.80 m. All the participants reported normal or corrected-to-normal vision. All the subjects were right-handed, as checked with the Edinburgh Handedness Inventory [11]. The volunteers reported little or no previous virtual reality experience. Five people reported experience in ball games such as basketball, volleyball, or tennis, and all the men reported experience in soccer (but not as goalkeepers). The volunteers participated in this experiment after giving informed consent. We divided the volunteers into two random groups of ten people each. The first group did not receive any kind of feedback, visual or otherwise, for correct ball interceptions during the experiment (NoFeedback group). The second group received sound feedback during the experiment, one sound for a correct catch and one for a miss (Feedback group). Moreover, at the end of every session the subjects of this group were informed of their percentage of success.
4 Results

We recorded data from the head sensor for the head movements, the position of the hand during the ball throws, and the position of the ball during its flight. We examined the subjects' catching behavior and the degree of immersion in the virtual environment. We analyzed the percentage of successful catches and compared the results of the NoFeedback group with those of the Feedback group. In addition, we analyzed the hand movement characteristics (peak of maximum velocity, latency before the start of the movement, and time to catch) to verify whether our results are comparable with the same data from real-world experiments reported in the literature.

4.1 Catching Performance

To assess success in catching the balls, we calculated at each time step the distance between the marker placed on the subject's hand and the center of the virtual ball, considering only the time when the ball was in front of the hand and not behind it. We assessed a successful catch (hit) when this distance was less than 10 cm. We calculated the global hit probability for both groups of subjects, the hit probability for all trajectories, and the hit probability for the 12 intercept areas. A summary of the performance for all intercept areas can be found in Table 2. We can see that the performance without acoustic feedback is significantly lower than the performance helped by feedback. The average performance of the NoFeedback group was 0.53 with a standard error of 0.014; the average performance of the Feedback group was 0.61 with a standard error of 0.012 (p < 0.001). The graphs in Fig. 3 show the mean and standard error for all types of throw.
Table 2. Results (hit probability, mean and SEM, for each intercept area)

                       L              C              R              RR
                Mean   SEM     Mean   SEM     Mean   SEM     Mean   SEM
High     Fb     0.557  0.045   0.637  0.041   0.651  0.035   0.581  0.053
         NoFb   0.488  0.044   0.599  0.047   0.568  0.049   0.489  0.039
Central  Fb     0.657  0.038   0.740  0.030   0.742  0.029   0.612  0.041
         NoFb   0.567  0.061   0.632  0.039   0.644  0.036   0.567  0.050
Low      Fb     0.443  0.041   0.586  0.034   0.569  0.036   0.520  0.036
         NoFb   0.392  0.055   0.409  0.058   0.536  0.045   0.468  0.044
We can observe that the performance of the Feedback group is always better than that of the NoFeedback group for throws from the left hole (hit probability: 0.61 vs. 0.5) and the central hole (hit probability: 0.64 vs. 0.53). This difference is not present for throws from the right hole, where the NoFeedback group's performance reaches that of the Feedback group (hit probability: 0.57 vs. 0.57).
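The hit criterion of Sect. 4.1 can be sketched as follows, assuming the tracked hand-marker and ball positions are available as T-by-3 matrices on a common time base; the variable names and the choice of depth axis are illustrative assumptions.

```matlab
% A trial counts as a hit if, while the ball is still in front of the hand,
% the distance from the hand marker to the ball center drops below 10 cm.
hitThreshold = 0.10;                               % 10 cm, in meters
d       = sqrt(sum((handPos - ballPos).^2, 2));    % distance at each time step
inFront = ballPos(:,2) >= handPos(:,2);            % ball not yet behind the hand (depth = 2nd column)
isHit   = any(d(inFront) < hitThreshold);
```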
Fig. 3. Hit success percentage for all throws. (A) Throws from the left hole. (B) Throws from the central hole. (C) Throws from the right hole.
ANOVA shows a significant effect of the elevation factor in both groups. For the NoFeedback group, the effect is significant between 38.5° and 35.5° (p = 0.017) and between 37° and 35.5° (p < 0.001). The difference is not significant between 38.5° and 37° (p = 0.08).
Fig. 4. Hit probability for different elevations
For the Feedback group the effect is significant for all pairs of elevations (38.5° vs. 37°, p = 0.001; 38.5° vs. 35.5°, p = 0.003; 37° vs. 35.5°, p < 0.001). The effect of feedback is shown in Fig. 4: it is significant for all elevations, always with p < 0.001. The performance analysis reveals a good interpretation of the balls' trajectories, but the interception performance is worse than results obtained in real environments, which report around 90% hits [8,13] versus the 74% of our best mean performance. However, our hit probabilities are better than the performance indicated by Zaal and Michaels [15] for the same type of throws in a virtual environment.

4.2 Hand Movement

Latency. We calculated latencies from the instant at which the ball starts to the moment when the movement speed is about 5 m/s. The mean latency is 417 ms; this result is consistent with acquisitions in the real task [8]. There are some differences in latency between the different types of throws. Latencies are inversely proportional to the distance between the start position and the probable impact point, and they correlate with the left or right side of the catch. Post hoc analysis shows significant differences between the latencies for L-balls and R-balls. A graph of the mean latencies grouped by intercept area is shown in Fig. 5a.

Maximal Velocity Analysis. The mean maximal velocities for the various trajectories (Fig. 5b) mirror the latency graph: a smaller latency corresponds to a higher movement speed toward the left side.

Heuristic Observations. We made two particular observations about the subjects' behavior during task execution.
Fig. 5. (A) Mean latencies of hand movement by intercept area. (B) Maximal hand velocity by intercept area.
We observed that many volunteers made a grasping movement with the hand on the virtual balls during the experiment. This behavior lasted about two or three sessions (including the training session); afterwards only a few people continued to close the hand. Moreover, according to the instructions the subjects had to keep their feet still during the experimental sessions, but they had no instructions for the rest of the body. A few volunteers started immediately to move the body to reach the balls; most people started to move the body only after two or three sessions.
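The latency and peak-speed measures of Sect. 4.2 could be extracted as in the following sketch, assuming a hand-marker trajectory sampled at a known rate and the sample index of ball release; the sampling rate and variable names are assumptions rather than the authors' code.

```matlab
% Movement onset latency and maximal hand velocity for one trial.
fs        = 100;                                        % assumed sampling rate [Hz]
speed     = [0; sqrt(sum(diff(handPos).^2, 2))] * fs;   % instantaneous hand speed [m/s]
onsetIdx  = find(speed(releaseIdx:end) > 5, 1) + releaseIdx - 1;  % 5 m/s threshold from the text
latencyMs = (onsetIdx - releaseIdx) / fs * 1000;        % latency in milliseconds
peakSpeed = max(speed(releaseIdx:end));                 % maximal hand velocity
```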
5 Conclusions

The experiment gave us positive results with respect to previous work: the possibility of performing a natural reaching movement without using any hand-held device, together with the ecological setup, raised the subjects' performance beyond the results present in the literature, in spite of the larger number of trajectories.
We have shown that giving acoustic feedback in real time increases performance, allowing a correction of eye/hand coordination and probably providing a reward element absent in the no-feedback case. The fact that for throws from the right hole the feedback gives no aid is probably due to the clearer visibility of these trajectories. Nevertheless, we do not reach the performance expected in the real case: this can be caused by the perceptive problems mentioned in Sec. 2, or by the relative lack of the proposed feedback. We are working on different types of feedback, and on combinations of more than one feedback at the same time, to verify whether performance can be further improved.
References
1. Armbrüster, C., Wolter, M., Kuhlen, T., Spijkers, W., Fimm, B.: Depth perception in virtual reality: Distance estimations in peri- and extrapersonal space. CyberPsychology & Behavior 11(1), 9-15 (2008)
2. Bideau, B., Kulpa, R., Vignais, N., Brault, S., Multon, F., Craig, C.: Using virtual reality to analyze sports performance. IEEE Computer Graphics and Applications 30(2), 14-21 (2010)
3. Bideau, B., Multon, F., Kulpa, R., Fradet, L., Arnaldi, B., Delamarche, P.: Using virtual reality to analyze links between handball thrower kinematics and goalkeeper's reactions. Neuroscience Letters 372(1-2), 119-122 (2004)
4. Bockemühl, T., Troje, N.F., Dürr, V.: Inter-joint coupling and joint angle synergies of human catching movements. Human Movement Science 29(1), 73-93 (2010)
5. Chapman, S.: Catching a baseball. American Journal of Physics 36(10), 868-870 (1968)
6. Craig, C.M., Goulon, C., Berton, E., Rao, G., Fernandez, L., Bootsma, R.J.: Optic variables used to judge future ball arrival position in expert and novice soccer players. Attention, Perception, & Psychophysics 71(3), 515-522 (2009)
7. Fink, P.W., Foo, P.S., Warren, W.H.: Catching fly balls in virtual reality: A critical test of the outfielder problem. Journal of Vision 9(13), 1-8 (2009)
8. Mazyn, L.I.N., Lenoir, M., Montagne, G., Savelsbergh, G.J.P.: The contribution of stereo vision to one-handed catching. Experimental Brain Research 157(3), 383-390 (2004)
9. McLeod, P., Dienes, Z.: Do fielders know where to go to catch the ball or only how to get there? Journal of Experimental Psychology: Human Perception and Performance 22(3), 531-543 (1996)
10. Murgia, A., Sharkey, P.M.: Estimation of distances in virtual environments using size constancy. The International Journal of Virtual Reality 8(1), 67-74 (2009)
11. Oldfield, R.C.: The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 9(1), 97-113 (1971)
12. Peper, L., Bootsma, R.J., Mestre, D.R., Bakker, F.C.: Catching balls: How to get the hand to the right place at the right time. Journal of Experimental Psychology: Human Perception and Performance 20(3), 591-612 (1994)
13. Tijtgat, P., Bennett, S., Savelsbergh, G., De Clercq, D., Lenoir, M.: Advance knowledge effects on kinematics of one-handed catching. Experimental Brain Research 201(4), 875-884 (2010), doi:10.1007/s00221-009-2102-0
14. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural problems for stereoscopic depth perception in virtual environments. Vision Research 35(19), 2731-2736 (1995)
15. Zaal, F.T.J.M., Michaels, C.F.: The information for catching fly balls: Judging and intercepting virtual balls in a CAVE. Journal of Experimental Psychology: Human Perception and Performance 29(3), 537-555 (2003)
Concave-Convex Surface Perception by Visuo-vestibular Stimuli for Five-Senses Theater

Tomohiro Amemiya1, Koichi Hirota2, and Yasushi Ikei3

1 NTT Communication Science Laboratories, 3-1 Morinosato Wakamiya, Atsugi-shi, Kanagawa 243-0198 Japan
2 Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8563 Japan
3 Graduate School of System Design, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino-shi, Tokyo 191-0065 Japan
[email protected],
[email protected],
[email protected]
Abstract. The paper describes a pilot study of perceptual interactions among visual, vestibular, and tactile stimulation for enhancing the sense of presence and naturalness for ultra-realistic sensations. In this study, we focused on understanding the temporally and spatially optimized combination of visuo-tactile-vestibular stimuli that would create concave-convex surface sensations. We developed an experimental system to present synchronized visuo-vestibular stimulation and evaluated the influence of various combinations of visual and vestibular stimuli on shape perception through body motion. The experimental results urge us to add a tactile sensation, by changing the contact area between the human body and the motion chair, to facilitate ultra-realistic communication.

Keywords: vestibular stimulation, ultra realistic, multimodal, tactile.
1 Introduction

With the progress in video technology and the recent spread of video presentation equipment, we can watch stereoscopic movies and large-screen high-definition videos not only in large amusement facilities but also in our private living rooms. The next step for enhancing the presence of audiovisual contents will be to add other sensory information, such as tactile, haptic, olfactory, or vestibular information. After SENSORAMA, a pioneering system in multisensory theater, a number of similar attractions have been developed for large amusement facilities. In order for a new technology to make its way into our living rooms, it is important to establish a methodology with the aim not only of faithfully reproducing the physical information, but also of optimizing it for human perception. If the sensory stimuli can be fully optimized, it is expected that a highly effective system can be developed with inexpensive, simple, and small equipment.
The authors have proposed Five-Senses Theater [1-3] to generate "ultra-realistic" sensations [4]. The "theater" we envision here would be widely available in living rooms as a "home theater" and would offer an interactive framework rather than just a way to experience contents. In this paper, we focus on motion sensation, which is one aspect of Five-Senses Theater. We developed an experimental system to integrate visual and vestibular sensory information and conducted a pilot study to investigate how to effectively generate vestibular sensation with visual stimuli. We also present a tactile-integrated prototype, which we plan to use in psychophysical experiments on multisensory integration.
2 System Design

Sensory inputs involved in self-motion sensation are mainly visual, vestibular, and somatosensory signals. In chair-like vehicles, such as driving simulators or theater seats, we generally detect velocity information using visual cues and detect acceleration and angular acceleration information using mechanical cues (vestibular and tactile sensations). A stationary observer often feels subjective movement of the body when viewing a visual motion simulating the retinal optical flow generated by body movement. This phenomenon is called vection [5-8]. Acceleration and angular acceleration are sensed by the otolith organs and the semicircular canals. These organs can be stimulated by mechanical (e.g., motion chair [9,10]), electrical (e.g., galvanic vestibular stimulation [11]), and thermal means (e.g., caloric tests). Electrical stimulation can be achieved with a less expensive configuration than the others. However, it affects the anteroposterior and lateral directions differently, and there have been no reports that it affects the vertical direction. In addition, its effect is changed by the electrical impedance of the skin. In thermal stimulation, cold water is poured directly into the ear, which is not a suitable experimental stimulus for computer-controlled systems. In this study, we chose a motorized motion chair (Kawada Industries, Inc.; Joy Chair-R1) to stimulate the vestibular system through the haptic modality. The motion chair has two degrees of freedom (DOF), in roll and pitch rotations. To reproduce exact physical information, a motion chair needs six degrees of freedom [12]. However, such motion chairs tend to be expensive and large-scale. We constructed an experimental system using a simple 2-DOF motion chair as an approximate representation, since size and cost are constrained in home use. Figure 1 shows the configuration of the experimental system for generating visual stimuli and controlling the motion chair. The motion chair and the visual stimulus are controlled by different computers on a network with distributed processing, coded in Matlab (The MathWorks, Inc.) with the Cogent Graphics Toolbox and the Psychophysics Toolbox. Synchronization of the stimuli was performed over the network. Position control was adopted to drive the motion chair. A voltage proportional to the desired angle is applied by a microprocessor (Microchip Inc.; PIC18F252) and a 10-bit D/A
converter (MAXIM; MAX5141). The visual stimulus is presented on a 100-inch screen by a projector placed on the floor (NEC; WT600J).
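As a small illustration of the position-control scheme just described, the sketch below maps a commanded tilt angle to a 10-bit D/A code; the symmetric ±13.5-degree range and the linear scaling are assumptions for illustration, not the actual firmware.

```matlab
% Map a desired chair tilt angle to a 10-bit DAC code (0..1023),
% assuming a linear mapping over a symmetric +/-13.5 deg range.
maxAngleDeg = 13.5;                                            % assumed chair tilt limit
angleCmd    = 5.0;                                             % desired pitch angle [deg]
angleCmd    = max(min(angleCmd, maxAngleDeg), -maxAngleDeg);   % clamp to the allowed range
dacCode     = round((angleCmd + maxAngleDeg) / (2 * maxAngleDeg) * 1023);
```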
Fig. 1. System configuration (screen, projector, numeric keyboard, computers, control box, and JoyChair-R1)
Fig. 2. Perceptual threshold at which the participant could not notice the vibration noise of the motion chair, plotted as inclination angle (0-14 deg) versus frequency (0-2 Hz)
We need not only to drive the chair within the maximum rotation velocity but also to know the perceptual threshold of the vibration noise. Figure 2 shows the perceptual threshold of a smooth motion. The threshold was determined by asking a naïve participant (24-year-old male) to control the amplitude and frequency and to alter them until the vibration was not detectable (i.e., the method of adjustment). In addition, ten naïve male participants (details later) reported that they did not feel vibration noise under these criteria. The results show that the vibration noise could be perceptually ignored for the combinations of amplitude and frequency that were
below the line in Fig. 2. In the following experiment, we chose experimental parameters for driving the motion chair that meet the criteria for perceptual unawareness of vibration noise. We measured the delay between the onset of the visual stimuli and the motion-chair stimuli in advance. The delay can be reduced to one video frame (33 ms) by adjusting the onset of the visual stimuli. The computers on the network synchronize the timing of the visual and motion-chair stimuli.
3 User Study

Ten male participants, aged 19-33 years, participated in the experiments. We decided to use males only because it has been reported that women experience motion sickness more often than men [13,14]. None of the participants had any recollection of ever experiencing motion sickness, and all had normal or corrected-to-normal vision. They had no known abnormalities of their vestibular or tactile sensory systems. Informed consent was obtained from the naïve participants before the experiment started. Recruitment of the participants and the experimental procedures were approved by the NTT Communication Science Laboratories Research Ethics Committee, and the procedures were conducted in accordance with the Declaration of Helsinki. Visual stimuli generated by Matlab with the Cogent Graphics Toolbox were radial expansions of 700 random dots. The distance between the participant and the screen was 1.72 m. The size of each dot was 81.28 mm. The resolution was 1024×768 (XGA). Participants wore earmuffs (Peltor Optime II Ear Defenders; 3M, Minnesota, USA) to mask the sound of the motion chair. In each trial, a stimulus was randomly selected from the experimental conditions. Subjects were seated in the motion chair with their body secured with a belt. They were instructed to keep their heads on the headrest of the chair. Figure 3 shows the experimental procedure. Subjects were instructed to watch the fixation point on the screen during the trial. After five seconds, the stimuli were presented for 20 seconds. The experimental task was to respond whether the shape they overran was a bump (convex upward), a hole (concave), or a flat surface (plane) by pressing a key on a numeric keyboard. The buttons were labelled 'bump', 'hole', and 'flat'. No feedback was given during the experiment. Data on a seven-point motion-sickness scale (1: not at all; 4: neither agree nor disagree; 7: very much) were also collected. Three visual conditions (Bump/Hole/Flat) × 3 motion-chair conditions (Bump/Hole/Flat) × 3 velocity conditions (20, 30, 40 m/s) × 10 trials (a total of 270 trials) were conducted. Subjects had 15-minute breaks after every 28 trials, but could rest at any time. A typical experiment lasted about three hours and thirty minutes. The translational velocity of the motion chair was expressed as the velocity of the optical flow. The shape was expressed by tilting the chair forwards and backwards, i.e., by modifying the pitch rotation, which corresponded to the tangential angle of the surface.
Fig. 3. Experimental procedure: 5 s of fixation, 20 s of stimulus presentation, then the response (bump, hole, or flat; 3-alternative forced choice) and a motion-sickness rating (7-point scale), followed by a 5 s interval
The vertical velocity of the optical flow was determined by combining the translational motion and the pitch rotation derived from the profile of the shape. The profile of the shape, y = f(x), was Gaussian:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right\}    (1)

where σ = 1.1, since the maximum tilt of the motion chair,

\theta = \arctan\left\{\left.\frac{d}{dx}f(x)\right|_{x=\mu\pm\sigma}\right\}    (2)

was set to 13.5 degrees, the limit of the motion chair's angle. Ten seconds after the start, the height reached its maximum (i.e., x = μ). The translational velocity was calculated as v = dx/dt.

The results of shape perception by visuo-vestibular stimulation are shown in Fig. 4. The experimental results show that shape perception was greatly affected by the vestibular stimulation. They suggest that the tilt of the chair, 13.5 degrees, was large enough to judge the shape independently of the visual stimuli. Reducing the angular amplitude of the chair motion or weakening the effect of the vestibular stimuli (e.g., adopting a tilt around the 2.2-degree tilt perception threshold [15], or a slower angular acceleration) can be expected to increase the effect of the visual stimuli. In contrast, it seems that the visual stimuli should be redesigned to augment their effect. When the motion-chair stimulus was a flat surface and the visual stimulus was not, the responses were almost evenly split among the three surfaces. This indicates that it is difficult to perceive the sensation of non-flat surfaces from visual stimuli alone. The velocities of optical flow used in the experiment did not greatly affect shape perception. The subjective motion-sickness ratings from all subjects were not larger than 2, which means that the experimental stimuli did not cause motion sickness.
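For illustration, the shape profile of Eq. (1) and the corresponding chair pitch of Eq. (2) can be generated as in the sketch below. The units of x and the mapping from time to x are not fully specified in the text, so x is treated in the same arbitrary units as σ and μ; this, and the variable names, are assumptions rather than the authors' code.

```matlab
% Gaussian shape profile (Eq. 1) and tangential tilt angle (Eq. 2).
sigma = 1.1;  mu = 0;                       % profile parameters from the text
x     = linspace(mu - 4*sigma, mu + 4*sigma, 1000);
f     = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sqrt(2*pi)*sigma);   % Eq. (1)
dfdx  = -(x - mu) .* f ./ sigma^2;          % analytic derivative of f
pitch = atand(dfdx);                        % chair pitch along the profile [deg]
maxTilt = atand(max(abs(dfdx)));            % tilt at x = mu +/- sigma, cf. Eq. (2)
```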
Fig. 4. Surface classification probabilities: the probability of classifying the surface as a bump, a hole, or flat, plotted against the velocity condition (20, 30, 40 m/s) for each combination of visual stimulus (bump/hole/flat) and motion-chair stimulus (bump/hole/flat). Subjects mainly used the tilt of the motion chair to identify the shape.
4 Enhancing Tactile Stimulation

Integration of visuo-vestibular stimuli with tactile stimuli is expected to enhance the perception of self-motion. When we drive a car and accelerate, the body is pressed
to the seat. If our body is pressed to the seat more strongly, we perceive a stronger subjective motion against the direction of pressure. In our motion chair system so far, we have not yet implemented pressure stimulation, which would simulate acceleration or deceleration of body motion. Figure 5 shows the design of the tactile stimulator for changing the pressure between the seat and the human body. The tactile stimulator is composed of voice-coil motors with a pin-array and plates with holes. Figure 6 shows a layout drawing of the tactile stimulator on the seat of the motion chair. We expect that when the voice-coil motor with the pin-array vibrates at lower frequencies, such as sub-hertz, a pressure sensation will be induced rather than a vibration sensation, because Merkel disks, which convey pressure information, are the most sensitive of the four main types of mechanoreceptors to vibrations at low frequencies.
Fig. 5. Schematic design of tactile stimulator with voice-coil motor and pin-array
Fig. 6. Layout drawing of the tactile stimulator on the seat of a motion chair
Figure 7 is a photograph of a prototype of the tactile stimulator. The pin-arrays of the tactile stimulator are made of ABS resin. Four sets of voice-coil motors were connected as a unit. To measure the pressure between the seat and the human body, each unit was mounted on pairs of strain gauges from a Wii Balance Board (Nintendo, Inc.). The tactile stimulators were driven by a computer with a D/A board (DA12-8(PCI), CONTEC Co., Ltd.) and a custom-made circuit (including an amplifier).
Fig. 7. Prototype of tactile stimulator to generate the sensation of being pressed to a seat
5 Conclusion

In this paper, we reported a pilot study of presenting visuo-vestibular stimulation to generate convex or concave surface perception. The results of the pilot study indicate that we should redesign the stimulus combination so that the visuo-vestibular stimuli are effective. We will then conduct a further experiment with different parameters in an attempt to augment the effect of the visual stimuli. We are also planning to conduct an experiment with the tactile stimulator integrated with the visuo-vestibular system to better understand the effectiveness of these stimuli.

Acknowledgement. This research was supported by the National Institute of Information and Communications Technology (NICT). We thank Dr. Takeharu Seno for his valuable comments on visually induced self-motion perception, and Mr. Shohei Komukai for his contribution to building the experimental setup.
References
1. Ikei, Y., Urano, M., Hirota, K., Amemiya, T.: FiveStar: Ultra-realistic Space Experience System. In: Proc. of HCI International 2011 (to appear)
2. Yoshioka, T., Nishimura, K., Yamamoto, W., Saito, T., Ikei, Y., Hirota, K., Amemiya, T.: Development of Basic Techniques for Five Senses Theater - Multiple Modality Display for Ultra Realistic Experience. In: Proc. of ASIAGRAPH in Shanghai, pp. 89-94 (2010)
3. Ishigaki, K., Kamo, Y., Takemoto, S., Saitou, T., Nishimura, K., Yoshioka, T., Yamaguchi, T., Yamamoto, W., Ikei, Y., Hirota, K., Amemiya, T.: Ultra-Realistic Experience in Haptics and Memory. In: Proc. of ASIAGRAPH 2009 in Tokyo, p. 142 (2009)
4. Enami, K.: Research on ultra-realistic communications. In: Proc. of SPIE, vol. 7329, p. 732902 (2009)
5. Fischer, M.H., Kornmüller, A.E.: Optokinetisch ausgelöste Bewegungswahrnehmung und optokinetischer Nystagmus. Journal of Psychological Neurology 41, 273-308 (1930)
6. Duijnhouwer, J., Beintema, J.A., van den Berg, A.V., van Wezel, R.J.: An illusory transformation of optic flow fields without local motion interactions. Vision Research 46(4), 439-443 (2006)
7. Warren Jr., W.H., Hannon, D.J.: Direction of self-motion is perceived from optical flow. Nature 336, 162-163 (1988)
8. Seno, T., Ito, H., Sunaga, S., Nakamura, S.: Temporonasal motion projected on the nasal retina underlies expansion-contraction asymmetry in vection. Vision Research 50, 1131-1139 (2010)
9. Amemiya, T., Hirota, K., Ikei, Y.: Development of Preliminary System for Presenting Visuo-vestibular Sensations for Five Senses Theater. In: Proc. of ASIAGRAPH in Tokyo, vol. 4(2), pp. 19-23 (2010)
10. Huang, C.-H., Yen, J.-Y., Ouhyoung, M.: The design of a low cost motion chair for video games and MPEG video playback. IEEE Transactions on Consumer Electronics 42(4), 991-997 (1996)
11. Maeda, T., Ando, H., Amemiya, T., Nagaya, N., Sugimoto, M., Inami, M.: Shaking the World: Galvanic Vestibular Stimulation as a Novel Sensation Interface. In: Proc. of ACM SIGGRAPH 2005 Emerging Technologies, p. 17 (2005)
12. Lebret, G., Liu, K., Lewis, F.L.: Dynamic analysis and control of a Stewart platform manipulator. Journal of Robotic Systems 10(5), 629-655 (1993)
13. Lentz, J.M., Collins, W.E.: Motion Sickness Susceptibility and Related Behavioral Characteristics in Men and Women. Aviation, Space, & Environmental Medicine 48(4), 316-322 (1977)
14. Sharma, K., Aparna: Prevalence and Correlates of Susceptibility to Motion Sickness. Acta Geneticae Medicae et Gemellologiae 46(2), 105-121 (1997)
15. Guedry, F.: Psychophysics of vestibular sensation. In: Kornhuber, H.H. (ed.) Handbook of Sensory Physiology, vol. VI/2. Springer, Heidelberg (1974)
Touching Sharp Virtual Objects Produces a Haptic Illusion

Andrea Brogni1,2, Darwin G. Caldwell1, and Mel Slater3,4

1 Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy
{andrea.brogni,darwin.caldwell}@iit.it
2 Universidad Politécnica de Cataluña, Barcelona, Spain
3 ICREA-Universidad de Barcelona, Barcelona, Spain
4 Computer Science Dept. - University College London, London, UK
[email protected]
Abstract. Top-down perceptual processing implies that much of what we perceive is based on prior knowledge and expectation. It has been argued that such processing is why Virtual Reality works at all - the brain filling in missing information based on expectation. We investigated this with respect to touch. Seventeen participants were asked to touch different objects seen in a Virtual Reality system. Although no haptic feedback was provided, questionnaire results show that sharpness was experienced when touching a virtual cone and scissors, but not when touching a virtual sphere. Skin conductance responses separate out the sphere as different from the remaining objects. Such exploitation of expectation-based illusory sensory feedback could be useful in the design of plausible virtual environments.

Keywords: Virtual Reality, Human Reaction, Physiology, Haptic Illusion.
1 Introduction

It was argued by the late Professor Lawrence Stark that 'virtual reality works because reality is virtual' [12]. The meaning of this is that our perceptual system makes inferences about the world based on relatively small samples of the surrounding environment, and uses top-down prior expectations to automatically fill in missing information. A scene displayed in virtual reality typically provides a very small sample of what it is supposed to be portraying at every level - in terms of geometric, illumination, behavioral, auditory, and especially haptic sensory data, and within each of these with low resolution, low fidelity, and a visually small field of view - a typically huge amount of sensory information is missing compared to what would be available in physical reality. Yet there is a lot of evidence that people tend to respond realistically to situations and events in virtual reality in spite of this paucity of sensory data [9] [11]. There is also strong evidence that the brain, in processing sensory data, relies strongly on multisensory correlations. By manipulating these multisensory correlations it is possible to produce in a person bizarre illusions of changes to their body - a rubber or virtual arm replacing their real arm [2] [10], the Pinocchio illusion (the feeling that one's nose is growing longer) [6], the shrinking waist illusion [5], out-of-the-body experiences [7] [4], and even the illusion that a manikin body is one's own body [8]. Each
of these relies on synchronous visual-tactile or visual-motor correlations or, in the case of the Pinocchio and shrinking waist illusions, on the correlation between the feeling of touch on a body part (the nose, the waist) while proprioception indicates that the hand doing the touching is also moving. The brain resolves this contradiction through the inference that the body itself is changing. In this paper we investigate a simple setup that produces the illusion of touch when there is none. It also relies on multisensory correlation and top-down knowledge. In the case of the rubber hand illusion the subject sees, for example, a brush touching a rubber hand placed in a plausible position on the table at which he or she is seated, and feels the corresponding touch synchronously on the hidden real hand. After a few seconds of such stimulation, this visual-tactile correlation produces in most people the strong feeling that the rubber hand is their hand and that the touch is felt on the rubber hand; this is demonstrated not only subjectively through a questionnaire, but also behaviorally: the subject will blindly point towards the rubber hand when asked to indicate where the hand is felt to be located. If the rubber hand is threatened, this generates physiological arousal that would be appropriate for a pain response [2]. When the visual-tactile sensations are not synchronous, the illusion typically does not occur. This shows that the connection between the visual and haptic modalities is very strong and that one can influence the other under specific conditions, even if vision usually dominates. In our experiment the seen hand is the person's real hand, and the visual touch with a virtual object is also visually on the hand. What is missing is the real touch sensation, and the main question is to what extent the brain will fill in this missing information and provide a feeling of touch. Can we really feel a sensation on our palm when we approach a virtual object? Do we perceive a different sensation for a sharp object and a smooth one? This would open new opportunities for applications where pure haptic feedback is not so crucial and simple virtual sensations could be enough to increase the sense of presence.
2 The Experiment

2.1 Introduction

The study we carried out was related to the different perceptions we may have when approaching objects with different shapes. The experiment covered the simple act of approaching and touching an object. Moving the hand and reaching for an object with the palm was the task for the subjects, in a very simple virtual world. The main task for the volunteers was to stand in a very simple environment, consisting of just a grid placed on the floor, and to wait for an object to appear. They were asked to approach the object at the clearly marked red spot, lifting their arm and "touching" it with the palm, where we have a high density of receptors. Skin conductance was recorded during the experiment, to analyze physiological responses while the volunteer was "touching" the different objects. In real life we perceive objects of different shapes in different ways, due to our previous experience, and our sensations and reactions differ according to the object. The hypothesis is that similar reactions could happen in a completely virtual
environment, where no real haptic feedback is provided. Approaching a virtual sharp object should be different from approaching a virtual sphere, or a smooth object, showing similarity with the real situation.

2.2 Equipment

The study was carried out at the Advanced Robotics Lab at the Istituto Italiano di Tecnologia, Genoa, Italy. Participants were placed inside a Powerwall system, where the virtual environment was projected on a 4×2 m² screen by two Christie Mirage S+ 4000 projectors, synchronised with StereoGraphics CrystalEyes active shutter glasses. The head of the participant was tracked with 6 DOF by an Intersense IS900 inertial-ultrasonic motion tracking system. An Optitrack FLEX:100 system composed of 12 infrared cameras was used for tracking the hand position using a single passive marker. We used a Sun workstation (dual-core AMD Opteron 2218 processors, 2.60 GHz, 2.50 GB RAM) with Windows XP Professional and an Nvidia Quadro FX 4600 video card. A snapshot of a volunteer during the experiment is shown in Figure 1. The device for the physiological monitoring was the Nexus-4 by MindMedia (www.mindmedia.nl). This was used to record skin conductance at a sampling rate of 128 Hz. The main application was developed in C++, using XVR (www.vrmedia.it) as the graphics library and VRPN (www.cs.unc.edu/Research/vrpn/) as the network library for the connection with the physiological monitoring and the tracking system. The participants were able to move freely in the VE as in the real one, because the virtual space was aligned with the real walls of the VR system and the grid on the floor was at the same level as the real floor of the room. The frame rate of the simulation was around 100 Hz.

2.3 Procedures and Scenario

Seventeen individuals were recruited for the experiment, ten males and seven females. After initial greetings, the participants were asked to complete a consent form after reading an information sheet. The participants were then asked to answer a questionnaire containing a series of demographic questions. The participant was introduced to the Powerwall and fitted with various devices: tracking sensors, shutter glasses, and physiological sensors. A passive marker was placed on the back of the hand to keep track of the palm position. Initially the participant was asked to stand still for 30 seconds in the dark in order to record a baseline for the physiological signals in a relaxed and non-active state. After the dark baseline session, the VE appeared on the projection screen, and another set of baseline measurements was recorded for another 30 seconds, but with the virtual environment displayed (a simple grid on the floor). After this period, the actual experiment started, and seven objects were shown in random order. These were a cone, a cube, a cylinder, a pyramid, scissors, a sphere, and a vase (Figures 2 and 3). These objects varied in their degree of sharpness from none
Fig. 1. A Volunteer during the experiment
Fig. 2. Snapshots of the geometrical virtual objects
at all (a sphere) to very pointy (a cone). The participants were asked touch the virtual objects at a point indicated by a red spot, using the palm of their right hand. However, the vase was slightly different, there was no red spot on it, and participants were told that they could touch it wherever they wanted, and even in a different place each time. Each object was displayed in succession for about 15 seconds and then there was 15 seconds between the display of each object. The participants saw these objects in a different random order, but the first and the last in the sequence were never the vase or the scissors because those objects were breaking the geometrical sequence, and we wanted them to be in the middle. In Figure 1, some snapshots of a participant during the experiment.
Fig. 3. Snapshots of virtual scissors and vase
2.4 Recording Subjective Responses

A questionnaire was administered at the end of the experiment, consisting of three questions on a 7-point Likert scale (http://en.wikipedia.org/wiki/Likert_scale). The order of the questions was randomized for each subject, and each question was answered with respect to each object; again, the order of the objects was different and randomized for each subject. The questions were:
– (Q1) Did it sometimes occur to you that you wanted to pick up or use the object in some way?
– (Q2) Did it sometimes occur to you that the object was really there?
– (Q3) When your hand touched the [object name] did you at any time feel any of the following sensations? (hot, humid, unpleasant, sharp, soft, painful, cold, smooth)
Each question was answered on a scale from 1 to 7, where 1 meant 'not at all' and 7 'very often'. With respect to Q3, this scale was used in relation to each of the listed properties.

2.5 Physiological Responses

The physiological measure recorded during this experiment was skin conductance [3], recorded at 128 Hz by the Nexus device. Surface electrodes were placed on the palmar areas of the index and ring fingers of the non-dominant hand. This measures changes in arousal through changes in skin conductance caused by sweat levels. In particular, electrodermal activity (EDA) was recorded during the virtual experience. EDA is based on the sweat gland activity of the human skin: when the level of arousal increases, the glands produce more sweat, changing the resistance, and hence the conductance, of the skin [1,13]. Our signal was the skin conductance (SC), expressed in microsiemens, which normally decreases during relaxation. The raw data coming from the device were converted to microsiemens values, and the signal was then smoothed using the MATLAB wavedec function to obtain a better signal for skin conductance level (SCL) detection. Following [4], we measured the SCL at the start of each touching event, and then the maximum reached during the immediately following 6 seconds.
We allowed 5 seconds for the response, plus 1 second of leeway since we did not have the exact moment at which the participant's hand intersected the object. The greater this amplitude change, the greater the level of arousal.
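A minimal sketch of this amplitude measure, assuming the smoothed skin conductance signal and a vector of touch-onset sample indices are available; the variable names are illustrative, not the authors' code.

```matlab
% SCL amplitude per touch event: rise from the value at touch onset to the
% maximum reached within the following 6-second window (sampled at 128 Hz).
fs  = 128;
win = round(6 * fs);
amp = zeros(size(touchIdx));          % 'sc' and 'touchIdx' are assumed inputs
for k = 1:numel(touchIdx)
    i0     = touchIdx(k);
    i1     = min(i0 + win, numel(sc));
    amp(k) = max(sc(i0:i1)) - sc(i0);
end
```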
2.6 Analysis and Results

Questionnaire Analysis. We first consider the internal consistency of the responses within each object. In this paper we only pay attention to the scores on the felt properties of the objects as elicited by Q3. We expect that sharp objects should be reported as sharp and smooth objects as smooth. We used a non-parametric Kruskal-Wallis one-way analysis of variance, since the questionnaire responses are ordinal. The results are consistent with expectation. For the cone, cube, and pyramid the 'sharp' response was significantly higher than all of the rest (P = 7x10−8, 0.0004, 1.3x10−5 respectively, for the tests of equality of all medians). For the scissors the medians were not all equal (P = 3.6x10−6), with 'sharp' having the highest median, but 'painful' and 'unpleasant' were also not significantly different from 'sharp' (using a multiple contrast analysis based on the ANOVA at an overall 5% level). There was no significant difference between the properties for the cylinder (P = 0.18). For the sphere, 'smooth' was significantly higher (P = 1.1x10−9) than the remaining properties, and for the vase 'humid' and 'smooth' were significantly higher than the remaining properties (P = 0.0004), but this difference was caused by very few differences in the scores (most of the scores were 1). More interesting is the analysis of the levels of reported sharpness between the objects. Considering the question 'sharp' only, there is a significant difference between the medians of the 7 objects (P = 5.6x10−7). Using a multiple contrast analysis with an overall significance level of 0.05, we find that the level of reported sharpness for the cone is significantly higher than for the cylinder, sphere, and vase. Scissors and pyramid are both significantly higher than sphere and vase. Sphere and vase are significantly lower than cone, scissors, and pyramid. The questionnaire analysis is interesting but not sufficient, since we cannot be sure that people are not just reporting what they think might be expected of them rather than giving an objective response. For this we turn to the skin conductance analysis.

Skin Conductance Analysis. Analysis of variance of the skin conductance amplitudes revealed no differences in the mean values between the different objects. However, we consider the 17 by 7 matrix of skin conductance amplitudes (17 participants by 7 object columns), and from this construct a distance matrix between the objects, using Euclidean distance. Using this we carried out a cluster analysis (using the MATLAB linkage function). We allowed for 2, 3, or 4 clusters amongst the 7 objects. The results are shown in Table 1. It is clear that the sphere stands out as a cluster on its own. We know a priori that there were 3 types of object (those with sharp edges, the sphere, and the vase, which was quite different from the others). The cluster analysis always correctly distinguishes the sphere from the rest.
Table 1. List of the clusters allowed by the analysis

Number of clusters allowed | Cluster 1         | Cluster 2                   | Cluster 3                   | Cluster 4
2                          | all except sphere | sphere                      | -                           | -
3                          | cone, cube, vase  | cylinder, scissors, pyramid | sphere                      | -
4                          | cone, vase        | cube                        | cylinder, scissors, pyramid | sphere
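The clustering step can be sketched as follows. This is not the authors' MATLAB code but an equivalent formulation in Python using SciPy, with single linkage assumed as the counterpart of MATLAB's default linkage and random numbers standing in for the recorded amplitudes:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

objects = ["cone", "cube", "cylinder", "scissors", "pyramid", "sphere", "vase"]
amp = np.random.rand(17, 7)            # placeholder: 17 participants x 7 objects

# Euclidean distances between objects, i.e. between the columns of amp
dist = pdist(amp.T, metric="euclidean")
tree = linkage(dist, method="single")  # hierarchical clustering of the objects

for k in (2, 3, 4):                    # allow 2, 3 or 4 clusters, as in Table 1
    labels = fcluster(tree, t=k, criterion="maxclust")
    print(k, dict(zip(objects, labels)))
```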
2.7 Discussion
In the rubber hand illusion there is synchronous visual and tactile sensory data, and the brain infers that the rubber hand must be the real hand. Our experiment may be considered as an 'inverse' virtual hand illusion experiment: here the hand is the person's seen real hand, and this hand is seen to touch a virtual object. It is as if there were an equation: 'this is my hand' = 'the rubber hand is seen to be touched' + 'there is felt touch' (with synchronous visual-tactile stimulation). In our case the left hand side is definitely true (it is their real hand), and the first term on the right hand side is also true (the hand is seen to be touching virtual objects). The 'unknown' value here is the felt touch, which may then be generated automatically by the perceptual system.
Although the absolute values of the skin conductance amplitudes revealed no differences between the means of the different object types, it is nevertheless the case that the cluster analysis can distinguish the one object that is clearly smooth and different from the remaining ones. Remember that this is based solely on physiological responses, which in themselves seemingly have no connection with the shapes of virtual geometric objects. However, it appears that the relationships between the degrees of arousal can distinguish between the different types of objects.
The result gives an indication of how the brain may deal with a completely new situation in a way that helps us. Based on its database of previous experiences, the brain generates very small but relevant sensations related to the sense of touch, even without real feedback from the environment. This "haptic illusion" seems to be stronger for sharp shapes than for smooth ones, perhaps because sharp shapes are associated with something dangerous that could hurt us, so that a subconscious mechanism overwrites the actual perception with something fake but useful as an alarm.
Positive feedback also came from quick informal chats after the experiments. People reported: "How did you make it? I was feeling the material! The sphere was soft and smooth, and I felt my hand falling down when it disappeared. The blue side of the objects was cold." Of course we also had negative impressions, such as "I didn't get any sensation, apart from a bit of discomfort when I approached the scissors facing me with the sharp end."
3 Conclusions
Our study has demonstrated that people subjectively distinguish between the smoothness/sharpness properties of different types of virtual objects. There is also some evidence that this occurs at a physiological level. It is our view that haptics remains the great unsolved problem of virtual reality. It is true that there are specific haptic devices that can give specific types of haptic feedback under very constrained circumstances. However, there is no generalized haptics in the sense that contingent collisions with virtual objects on any part of the body can generate tactile responses. We speculate that, in the long run, reliance on the perceptual properties of the brain can be exploited to solve this problem.
4 Future Work
The main limitation of our study is that we could not distinguish between the visual influence and the illusory haptic influence. Just seeing the cone or scissors might provoke a response, which could account for the GSR effect of the cone even without any touching. The next step in this research will be to carry out a new small study in which we ask the volunteers to move their hand close to the object but not actually touch it, and compare this with moving the hand onto the sharp/smooth object and intersecting with it. This should help us to discriminate the purely visual effect from the illusory haptic one.
Acknowledgment. This work was supported by the Spanish Ministry of Education and Science, Accion Complementaria, TIN2006-27666-E, and the EU-FET project IMMERSENCE (IST-2006-027141). The study was approved by the Ethical Committee of the Azienda Sanitaria Locale 3 in Genoa, Italy. The studies were carried out at the Advanced Robotics Lab at the Istituto Italiano di Tecnologia, Genoa, Italy.
References
1. Andreassi, J.J.: Psychophysiology: Human Behavior and Physiological Response, 4th edn. Lawrence Erlbaum Associates, London (2000)
2. Armel, K., Ramachandran, V.: Projecting sensations to external objects: evidence from skin conductance response. Proceedings of the Royal Society, B, Biological Sciences 270, 1499–1506 (2003)
3. Boucsein, W.: Electrodermal Activity. New York (1992)
4. Ehrsson, H.H.: The experimental induction of out-of-body experiences. Science 317(5841) (August 2007)
5. Ehrsson, H.H., Kito, T., Sadato, N., Passingham, R.E., Naito, E.: Neural substrate of body size: Illusory feeling of shrinking of the waist. PLoS Biol. 3(12) (November 2005)
6. Lackner, J.R.: Some proprioceptive influences on the perceptual representation of body shape and orientation. Brain 111(2), 281–297 (1988)
7. Lenggenhager, B., Tadi, T., Metzinger, T., Blanke, O.: Video ergo sum: Manipulating bodily self-consciousness. Science 317(5841), 1096–1099 (2007)
8. Petkova, V.I., Ehrsson, H.H.: If I were you: Perceptual illusion of body swapping. PLoS ONE 3(12) (2008)
9. Sanchez-Vives, M.V., Slater, M.: From presence to consciousness through virtual reality. Nature Neuroscience 6(4), 8–16 (2005)
10. Slater, M., Marcos, D.P., Ehrsson, H.H., Sanchez-Vives, M.V.: Towards a digital body: The virtual arm illusion. Frontiers in Human Neuroscience 2(6) (March 2008)
11. Slater, M.: Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1535), 3549–3557 (2009)
12. Stark, L.W.: How virtual reality works! The illusions of vision in real and virtual environments. In: Proc. SPIE: Symposium on Electronic Imaging: Science and Technology, vol. 2411, pp. 5–10 (February 1995)
13. Stern, R.M., Ray, W.J., Quigley, K.S.: Psychophysiological Recording, 2nd edn. Oxford University Press, Oxford (2001)
Whole Body Interaction Using the Grounded Bar Interface Bong-gyu Jang, Hyunseok Yang, and Gerard J. Kim Digital Experience Laboratory Korea University, Seoul, Korea
[email protected]
Abstract. Whole body interaction is an important element in promoting the level of presence and immersion in virtual reality systems. In this paper, we investigate the effect of "grounding" the interaction device to take advantage of the significant passive reaction force feedback sensed throughout the body, thereby in effect realizing whole body interaction without complicated sensing and feedback apparatus. An experiment was conducted to assess task performance and the level of presence/immersion, as compared to a keyboard input method, using a maze navigation task. The results showed that the G-Bar induced significantly higher presence, while task performance (maze completion time and number of wall collisions) was on par with the already familiar keyboard interface. The keyboard users instead had to adjust and learn over time how to navigate faster without colliding with the walls, indicating that the whole body interaction contributed to a better perception of the immediate space. Thus, considering the learning rate and the relative unfamiliarity of the G-Bar, with sufficient training the G-Bar could accomplish both high presence/immersion and good task performance. Keywords: Whole-body interaction, Presence, Immersion, Task performance, Isometric interaction.
1 Introduction
One of the defining characteristics of virtual reality is the provision of "presence" [1], the feeling of being contained in the content. Many studies have identified elements that contribute to enhancing the level of presence [1][8], and one such element is the use of "whole body" interaction, whose strategy is to leverage as many sensory and motor organs as possible [2]. In this paper, we introduce an interface called the "G-Bar" (Grounded Bar), a two-handed isometric device that is fixed to the ground (grounded) for a variety of interactive tasks including navigation, and object selection and manipulation. The interface is "whole body" because it is basically operated with two hands, and since it is grounded, it also indirectly involves interactions through the legs and the body parts in between (see Figure 1). Since the user also needs to move the head/neck in order to view and scan the environment visually, virtually all parts of the body become active. Moreover, since the device is isometric and senses the user's pressure input,
the user can express dynamic interactions more naturally [3][4][5]. In addition, we formally evaluate and validate the projected merits of the whole body interaction induced by a "grounded" device such as the G-Bar.
Fig. 1. The G-Bar in use for navigating and selecting objects in a virtual environment. The reaction force resulting from the two-hand interaction with the grounded device propagates throughout the body (right). A more detailed view of the G-Bar prototype (left).
2 Related Work
Employing whole body interaction is an effective method to enhance the immersive quality of interactive contents [6][7][8]. However, whole body interaction does not necessarily require separate sensing and feedback mechanisms for the body parts involved. Through clever interaction and interface design, whole body interaction can be induced through gesture latitude and minimal sensing. For instance, the arcade game "Dance Dance Revolution" [9] utilizes a very simple foot switch pad, but the interaction is designed to induce the use of the whole body (similarly for Nintendo Wii based games [10]). This concept is somewhat related to that of "passive" haptics, an inexpensive and creative way to use natural reaction force feedback (e.g. tangible props). However, props are usually lightweight and fragile, which limits the amount of force users can apply. Meehan et al. [11] demonstrated the utility of passive haptics with a grounded prop (a ledge), which significantly enhanced the level of presence in their virtual cliff environment. A large reaction force is more likely to propagate throughout and stimulate the whole body. Isometric input is also known to increase interaction realism by allowing dynamic expression through the input [3][4][5]. The G-Bar combines all these elements in the hope of creating effective interaction and a compelling user experience. On the other hand, the effect of whole body interaction on task performance is unclear; for one, the relationship between presence/immersion and task performance is generally viewed as being task dependent [1][8].
3 G-Bar
The G-Bar is implemented by installing low cost pressure sensors and vibrating motors on a bar handle. Four pressure sensors (two at each end of the bar) realize the isometric input, and four vibrating motors (laid out at regular intervals along the bar) give directional feedback cues in addition to the natural reaction force. The G-Bar is thus particularly appropriate for tasks that involve frequent and dynamic contact with the environment or interaction object. Typical examples might be navigating and passing through a jostling crowd, or riding and directly controlling a vehicle such as a cart, motorcycle or hang-glider. In fact, the "bar" resembles the control handles used for some of these vehicles (e.g. a motorcycle handlebar), and such metaphors can be even more helpful. While object selection and manipulation might not really involve whole body interaction in real life, by exaggerating the extent of the body parts involved, we posit that it could be one way to maximize the virtual experience. Figure 2 shows how one might navigate through the virtual space by combinations of simple two-handed isometric push (forward) and pull (backward) actions. In the selection mode, the same interaction technique can be used for controlling the virtual ray/cone, followed by a grasping action for final selection (right hand grasp) and undo (left hand grasp). Once an object is selected, it can be rotated and moved in a similar fashion as well. Despite the seemingly natural interaction metaphors, the sheer unfamiliarity requires some amount of learning from the users.
Fig. 2. Navigation, and object selection and manipulation through combinations of push, pull, twist and grab actions with the G-Bar
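The paper does not give the control mapping in detail; the following is a hypothetical sketch of how the four pressure readings could be turned into navigation commands of the push/pull kind shown in Fig. 2 (the sensor layout and gains are assumptions, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass
class GBarSample:
    left_push: float    # pressure on the front face, left end of the bar
    left_pull: float    # pressure on the rear face, left end
    right_push: float
    right_pull: float

def navigation_command(s: GBarSample, gain_v=0.5, gain_w=0.8):
    """Return (forward velocity, turn rate) from one isometric pressure sample."""
    left = s.left_push - s.left_pull       # net forward force at the left hand
    right = s.right_push - s.right_pull    # net forward force at the right hand
    forward = gain_v * (left + right)      # push with both hands -> move forward
    turn = gain_w * (right - left)         # push harder on one side -> turn
    return forward, turn
```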
4 Experiment To assess the effectiveness of the proposed interaction technique, we have carried out an experiment comparing the G-Bar interface to a non-grounded interface, namely a
keyboard input. The user was asked to navigate a fixed path in a virtual maze with a cart (in a first person viewpoint), and the task performance, level of presence/immersion and general usability were measured. Our hypothesis was that the use of the G-Bar would result in a significantly enhanced user experience (e.g. high presence and immersion), but might not produce good task performance or high usability without sufficient training.
4.1 Experiment Design
The experiment was designed as a one factor (two level) repeated measure (within subject) design, the sole factor being the type of interface employed (G-Bar vs. keyboard). The subject was asked to navigate a fixed path in a maze-like environment using the two interfaces (presented in a balanced order). As shown in Figure 3, the virtual maze was composed of brick walls with the directional path marked for user convenience. The subject was asked to follow and navigate the path as fast as possible while colliding with the walls as little as possible. The width of the path was set (after an initial pilot test) so that the task was not too easy, especially when making turns. The test environment was staged as the user pushing a cart (seen in the first person viewpoint) with a large box in it occluding the front end, so that the user had to get a "feel" for the extent of the cart. This "feel" would be important in avoiding collisions with the walls, and was also a quality thought to be better acquired with whole body interaction, and thus expected to produce higher task performance (at least eventually).
Fig. 3. A snapshot of the virtual maze used in the comparative experiment (left). A subject navigating the virtual maze using the interface set up. The G-Bar is installed on a heavy treadmill for grounding. The keyboard interface was also placed on the same place where the G-Bar was installed (right).
The measured dependent variables were the task completion time, accuracy (i.e. the number of wall collisions), a subjective presence/immersion score (collected through a survey), and other general usability measures (we omit the content of the survey for lack of space).
4.2 Experiment Process
A total of 16 subjects participated in the experiment (13 males / 3 females). The average age of the subjects was 27.18; most were college undergraduate or graduate students recruited on campus, all with previous experience of keyboard/mouse based computer interfaces and of using supermarket carts. They were given proper compensation for their participation. Each subject was first given a certain amount of training until sufficiently familiarized with the G-Bar. However, due to prior familiarity with the keyboard interface, competency in G-Bar usage did not match that of the keyboard. The subject then navigated the virtual maze using the two interfaces, presented in a balanced order. Each subject tried the maze three times, with the same maze used across trials to measure the learning effect of the interface itself. The learning effect (of the maze) or bias due to using the same maze over the trials or over the different treatments was deemed minimal, because the task was simply to follow the marked path (rather than finding a way through the maze). Since the G-Bar is a grounded interface, it was attached to a front horizontal bar on a (heavy) treadmill (see Figure 3; the moving belt of the treadmill was not used). To keep the environmental conditions equal, when the keyboard was used it was placed at the same position as the G-Bar. The quantitative dependent variables were captured automatically by the test software, and the presence/immersion and usability survey was taken after the subject had carried out both treatments.
5 Experiment Results
Figure 4 shows the experiment results for task performance (task completion time and number of wall collisions) over the three trials. Contrary to our expectation, despite the relative familiarity of the keyboard interface, the users performed generally better with the G-Bar interface (even though statistically significant differences were not observed). Moreover, the keyboard interface showed a stronger learning effect than the G-Bar. We can therefore conclude that the G-Bar (or whole body interaction) was a more suitable interface to begin with (e.g. giving better depth/spatial perception), and that the keyboard users were instead forced to adapt and learn over time how to avoid collisions and navigate better. Figure 5 shows the experiment results for the qualitative survey responses (all measured on a 7-point Likert scale). It is very interesting that, in contrast with the task performance results, the users still felt the keyboard interface was much easier. The perceived extent of whole body usage, force feedback (note that there was no explicit force feedback) and immersion (presence) was much higher with the G-Bar (with statistical significance). Again, we posit that such factors affected depth and spatial perception, contributing to the relatively high task performance even though the users were not fully trained in using the G-Bar.
Fig. 4. Experiment results (time of completion and no. of collisions between the keyboard and G-Bar)
Fig. 5. Experiment results (survey questions: ease of use, extent of whole body interaction, the level of immersion, and extent of the perceived force feedback)
6 Conclusion
In this paper, we presented the G-Bar, a low cost, two-handed isometric whole body interface for interacting in virtual space. The use of two hands in combination with passive reaction feedback proved to be a contributing factor to enhanced presence. In addition, despite the novelty of the technique, after minimal training the users were able to achieve a level of task performance comparable to that of the familiar non-grounded device. While the G-Bar may not be appropriate for all types of virtual tasks (e.g. for interacting with fast moving light objects with relatively little reaction force, or for tasks that are more natural with one hand), the study shows the effectiveness of grounding the interaction device and leveraging the naturally induced whole body experience. We believe that in combination with multimodal feedback (e.g. vibration feedback and visual simulation), the virtual experience can be further enriched. Acknowledgements. This research was supported in part by the Strategic Technology Lab. Program (Multimodal Entertainment Platform area) and the Core Industrial Tech. Development Program (Digital Textile based Around Body Computing area) of the Korea Ministry of Knowledge Economy (MKE).
References
1. Kim, G.J.: Designing Virtual Reality Systems: A Structured Approach. Springer, Heidelberg (2005)
2. Buxton, W.: There's More to Interaction than Meets the Eye: Some Issues in Manual Input. In: Norman, D.A., Draper, S.W. (eds.) User Centered System Design: New Perspectives on Human-Computer Interaction, pp. 319–337. Lawrence Erlbaum Associates, Mahwah (1986)
3. Lecuyer, A., Coquillart, S., Kheddar, A.: Pseudo-Haptic Feedback: Can Isometric Input Devices Simulate Force Feedback? In: Proceedings of the IEEE Virtual Reality Conference, pp. 83–89 (2000)
4. Zhai, S.: Investigation of Feel for 6DOF Inputs: Isometric and Elastic Rate Control for Manipulation in 3D Environments. In: Proceedings of the Human Factors and Ergonomics Society (1993)
5. Zhai, S.: User Performance in Relation to 3D Input Device Design. Computer Graphics 32(4) (1998)
6. Boulic, R., Maupu, D., Peinado, M., Raunhardt, D.: Spatial Awareness in Full-Body Immersive Interactions: Where Do We Stand? In: Boulic, R., Chrysanthou, Y., Komura, T. (eds.) MIG 2010. LNCS, vol. 6459, pp. 59–69. Springer, Heidelberg (2010)
7. Benyon, D., Smyth, M., Helgason, I. (eds.): Presence for Everyone: A Short Guide to Presence Research. The Centre for Interaction Design, Edinburgh Napier University, UK (2009)
8. Peterson, B.: The Influence of Whole-Body Interaction on Wayfinding in Virtual Reality. PhD Thesis, University of Washington (1998)
9. Konami Digital Entertainment, Inc.: Dance Dance Revolution (2010), http://www.konami.com/ddr/
10. Nintendo, Inc.: Wii, http://wii.com/
11. Meehan, M., Whitton, M., Razzaque, S., Zimmon, P., Insko, B., Combe, G., Lok, B., Scheuermann, T., Naik, S., Jerald, J., Harris, M., Antley, A., Brooks, F.: Physiological Reaction and Presence in Stressful Virtual Environments. In: Proc. of ACM SIGGRAPH (2002)
Digital Display Case Using Non-contact Head Tracking Takashi Kajinami1, Takuji Narumi2, Tomohiro Tanikawa1, and Michitaka Hirose1 1
Graduate School of Information Science and Technology, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan 2 Graduate School of Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan {kaji,narumi,tani,hirose}@cyber.t.u-tokyo.ac.jp
Abstract. In our research, we aim to construct the Digital Display Case system, which enables museum exhibitions with virtual exhibits created using computer graphics technology, in order to convey background information about exhibits effectively. In this paper, we consider more practical use in museums, and construct a version of the system using head tracking that does not require any special devices to be worn by users. We use a camera and a range camera to detect and track the user's face, and compute the images on the displays so that users can appreciate virtual exhibits as if the exhibits were really in the case. Keywords: Digital Display Case, Digital Museum, Computer Graphics, Virtual Reality.
1 Introduction
In our research, we aim to construct the Digital Display Case system, which enables museums to hold exhibitions with virtual exhibits created using computer graphics technology, in order to convey background information about the exhibits effectively [1]. Recently, museums have become very interested in introducing digital technologies into their exhibitions in order to convey more of the background information about their exhibits. Every exhibit has a great deal of background information: for example, when or where it was made, what kind of culture it belongs to, and so on. However, museums have problems in conveying this information, because they cannot modify the exhibit itself, which must be preserved. Panels are conventionally used to convey such information (Fig. 1), but they are not a very effective way to help visitors connect the exhibit itself with the information on the panel, because the two are often placed apart. Thus, a digital exhibition system is needed to present the background information in a manner more closely tied to the exhibit, without harming the exhibits. Therefore, in our research we aim to construct an interactive exhibition system that conveys the background information about exhibits more effectively, and that is designed based on the context of conventional exhibitions in museums. In this paper, we consider more practical use in museums, and construct a version of the system using head tracking that does not require any special devices to be worn by users.
Fig. 1. Conventional Exhibition in Museum
We use a camera and a range camera to detect and track the user's face, and compute the images on the displays so that users can appreciate virtual exhibits as if they were really in the virtual case.
2 Related Works
2.1 Digital Devices for Museums
Although some digital devices, such as information kiosks or videos about exhibits, have already been introduced into museums, most of them are placed outside the exhibition rooms. This is because the curators who design exhibitions do not know how to use such devices effectively, while they know a great deal about conventional exhibition devices. We have to take this know-how into account when introducing mixed reality technologies into museums. The most popular digital system is the theater system, which some museums have already introduced. Several studies have been conducted on gallery talks in such theaters [2]. These systems can present highly realistic images about the theme of the exhibition. However, it is difficult to introduce such a system into the exhibition rooms themselves, and losing the connection between the content in the theater and the exhibits in the room is a big problem. There is also some research on using digital technologies for gallery talks in exhibition rooms. The gallery talk is a conventional way for museums to convey background information about exhibits to their visitors through oral explanation. However, it is difficult to hold gallery talks frequently or individually because of manpower shortages. Some digital devices have been made to solve this problem. A gallery talk robot [3][4] is one solution, realizing a gallery talk given by a remote person. This reduces the geographic restriction on commentators and makes it easier to hold gallery talks. However, there is the problem of how the robot should move in exhibition rooms where people are also walking: the robot must not knock against visitors or disturb their movement.
Mobile devices are also used to convey information about exhibits. Hiyama et al. [5] present this type of museum guiding system. They use a mobile device with position tracking based on infrared signals, and show visitors information according to the position data. This enables museums to give a structured explanation of the entire exhibition room. However, it is difficult to install the positioning devices in all exhibition rooms, and this is a high barrier to introduction. In addition, there is some work on digital exhibition devices for museums. Some research on exhibition systems using HMDs has been conducted [6]. However, wearable systems such as HMDs pose a big problem when introduced into permanent exhibitions, because it is difficult for museums to manage them. On the other hand, some installed devices for museums have also been presented. We can cite the Virtual Showcase [7] as an example, which overlays images on the exhibit with a half mirror and allows multiple users to observe and interact with augmented contents in the display. It can explain background information using the real exhibit, but at the same time it constrains the exhibition because it cannot move the exhibit.
2.2 Display Devices for 3D Model Data
There is also other related work, especially on display devices for 3D model data and the interaction with them. Many studies have been conducted on 3D displays, and today we can easily obtain 3D display systems with glasses and a display of conventional shape. Here we focus on volumetric displays, considering the shape of current free-standing display cases. Seelinder [8] is a 3D display based on the rotation of a cylindrical parallax barrier and an LED light source array. This system allows multiple viewers to see, from any position, images appropriate to their position, but it adapts the image only to horizontal motion and does not support vertical motion. On the other hand, there are some displays [9][10] that can show appropriate images for vertical motion using the two-axis rotation of a mirror. However, they can only display small images with low resolution and low contrast. There are also some studies of systems consisting of a 3D display and interaction devices; MEDIA3 [11] and gCubik [12] are examples. However, for storytelling in museums, we need more complex interaction than these systems realize.
3 Digital Display Case
In a previous paper [1], we constructed a prototype of the Digital Display Case (Fig. 2), which realizes an exhibition using computer graphics (Fig. 3). With this prototype we considered how to convey background information about exhibits, categorized into synchronicity and diachronicity (Fig. 4), and built several exhibitions to convey such information.
Fig. 2. Prototype of Digital Display Case
Fig. 3. Exhibition of virtual exhibits
Fig. 4. Exhibition to Convey Diachronicity
That prototype was sufficient to indicate the effectiveness of the Digital Display Case in museums. However, it has relatively low compatibility with conventional display cases, which is not suitable when the system is placed in an exhibition room. It also requires a Polhemus sensor to be attached to the user, which becomes a problem when many museum visitors are to experience the system. Therefore, in this paper we aim to construct a display system for virtual CG exhibits that is more compatible with conventional display cases, and with which museum visitors can appreciate virtual exhibits more easily.
3.1 Implementation of the System
We constructed the system shown in Fig. 5. In this system, we use 40-inch 3D televisions as displays, and arranged three displays into a box shape like a conventional display case. In the previous prototype, we used a Polhemus sensor to measure the user's viewpoint position. In this system, we instead use an attached camera and range camera, and measure the viewpoint position without requiring users to wear any special devices. The system is composed of two subsystems: one to detect and track the user's head, and the other to render the images on the displays. Fig. 6 shows the dataflow of the whole system.
Fig. 5. Digital Display Case more compatible with conventional display cases
Fig. 6. Dataflow of the system
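The paper does not describe the rendering subsystem's projection in detail. A common way to make the virtual exhibit appear to sit inside the physical case is a generalized off-axis perspective projection computed per display panel from the tracked eye position; the following is a minimal sketch under that assumption (the panel corner positions and eye position in a shared world coordinate frame are hypothetical inputs, not taken from the paper):

```python
import numpy as np

def off_axis_frustum(eye, screen_corners, near=0.1, far=10.0):
    """Return (left, right, bottom, top, near, far) frustum parameters for one
    display panel, given the tracked eye position and the panel's corners
    (lower-left, lower-right, upper-left) in the same world coordinates."""
    pa, pb, pc = (np.asarray(c, float) for c in screen_corners)
    vr = (pb - pa) / np.linalg.norm(pb - pa)         # screen right axis
    vu = (pc - pa) / np.linalg.norm(pc - pa)         # screen up axis
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)  # screen normal (towards eye)
    eye = np.asarray(eye, float)
    d = -np.dot(vn, pa - eye)                        # eye-to-screen distance
    l = np.dot(vr, pa - eye) * near / d
    r = np.dot(vr, pb - eye) * near / d
    b = np.dot(vu, pa - eye) * near / d
    t = np.dot(vu, pc - eye) * near / d
    return l, r, b, t, near, far
```

Applying this per panel, with the eye position supplied by the head tracker described next, yields a different frustum for each of the three displays.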
Fig. 7. Kinects placed at the top
Fig. 8. Capture image and depth around the system
3.2 Motion Parallax with Non-contact Tracking of the User's Head
For tracking the user's head, we use the Kinect [13], which has a camera and a range camera as sensors. We place three Kinects at the top of the system (Fig. 7) to capture images and measure depth around the system (Fig. 8), and we measure the viewpoint position based on these data. Fig. 9 shows how the viewpoint position is measured. First, we use the depth image to detect a user near the system and extract the user's area from the captured image. Within this area, we detect the user's face and calculate its position in the image. We then take an average of the depth around that position and calculate the viewpoint position.
Fig. 9. Dataflow to get the position of view
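As a rough illustration of this per-Kinect processing, the sketch below (not the authors' implementation; OpenCV's Haar-cascade face detector and the camera intrinsics fx, fy, cx, cy are assumptions) detects a face inside the user region of the colour image and back-projects it using the depth data:

```python
import numpy as np
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def viewpoint_from_kinect(color, depth, fx, fy, cx, cy, user_mask):
    """Estimate the 3D viewpoint (in this Kinect's coordinate frame) of the
    face found inside the user region; user_mask is a boolean image of the
    area extracted from the depth-based user detection."""
    gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    gray[~user_mask] = 0                       # keep only the user's area
    faces = face_detector.detectMultiScale(gray, 1.2, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    patch = depth[y:y+h, x:x+w]
    z = np.median(patch[patch > 0])            # robust average depth at the face
    u, v = x + w / 2, y + h / 2                # face centre in image coordinates
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```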
The system then gathers the data from the three Kinects and selects the user nearest to the system. To avoid confusion when two or more people are at the same distance, priority is given to the one nearest to the position found in the previous detection.
3.3 Discussion
Fig. 10 shows how the system works. It shows that the head tracking process works effectively and realizes motion parallax for the virtual display case without any sensors on the user. This is more suitable for museum exhibitions, because visitors can appreciate virtual exhibits in the same way they appreciate real exhibits, without having any sensor attached.
Fig. 10. Motion parallax without any sensor on a user
The system measures the viewpoint position at 15 fps and enables users to move through almost 180 degrees around the system while appreciating the virtual exhibit in it. It can also select the appropriate user when several users are detected, so that this user's appreciation is not interrupted. The head tracking speed is sufficient when the user walks around the system. However, when the user moves very fast, the large gap between frames reduces the smoothness of the motion parallax, and detection failures also reduce this smoothness. We therefore have to improve the processing speed of the head tracking and interpolate the movement between detections. To do this, we plan to introduce object tracking using computer vision and to use a hybrid algorithm combining detection and tracking, realizing more effective head tracking. Although the system can track the user's head over a range sufficient for moving around and appreciating the virtual exhibit, the exhibit cannot be appreciated from very close or from below, because the user's face then leaves the range the Kinects can capture. To address this, we have to reconsider the number and placement of the Kinects based on users' behavior during appreciation. Selection of the appropriate user usually works well even when several faces or users are captured. However, the system is confused when many people stand at the same distance from the system. To solve this problem, we have to use more intelligent user detection based on the depth data captured by the range cameras. We are also planning to introduce a way of indicating who has priority in the appreciation, for example a spotlight on that user, to avoid confusion about who currently has head-tracking priority.
4 Conclusion
In this paper, we constructed the Digital Display Case system, which realizes museum exhibitions of virtual computer-graphics exhibits and is designed to be more compatible with conventional display cases. We constructed a vertically long system composed of large displays, together with a head tracking system using Kinects, which realizes the appreciation of virtual exhibits from any point around the system without requiring users to wear any special devices or sensors; this makes it easier to use than the previous prototype. As future work, we have to improve the head tracking process, as described in Section 3.3, to realize more natural motion parallax. In addition, we are planning to introduce interaction into our system by detecting users' gestures with the Kinects, which we currently use only to measure the viewpoint position. Acknowledgments. This research is partly supported by the "Mixed Reality Digital Museum" project of MEXT, Japan. The authors would like to thank all the members of our project, especially Makoto Ando and Takafumi Watanabe from Toppan Printing.
References
1. Kajinami, T., Hayashi, O., Narumi, T., Tanikawa, T., Hirose, M.: Digital Display Case: Museum exhibition system to convey background information about exhibits. In: Proceedings of Virtual Systems and Multimedia (VSMM) 2010, pp. 230–233 (October 2010)
2. Tanikawa, T., Ando, M., Yoshida, K., Kuzuoka, H., Hirose, M.: Virtual gallery talk in museum exhibition. In: Proceedings of ICAT 2004, pp. 369–376 (2004)
3. Kuzuoka, H., Yamazaki, K., Yamazaki, A., Kosaka, J., Suga, Y., Heath, C.: Dual ecologies of robot as communication media: thoughts on coordinating orientations and projectability. In: CHI 2004: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 183–190. ACM, New York (2004)
4. Kuzuoka, H., Kawaguchi, I.: Study on museum guide robot that draws visitors' attentions. In: Proceedings of ASIAGRAPH 2009 (October 2009)
5. Hiyama, A., Yamashita, J., Kuzuoka, H., Hirota, K., Hirose, M.: Position tracking using infra-red signals for museum guiding system. In: Murakami, H., Nakashima, H., Tokuda, H., Yasumura, M. (eds.) UCS 2004. LNCS, vol. 3598, pp. 49–61. Springer, Heidelberg (2005)
6. Kondo, T., Manabe, M., Arita-Kikutani, H., Mishima, Y.: Practical uses of mixed reality exhibition at the national museum of nature and science in Tokyo. In: Joint Virtual Reality Conference of EGVE - ICAT - EuroVR (December 2009)
7. Bimber, O., Encarnacao, L.M., Schmalstieg, D.: The virtual showcase as a new platform for augmented reality digital storytelling. In: Proceedings of the Workshop on Virtual Environments 2003, vol. 39, pp. 87–95 (August 2003)
8. Yendo, T., Kawakami, N., Tachi, S.: Seelinder: The cylindrical lightfield display. In: SIGGRAPH 2005 E-tech (2005)
9. Doyama, Y., Tanikawa, T., Tagawa, K., Hirota, K., Hirose, M.: Cagra: Occlusion-capable automultiscopic 3D display with spherical coverage. In: Proceedings of ICAT 2008, pp. 36–42 (2008)
10. Jones, A., McDowall, I., Yamada, H., Bolas, M., Debevec, P.: Rendering for an interactive 360deg light field display. In: ACM SIGGRAPH (2007)
11. Kawakami, N., Inami, M., Maeda, T., Tachi, S.: Proposal for the object-oriented display: The design and implementation of the MEDIA3. In: Proceedings of ICAT 1997, pp. 57–62 (1997)
12. Lopez-Gulliver, R., Yoshida, S., Yano, S., Inoue, N.: gCubik: A cubic autostereoscopic display for multiuser interaction - grasp and groupshare virtual images. In: ACM SIGGRAPH 2008 Poster (2008)
13. Kinect, http://www.xbox.com/en-US/Kinect
Meta Cookie+: An Illusion-Based Gustatory Display Takuji Narumi1, Shinya Nishizaka2, Takashi Kajinami2, Tomohiro Tanikawa2, and Michitaka Hirose2 1
Graduate School of Engineering, The University of Tokyo / JSPS 7-3-1 Hongo Bunkyo-ku, Tokyo Japan 2 Graduate School of Information Science and Technology, The University of Tokyo 7-3-1 Hongo Bunkyo-ku, Tokyo Japan
[email protected], {nshinya,kaji,tani,hirose}@cyber.t.u-tokyo.ac.jp
Abstract. In this paper, we propose an illusion-based "pseudo-gustation" method to change the perceived taste of a food while it is being eaten, by changing its appearance and scent with augmented reality technology. We aim at utilizing the influence between modalities to realize a "pseudo-gustatory" system that enables the user to experience various tastes without changing the chemical composition of foods. Based on this concept, we built the "Meta Cookie+" system to change the perceived taste of a cookie by overlaying visual and olfactory information onto a real cookie. We performed an experiment that investigates how people experience the flavor of a plain cookie when using our system. The result suggests that our system can change the perceived taste based on the effect of the cross-modal interaction of vision, olfaction and gustation. Keywords: Illusion-based Virtual Reality, Gustatory Display, Pseudo-gustation, Cross-modal Integration, Augmented Reality.
1 Introduction
Because it has recently become easy to manipulate many kinds of multimodal information by computer, many research projects have used computer-generated virtual reality to study the input and output of haptic and olfactory information in order to realize more realistic applications [1]. However, few of these studies have dealt with gustatory information, and there have been few display systems presenting gustatory information [2, 3]. This scarcity of research on gustatory information has several reasons. One reason is that taste sensation is based on chemical signals, whose functions have not yet been fully understood. Another reason is that taste sensation is affected by other factors such as vision, olfaction, thermal sensation, and memory. Thus, the complexity of the cognition mechanism for gustatory sensation makes it difficult to build a gustatory display that is able to present a wide variety of gustatory information. Our hypothesis is that the complexity of the gustatory system can be applied to the realization of a pseudo-gustatory display that presents desired flavors by means of
a perceptual illusion. The cases we have in mind are ones in which what you sense with one modality affects what you experience in another. For example, the ventriloquist effect involves an illusory experience of the location of a sound that is produced by the sound's apparent visible source. The effect is neither inferential nor cognitive, but results from cross-modal perceptual interactions. Cross-modal interactions, however, are not limited to vision's impact upon the experience of other sense modalities. By using this illusory effect, we may induce people to experience different flavors when they taste the same chemical substance. Therefore, in order to realize a novel gustatory display system, we aim to establish a method for eliciting and utilizing cross-modal interaction. In this paper, we propose a method to change the perceived taste of a cookie when it is being eaten, based on an illusion evoked by changing its appearance and scent using augmented reality technology. We then report the "Meta Cookie" system, which implements the proposed method, and the results of an experiment that investigates how people experience the flavor of a plain cookie by using our system.
2 Cross-Modal Interactions Underlying the Perception of Flavor
Fundamental tastes are considered the basis for the presentation of various tastes, similar to how basic colors such as RGB are used as the basis for visual systems. According to physiological definitions, taste has the status of a minor sense, as the channel of only a limited number of sensations: sweetness, sourness, bitterness, saltiness, and umami [4]. What is commonly called taste signifies a perceptual experience that involves the integration of various sensations. When we use the common word "flavor" in place of taste, then, we are again referring to what is a quite multi-faceted sensation. In fact, the International Standards Organization has defined flavor as a complex combination of the olfactory, gustatory, and trigeminal sensations perceived during tasting [5]. Auvray et al. reviewed the literature on multisensory interactions underlying the perception of flavor; they summarized that flavor is not defined as a separate sensory modality but as a perceptual modality that is unified by the act of eating, and that the term should be used to describe the combination of taste, smell, touch, visual cues, auditory cues, and the trigeminal system [6]. These definitions suggest that it is possible to change the flavor that people experience from foods by changing the feedback they receive through modalities other than the sense of taste. While it is difficult to present various tastes through a change in chemical substances, it is possible to induce people to experience various flavors without changing the chemical ingredients, by changing only the other sensory information that they experience. The sense of smell is, above all other senses, most closely related to our perception of taste. This relationship between gustatory and olfactory sensations is commonly known, as illustrated by our pinching our nostrils when we eat food that we find displeasing. Indeed, it has been reported that most of what people commonly think of as the taste of food actually originates from the nose [7]. In addition, another set of studies on taste enhancement has provided strong support for the ability of odors
to modify taste qualities [8]. These studies indicate the possibility of changing the flavor that people experience with foods by changing the scent. Moreover, it is well known that under many conditions humans have a robust tendency to rely upon vision more than the other senses. Several studies have explored the effect of visual stimuli on our perception of flavor. For instance, taste and flavor intensity have been shown to increase as the color level in a solution increases [9]. However, Spence et al. state that the empirical evidence regarding the role that food coloring plays in the perceived intensity of a particular flavor or taste (as reported by many researchers over the last 50 years) is rather ambiguous, whereas food coloring most certainly influences people's flavor identification responses [10]. Their survey suggests the possibility of changing flavor identification by changing the appearance of food. Therefore, our research focuses on the technological application of the influence of appearance and scent on flavor perception. We propose a method to change the perceived taste of food by changing its appearance and scent. In this study, we use cookies as an example application. This is because cookies have a wide variety of appearances, scents, and tastes, while at the same time almost all cookies are similar in texture and shape. Thus, we have developed a system that overlays the appearance and scent of a flavored cookie on a plain cookie to let users experience eating a flavored cookie although they are just eating a plain cookie.
3 Pseudo-gustatory Display: MetaCookie+
We developed a system, which we have named "MetaCookie+", to change the perceived taste of a cookie by overlaying visual and olfactory information onto a real cookie printed with a special AR marker pattern.
3.1 System Overview
"MetaCookie+" (Fig. 1) comprises four components: a marker-pattern-printed plain cookie, a cookie detection unit based on the Edible Marker system, a unit for overlaying visual information, and an olfactory display. Fig. 2 illustrates the system configuration of "MetaCookie+". Each component is discussed in more detail in the following sections. In this system, a user wears a head-mounted visual and olfactory display system. The cookie detection unit detects the marker-pattern-printed cookie and calculates the state of the cookie (6DOF coordinate, occlusion, division, and the distance between the cookie and the user's nose) in real time. Based on the calculated state, an image of a flavored cookie is overlaid onto the cookie. Moreover, the olfactory display generates the scent of a flavored cookie with an intensity determined by the calculated distance between the cookie and the user's nose. The user can choose the cookie s/he wants to eat from multiple types, and the appearance and scent of the selected flavored cookie are overlaid onto the real cookie.
Fig. 1. MetaCookie+
Fig. 2. System configuration of "MetaCookie+"
3.2 Edible Marker System
For taste augmentation, interaction with food is necessary. Since the food will be eaten or divided, a method to detect occlusion and division is required. However, conventional object-detection methods assume that the target object is continuous. When the object is divided into pieces, tracking will fail because the feature points of only a single piece are recognized as the target object, whereas the other feature points are regarded as outliers. Despite the importance of division as one of the state changes of an object, it has not been studied from the viewpoint of object detection for AR applications. Therefore, we proposed the "Edible Marker" system, which not only estimates the 6DOF coordinate of the AR marker, but also detects its occlusion and division. We then applied this system to "MetaCookie+". Fig. 3 shows the processing steps of the occlusion- and division-detectable "Edible Marker" system. The Edible Marker system estimates the 6DOF coordinate, occlusion, and division of a marker in three steps: Marker Detection, Background Subtraction, and Superimposition. In step 1 (Marker Detection), natural feature points are detected in the captured image and the marker position is extracted from an estimated homography matrix. Subsequently, the projected image of the marker area in the captured image can be obtained. For the implementation, we used Fern [11] as the natural feature descriptor and classifier. In a conventional planar-object detection method, a homography matrix is estimated from the correspondences of feature points between a prepared template image and an image captured by the user's camera. An accurate homography matrix is estimated by calculating its elements using the least squares method after outlier elimination. If we simply apply the conventional method to a divided planar object, only one of the pieces of the object is detected as an inlier; therefore, the other pieces cannot be detected as parts of the object. To detect all pieces of the divided object, another method is required. The proposed method detects the pieces of the divided object by iteratively applying PROSAC [12]. A database of the target object's feature points is prepared in advance. In each estimation process, the inlier points are deleted from the database, and the next estimation is performed using the updated database. In this way, the method detects the inliers for each piece of the target object. The iteration stops when the homography matrix calculation fails. After these processes, the projected image of each piece in the captured image is obtained.
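A sketch of this iterative piece detection is given below. It is not the authors' implementation; OpenCV's RANSAC-based findHomography stands in for PROSAC, and the feature matching step is assumed to have been done beforehand:

```python
import cv2
import numpy as np

def detect_pieces(template_kp, frame_kp, matches, min_inliers=10):
    """Iteratively estimate one homography per visible piece of the marker.

    template_kp, frame_kp : lists of cv2.KeyPoint for the template and the frame
    matches               : list of cv2.DMatch between them
    Returns a list of (homography, inlier matches), one entry per detected piece.
    """
    remaining = list(matches)
    pieces = []
    while len(remaining) >= min_inliers:
        src = np.float32([template_kp[m.queryIdx].pt for m in remaining])
        dst = np.float32([frame_kp[m.trainIdx].pt for m in remaining])
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if H is None or int(mask.sum()) < min_inliers:
            break                              # homography estimation failed: stop
        flags = mask.ravel().astype(bool)
        pieces.append((H, [m for m, ok in zip(remaining, flags) if ok]))
        # delete the inliers from the database and re-estimate on what is left
        remaining = [m for m, ok in zip(remaining, flags) if not ok]
    return pieces
```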
In step 2 (Background Subtraction), a difference image is obtained by background subtraction; we implemented the background subtraction based on the method shown in [13]. The image on the left of the middle panel in Fig. 3 is the difference between the template image and the projected image. Combining this temporary result with the mask image for superimposition, the final result is obtained. In step 3 (Superimposition), the image on the left of the bottom panel in Fig. 3 is overlaid using the final result obtained in step 2; the result of the superimposition is shown in the image on the right. Fig. 4 shows that the proposed object-detection system can recognize the cookie even if it is eaten, partially occluded, or divided. Furthermore, the area on which the image should be overlaid can be detected by the background subtraction method. The marker can cope with occlusion or division affecting more than half of the entire cookie.
Fig. 3. Processing steps of the occlusion-and-division-detectable “Edible Marker” system and realistic superimposition based on the detection
3.3 Pattern-Printed Cookie
For this system, we made a plain cookie detectable by a camera by printing a pattern on it with a food printer. We use MasterMind's food printer "Counver" [14], which is a commercial off-the-shelf product. The printer jets colored edible ink and creates a printed image on a flat surface of food.
3.4 Cookie Detection and Overlaying an Appearance
The cookie detection unit based on the Edible Marker system can obtain the 6DOF coordinate of the cookie and the distance between the cookie and the camera. In the cookie detection phase, two cameras are used in parallel. We used two Logicool Webcam Pro 9000 cameras (angle of view: 76°) in this implementation. The layout of the cameras, a head-mounted display (HMD), and an olfactory display is shown in Fig. 4. The range of the cameras is also illustrated in this figure. The two cameras are positioned to eliminate blind spots between the user's hands and mouth, in order to
track the cookie from the time at which a user holds it to the time at which s/he puts it in her/his mouth.
Fig. 4. Layout of two cameras, a head-mounted display and an olfactory display
Camera 1 in Fig. 6 is used for overlaying the appearance of another cookie on the marked cookie and for deciding the strength of the produced smell. The relationship between the cookie-camera distance and the strength of the produced smell is discussed below. This camera and an HMD are used for video see-through. The HMD displays the image of one of several types of cookie on the pattern-printed cookie, based on the estimated position and the detected occlusion/division. This visual effect allows users to experience eating a selected cookie while merely eating a plain cookie. The other camera (Camera 2 in Fig. 6) is positioned in front of the user's nose and oriented downward in order to detect when the user eats a cookie; the area near the user's mouth is outside the first camera's field of view, which is why we placed the second camera there. When the second camera detects a cookie in front of the user's mouth (within 15 cm of the camera), the system recognizes that the user is about to put the cookie in her/his mouth.
3.5 Olfactory Display
We use an air-pump-type head-mounted olfactory display (Fig. 5) to produce the scent of the selected cookie. The olfactory display comprises six air pumps, a controller, and scented filters; one pump sends fresh air and five pumps send scented air. Each pump for scented air is connected to a scent filter filled with aromatic chemicals. It can eject fresh air and six types of scented air. The scent filters add scents to the air from the pumps, and the scented air is ejected near the user's nose. The strength of these scents can be adjusted over 127 different levels. By mixing fresh air and scented air, the olfactory display generates an odor at an arbitrary level while keeping the air volume the same, so users cannot feel any change in air volume when the strength of the generated odor changes.
The controller drives the air pumps according to the position of the pattern-printed plain cookie: the nearer the marked cookie is to the user's nose, the stronger the scent ejected from the olfactory display. The response time for generating an arbitrary odor is less than 50 ms, which is quick enough for users to experience the change of smell in synchronization with the change of visual information.
Fig. 5. Air-pump type head-mounted olfactory display
"MetaCookie+" generates two patterns of olfactory stimuli for simulating orthonasal and retronasal olfaction. One pattern simulates orthonasal olfaction and functions after the user holds the pattern-printed cookie and before s/he brings it near her/his mouth. In this pattern, the controller drives the air pumps according to the position of the pattern-printed plain cookie. The nearer the pattern-printed cookie is to the user's nose, the stronger the scent ejected from the olfactory display. The olfactory display is activated when the cookie is detected within 50 cm from camera 1. The value of 50 cm is determined based on the average distance between the cameras and a 70 cm-high desk along the line of sight when the user sits on a chair in front of the desk. The strength of the smell produced by the olfactory display is zero when the distance is 50 cm and strongest when the distance is 0 cm. The output is controlled linearly within 50 cm from camera 1. Another pattern simulates retronasal olfaction and functions after the system recognizes that the user is about to put a cookie in her/his mouth with camera 2. When camera 2 detects a cookie in front of the user's mouth, the system produces the strongest smell from the olfactory display for 30 s. We determined the period to be longer than the time to finish eating a bite of the cookie. This olfactory information evokes a cross-modal effect between olfaction and gustation, and enables users to feel that they are eating a flavored cookie although they are just eating a plain cookie.
4 Evaluation In order to evaluate the effectiveness of our proposed method for inducing people to experience various flavors, we conducted an experiment to investigate how people experience flavor in a cookie by using the "Meta Cookie" system. The purpose of this
experiment was to examine the cross-modal effect of visual stimuli and olfactory stimuli on gustation. We investigated how the participants would perceive and identify the taste of the cookie under conditions of visual augmentation only, olfactory augmentation only, and combined visual and olfactory augmentation. We prepared two types of appearance and scent from commercially available cookies: chocolate and tea. We examined how the participants experience and identify the taste of a plain cookie with these appearances and scents overlaid by our system.
4.1 Experimental Protocol
The combinations of scent and appearance in each experimental condition, which were used to represent flavored cookies, are illustrated in Fig. 6. There are 7 combinations: no augmentation, visual augmentation (chocolate), visual augmentation (tea), olfactory augmentation (chocolate), olfactory augmentation (tea), visual and olfactory augmentation (chocolate), and visual and olfactory augmentation (tea). We captured images of a chocolate cookie and a tea cookie, and used these to overlay onto the real cookies. The experiment was conducted with 15 participants. The participants had never received training in the anatomy of tastes, and we did not inform them that our system aimed at changing the perceived taste until they had finished the experiment.
Fig. 6. Experimental conditions
After subjects had eaten the plain cookie and a cookie augmented in one of the seven experimental conditions, they were asked to compare the latter with the plain cookie and to plot their experience of the taste on plotting paper. The plotting paper had two scales from -4 to 4: one for sweetness and one for bitterness. We defined the origin (0) of each scale as the taste of the plain cookie. Moreover, subjects were asked to write down the taste they identified from the cookie. Subjects repeated these steps 7 times. To eliminate any effect of the order in which the cookies were eaten, the order was randomly assigned by the experimenters. In addition, subjects drank water in the intervals between eating the cookies.
4.2 Results
Fig. 7 illustrates the results of this experiment. When the participants ate an olfactory-augmented cookie, they experienced a change in the cookie's taste in 80% of the trials. Moreover, when the participants ate a cookie with visual stimuli (chocolate) and olfactory stimuli (chocolate), they identified it as a chocolate cookie in 67% of the trials, and when they ate a cookie with visual stimuli (tea) and olfactory stimuli (tea), they identified it as a tea cookie in 80% of the trials. In contrast, when the participants ate a cookie with only olfactory stimuli (chocolate), they identified it as a chocolate cookie in 47% of the trials, and when they ate a cookie with only olfactory stimuli (tea), they identified it as a tea cookie in 67% of the trials. The identification rates in the olfactory-only conditions are thus lower than those in the combined visual and olfactory conditions.
Fig. 7. The cross-modal effect of visual stimuli, olfactory stimuli and visual & olfactory stimuli on perception and identification of taste
4.3 Discussion
The results suggest that olfactory stimuli play an important role in the perception of taste, while also suggesting that olfactory stimuli alone cannot sufficiently change the identification of the taste without the help of visual stimuli. Together, these findings indicate that cross-modal integration among vision, olfaction, and gustation plays an important role in a pseudo-gustatory system, and that our system can change the perceived taste and let users experience various flavors by changing only the visual and olfactory information, without changing the chemical composition of the food.
5 Conclusion
In this study, we proposed a "Pseudo-gustation" method to change the perceived taste of a cookie while it is being eaten by changing its appearance and scent with
augmented reality technology. We built the "Meta Cookie" system, based on the effect of cross-modal integration of vision, olfaction, and gustation, as an implementation of the proposed method. We performed an experiment that investigated how people experience the flavor of a plain cookie when using our system. The results of the experiment suggested that our system can change the perceived taste. Because our system can shift the flavor of nutritionally controlled foods from distasteful or tasteless to tasty or desired, we believe that it can be used for food prepared in hospitals and in diet food applications. Moreover, we believe we can build an expressive gustatory display system by combining this pseudo-gustation method based on cross-modal integration with methods for synthesizing a rough taste from fundamental taste substances. By doing so, we can realize a gustatory display that is able to present a wide variety of tastes. Acknowledgement. This research was partially supported by MEXT, Grant-in-Aid for Young Scientists (A), 21680011, 2009.
LIS3D: Low-Cost 6DOF Laser Interaction for Outdoor Mixed Reality
Pedro Santos1, Hendrik Schmedt1, Bernd Amend1, Philip Hammer2, Ronny Giera3, Elke Hergenröther4, and André Stork5
1 Fraunhofer-IGD, A2, Germany
2 Deck13 Interactive GmbH, Germany
3 Semantis Information Builders GmbH, Germany
4 University of Applied Sciences Darmstadt, Germany
5 Technical University of Darmstadt, Germany
{Pedro.Santos,Hendrik.Schmedt,Bernd.Amend}@igd.fhg.de, [email protected], [email protected], [email protected], [email protected]
Abstract. This paper introduces a new low-cost, laser-based 6DOF interaction technology for outdoor mixed reality applications. It can be used in a variety of outdoor mixed reality scenarios for making 3D annotations or correctly placing 3D virtual content anywhere in the real world. In addition, it can also be used with virtual back-projection displays for scene navigation purposes. Applications can range from design review in the architecture domain to cultural heritage experiences on location. Previous laser-based interaction techniques only yielded 2D or 3D intersection coordinates of the laser beam with a real-world object. The main contribution of our solution is that we are able to reconstruct the full pose of an area targeted by our laser device in relation to the user. In practice, this means that our device can be used to navigate any scene in 6DOF. Moreover, we can place any virtual object or any 3D annotation anywhere in a scene, so it correctly matches the user's perspective.
1 Introduction
Why should research be conducted in the area of mixed reality technologies, and what is their benefit for human-machine interaction? The main reason is that a human being will best interact with a machine in his known and familiar environment, using objects that are also known and familiar to him. The ultimate goal of mixed reality applications is therefore to make the transition between virtual and real content appear seamless to the user and to make interaction easy to handle. However, to reach that ultimate goal, many supporting technologies need either to be invented from scratch or to be further developed and enhanced; many of these are vision based, because no other human sense offers so much available bandwidth for information transfer.
The applications of mixed reality technologies are widespread. In the domain of industrial applications, mixed reality technologies are used more and more along the production chain. Prominent examples feature the design and modelling stages of new prototypes, but also training and maintenance scenarios on the final products. Prototypes built in product design stages are increasingly virtual and no longer physical mock-ups until the final design is produced. Mixed reality technologies allow for seamless visualization of models in real environments under correct lighting conditions. In the automotive industry, technologies such as autonomous vision systems for robots lay the foundations for advanced driver assistance systems which super-impose context and back-up path planning in the driver's field of view. Using mixed reality for customer care concerning household appliances greatly reduces the need for an expert on location and increases safe handling of simple repairs or parts replacement by the customers themselves. In job-training scenarios, mixed reality applications can make learning schedules more flexible while guaranteeing constant levels of quality education.
The cultural heritage domain benefits from mixed reality technologies that enable 3D reconstruction of an ancient piece of art based on its fragments found at an archaeological site. In many cases even whole buildings or premises can be visualized in mixed reality on location, in the way they would have been many years ago. Tourism benefits from mixed reality enabling visitors to explore a city without previous knowledge of the topography, with points of interest superimposed in their view, while pose estimation technologies are used for navigation.
In conclusion, mixed reality and its supporting technologies bring benefit to a wide range of domains, simplifying human-machine interaction. The user's attention is no longer diverted to other sorts of input or output devices when virtual content is directly super-imposed in his view. In addition, the visual quality of virtual content has improved so much that it seamlessly blends in with reality. Mixed reality makes complex tasks easier to handle.
To correctly visualize a mixed reality scene requires good pose estimation technology, able to calculate the 6DOF position and orientation of the user or his display, in the case of a head-mounted see-through device. Knowing his pose allows virtual content to be accurately super-imposed in his field of view. But what if the user wants to directly interact with the scene and place a 3D virtual object of his own next to a real building on a town square? What if the user wants to add a 3D annotation to a monument or a location of special interest? What if he wants to move virtual content around a real scene when designing his new home? Our new low-cost, laser-based 6DOF interaction technology offers a very simple way of answering those needs. Instead of using a single laser pointer, we project a specific laser pattern on any flat surface and dynamically reconstruct its pose from its projection on that surface.
2 Setup and Calibration
The basic idea is to track a projected pattern consisting of five laser points on a planar surface and to infer the pattern's pose from a camera view of the pattern. The system is intended to be used in mixed reality, where the user wears a camera attached to his
head-mounted display and projects a pattern onto any planar surface of his choice. To develop and test this idea, we first used a stereo back-projection system featuring a webcam that captures the projected laser patterns on the screen, and we built a low-cost interaction device out of five small, off-the-shelf laser pointers, which project a cross-hair pattern on any target area (Figure 1).
Fig. 1. Projected pattern
Later, the goal will be to test it outdoors in mixed reality environments, allowing a user to interact with reality and to place, move or modify virtual content anywhere on-the-fly while looking at it with a laser pattern generator connected to his head-mounted display, letting the mixed reality rendering system correctly super-impose the virtual content in his view.
Screen Setup. The preliminary test environment for the device and its corresponding algorithms is composed of a back-projection stereo wall with two projectors for passive stereo projection and a webcam with a daylight-blocking filter to better resolve the red laser beams when a pattern is projected on the projection screen. This results in an outside-in tracking setup for the interaction device (Figure 2).
Fig. 2. Outside-in tracking setup and camera view of projected patterns
Interaction Device. The interaction device consists of five laser pointers which are mounted on an aluminium chassis, so that each pointer can individually be aligned. Having flexible mount points was useful to determine the best possible projection pyramid for the laser lights depending on the distance to the projection target which in
our case was the back-projection display. We have used 5 mW lasers with a wavelength of 635 nm to 670 nm. All laser pointers are fed by a single battery pack and are switched on and off simultaneously (Figure 3).
Fig. 3. Laser Interaction Device
Calibration of the Setup. To properly use the interaction device we calibrated both the camera that films and tracks the projection of the laser pattern on a surface (the back-projection screen) and the laser-set itself. Concerning the camera calibration, we applied a radial and a perspective distortion correction. For the radial correction we use algorithms as stated in [1][2] and implemented in OpenCV, which take into account that real lenses also have a small tangential distortion. A checkered calibration pattern is used for that purpose. The perspective correction is needed because the camera is not perpendicular to the back-projection screen; recorded video footage of any pattern projected onto the screen would therefore always feature an additional distortion due to the camera angle it is taken from. To compensate for this in our calculations, we project the checkered pattern previously used for radial distortion correction onto the screen and identify its four corners, which are then matched against the camera image plane to define a homography from the distorted input to the perspective-corrected picture we want to use for tracking, where the edges of the calibration pattern match the edges of the camera image plane (Figure 4).
In practical terms, calibration of the back-projection setup including the tracking camera is done by interactively marking the edges of the pattern to compute the radial and the perspective distortion.
Fig. 4. Without and with perspective distortion correction
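As a rough illustration of the two-stage calibration described above, the following Python/OpenCV sketch estimates the intrinsic parameters and lens distortion from checkerboard views, builds a homography from four interactively marked corners, and then corrects individual detected points only, as in the text. The checkerboard size, function names and parameter values are our assumptions; the paper specifies only the cited algorithms and OpenCV.

# Sketch of the camera calibration (OpenCV); values are illustrative, not from the paper.
import cv2
import numpy as np

PATTERN = (9, 6)  # inner corners of the assumed checkerboard

def calibrate_radial(gray_images):
    """Estimate intrinsics and radial/tangential distortion from checkerboard views."""
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)
    obj_pts, img_pts = [], []
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts,
                                           gray_images[0].shape[::-1], None, None)
    return K, dist

def perspective_homography(image_corners_px, screen_corners_px):
    """Homography from the four marked pattern corners to the rectified tracking plane."""
    src = np.array(image_corners_px, np.float32)
    dst = np.array(screen_corners_px, np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def rectify_point(pt, K, dist, H):
    """Undistort one detected laser point and map it into screen coordinates."""
    p = cv2.undistortPoints(np.array([[pt]], np.float32), K, dist, P=K)  # radial correction
    p = cv2.perspectiveTransform(p, H)                                   # perspective correction
    return p[0, 0]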
Calibration of the Laser-set. To calibrate the laser interaction device we have to align all five lasers accordingly. This is achieved by a small tool that projects the target pattern on the back-projection screen, so the laser lights have to match it while the laser-set is mounted perpendicular to the screen. A configuration file is generated which contains the established projection pyramid of the laser-set. This step is needed to properly interpret the results of the laser tracking and to identify absolute values for position and orientation in relation to the target screen.
3 Laser Tracking
Once the setup is calibrated, the goal is to be able to track the projected laser pattern consisting of five points and to avoid or identify ambiguities, in particular when more than one of those interaction devices is used, which means that projected laser points need to be associated to their corresponding patterns generated by the respective devices. Our approach to solve this tracking problem consists of the following tracking pipeline:
• Point recognition
• Point tracking
• Line recognition
• Pattern recognition
• Pose reconstruction
Point Recognition. The camera has a daylight blocking filter to enhance the effect of the laser pointers. The first step of the pipeline is point acquisition. For this purpose we initially converted incoming video footage to greyscale and compensated radial
and perspective distortion. However, the two latter operations took around 60 ms per frame for a camera resolution of 1280x1024. Therefore we first identified relevant feature points in a frame and then compensated for radial and perspective distortion for those points only; this takes less than 1 ms per point, so we were able to process many more feature points in real time. To identify a feature point in a frame, we use a contour finder [2][4][5] and search for ellipsoids that fit matching criteria with respect to their minimum and maximum sizes as well as the ratio of their major and minor axes (Figure 4). We have to impose these constraints because, despite using laser light, the contour of a projected laser beam on the screen is not a well-defined ellipse, but only an approximation of one. Moreover, since the tracking camera is behind the screen together with the stereo projectors, we also have to filter out the projectors' bright spots in the centre of the projection using an infrared filter. As an alternative to the described approach we also implemented a method using adjacency lists on connected points.
Fig. 4. Two laser projections and the recognized points
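A minimal sketch of this point-recognition step, using OpenCV (version 4 or later) for illustration; the threshold, area and axis-ratio limits are placeholders, since the paper does not give its exact values.

# Find bright contours and keep only blob-like ellipses as candidate laser spots.
import cv2

def detect_laser_points(gray, thresh=200, min_area=4, max_area=400, max_axis_ratio=3.0):
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    points = []
    for c in contours:
        area = cv2.contourArea(c)
        if not (min_area <= area <= max_area) or len(c) < 5:
            continue                        # too small/large, or too few points to fit an ellipse
        (cx, cy), axes, _ = cv2.fitEllipse(c)
        minor, major = sorted(axes)
        if minor > 0 and major / minor <= max_axis_ratio:
            points.append((cx, cy))         # roughly circular blob: accept as a laser spot
    return points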
Point Tracking. Point recognition outputs a list of detected feature points. To track points from one frame to another we analyze the position, speed and direction of a laser point. Those criteria are dependent on camera frame-rate, covered area, camera
276
P. Santos et al.
resolution and the number of points tracked. For each feature point pair in one frame we calculate a rating stating how similar those points are to each other. We do the same for the subsequent frame and can then match the pairs to each other to identify the previous and subsequent position of the same point. The similarity rating per point is calculated based on the current position, the previous and current direction and the speed and results in a number between 0 and 1. For each new point we calculate its similarity rating to all points of the previous frame.
For each point we take the best and the second-best similarity rating and check whether the nearest-neighbourhood criterion is satisfied, where "nearest neighbourhood" is a factor which specifies how much better the first choice must be over the second-best choice. If we do not have a match, then we have a new point (Figure 4).
Line Recognition. To detect the laser pattern we now use Hough transforms to identify three points on a line. The difference to the common use of the Hough transform is that here we do not transform the points of a line detection into an accumulator, but only the already recognized points in the current frame [3],[6],[7]. We build a list of potential candidates for lines (accumulator cells with more than k hits). Cells with 3 hits allow for a single line, cells with 4 hits allow for 4 lines, and cells with 5 hits allow for 8 lines. Lines already recognized in a previous frame are found in a subsequent frame if all previous points correspond to one of the new potential lines; in that case the previous line is reused. Figure 5 shows the result of three projected laser-set patterns with three interaction devices.
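Returning to the point-tracking step, the best/second-best ("nearest neighbourhood") test can be sketched as follows. The paper's similarity function, which combines position, direction and speed into a value between 0 and 1, is not given explicitly, so a simple position-based score stands in for it here; the factor of 1.5 is illustrative.

# Frame-to-frame point matching with a best/second-best ratio test (toy similarity).
import math

def similarity(p, q, scale=100.0):
    """Stand-in similarity in [0, 1]: 1 for identical positions, decaying with distance."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return max(0.0, 1.0 - d / scale)

def match_points(prev_points, new_points, nearest_neighbourhood=1.5):
    """Return {new_index: prev_index}; unmatched new points start new tracks."""
    matches = {}
    for i, cur in enumerate(new_points):
        ratings = sorted(((similarity(cur, prev), j) for j, prev in enumerate(prev_points)),
                         reverse=True)
        if not ratings:
            continue
        best = ratings[0]
        second = ratings[1] if len(ratings) > 1 else (0.0, None)
        # accept only if the best rating is sufficiently better than the second-best one
        if best[0] > 0 and best[0] >= nearest_neighbourhood * second[0]:
            matches[i] = best[1]
    return matches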
Fig. 5. Three projected laser-set patterns and possible line combinations
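A minimal sketch of this point-based Hough voting, with illustrative accumulator resolutions (the paper does not state its theta and rho step sizes); adjacent cells may produce overlapping candidates, and deduplication is omitted.

# Vote each recognized point into a (theta, rho) accumulator; cells with >= 3 hits
# become candidate lines.
import math
from collections import defaultdict

def point_hough(points, theta_steps=180, rho_step=4.0):
    acc = defaultdict(set)                       # (theta_idx, rho_idx) -> point indices
    for idx, (x, y) in enumerate(points):
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, int(round(rho / rho_step)))].add(idx)
    return [tuple(sorted(cell)) for cell in acc.values() if len(cell) >= 3]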
Pattern Recognition. To identify the proposed cross patterns we use a greedy algorithm, which tries to find a solution to the problem by iteratively finding local optima. For this purpose we treat the recognized lines as an undirected graph connecting the corresponding points and build a co-existence matrix which, for each pair of points, stores their ability to co-exist. The solution of the problem is represented by the maximum clique, i.e., the largest fully connected sub-graph (Figure 6).
Fig. 6. All possible patterns and final result
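The greedy clique search described above might be sketched as follows; the boolean co-existence matrix is assumed to be given, and its construction is omitted.

# Greedily grow a clique (largest fully connected sub-graph) in the co-existence graph.
def greedy_clique(coexist):
    """coexist[i][j] is True if candidates i and j can belong to the same pattern."""
    n = len(coexist)
    # start from the candidates with the most compatible partners (local optimum first)
    order = sorted(range(n), key=lambda i: sum(coexist[i]), reverse=True)
    clique = []
    for i in order:
        if all(coexist[i][j] for j in clique):
            clique.append(i)        # i is compatible with everything chosen so far
    return clique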
Pose Recognition. Finally, the pose of each recognized pattern is reconstructed from the delta transformation between the previous and the current frame. Each transformation is defined around the pattern's axis, which in our case is the center point; rotations and translations are therefore applied around that point to alter the pose. A 2D translation can easily be calculated by observing the shift of the center point from the previous to the current frame. A 2D rotation can be calculated from the rotation of the pattern around its center between the previous and the current frame. To reconstruct the full 3D pose in space we use an algorithm of our own which we cannot disclose at this stage [11], or alternatively a variant of the algorithm
proposed by [2],[3] which iteratively computes the pose based on the knowledge of the points and dimension of the pattern.
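The 2D part of this delta-transformation update can be sketched as below; the full 3D pose requires the iterative algorithm referenced in the text, which is not reproduced here.

# Per-frame 2D update: translation from the shift of the pattern centre, rotation from
# the change in orientation of a matched reference point about that centre.
import math

def delta_2d_pose(prev_points, curr_points):
    """prev_points/curr_points: matched lists of (x, y) pattern points on the screen plane."""
    cx_p = sum(x for x, _ in prev_points) / len(prev_points)
    cy_p = sum(y for _, y in prev_points) / len(prev_points)
    cx_c = sum(x for x, _ in curr_points) / len(curr_points)
    cy_c = sum(y for _, y in curr_points) / len(curr_points)
    translation = (cx_c - cx_p, cy_c - cy_p)
    a_prev = math.atan2(prev_points[0][1] - cy_p, prev_points[0][0] - cx_p)
    a_curr = math.atan2(curr_points[0][1] - cy_c, curr_points[0][0] - cx_c)
    rotation = a_curr - a_prev
    return translation, rotation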
4 Results
We have successfully demonstrated the feasibility of a 6DOF laser-based interaction technique requiring 20 ms per interaction pattern for pose reconstruction on regular Intel Core 2 Duo 2.4 GHz hardware. We have validated our results in outside-in and inside-out tracking scenarios. The use of pattern-based laser interaction greatly simplifies human-computer interaction in mixed reality environments for the purpose of 3D annotations or virtual content modification (Figure 7).
Fig. 7. Test applications being controlled by Laser interaction
References 1. Zhang, Z.: Flexible Camera Calibration By Viewing a Plane From Unknown Orientations. Version:1999. IEEE, Los Alamitos (1999) 2. Santos, P., Stork, A., Buaes, A., Pereira, C.E., Jorge, J.: A Real-time Low-cost Markerbased Multiple Camera Tracking Solution for Virtual Reality Applications. Journal of Real-Time Image Processing 5(2), 121–128 (2010); First published as Online First, November 11 (2009), DOI 10.1007/s11554-009-0138-9 3. Santos, P., Stork, A., Buaes, A., Jorge, J.: PTrack: Introducing a Novel Iterative Geometric Pose Estimation for a Marker-based Single Camera Tracking System. In: Fröhlich, B., Bowman, D., Iwata, H. (eds.) Proceedings of Institute of Electrical and Electronics Engineers (IEEE): IEEE Virtual Reality 2006, pp. 143–150. IEEE Computer Society, Los Alamitos (2006) 4. Sukthankar, R., Stockton, R., Mullin, M.: Self-Calibrating Camera-Assisted Presentation Interface. In: Proceedings of International Conference on Automation, Control, Robotics and Computer Vision (2000) 5. Kurz, D., Hantsch, F., Grobe, M., Schiewe, A., Bimber, O.: Laser Pointer Tracking in Projector-Augmented Architectural Environments. In: Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, November 13-16, pp. 1–8. IEEE Computer Society, Washington, DC (2007), http://dx.doi.org/10.1109/ISMAR.2007.4538820
6. Kim, N.W., Lee, S.J., Lee, B.G., Lee, J.J.: Vision based laser pointer interaction for flexible screens. In: Jacko, J.A. (ed.) Proceedings of the 12th International Conference on Human-Computer Interaction: Interaction Platforms and Techniques, Beijing, China. LNCS, pp. 845–853. Springer, Heidelberg (2007) 7. Zhang, L., Shi, Y., Chen, B.: NALP: Navigating Assistant for Large Display Presentation Using Laser Pointer. In: First International Conference on Advances in Computer-Human Interaction, February 10-15, pp. 39–44 (2008), doi:10.1109/ACHI.2008.54 8. Santos, P., Schmedt, H., Hohmann, S., Stork, A.: The Hybrid Outdoor Tracking Extension for the Daylight Blocker Display. In: Inakage, M. (ed.) ACM SIGGRAPH: Siggraph Asia 2009. Full Conference DVD-ROM, p. 1. ACM Press, New York (2009) 9. Santos, P., Gierlinger, T., Machui, O., Stork, A.: The Daylight Blocking Optical Stereo See-through HMD. In: Proceedings: Immersive Projection Technologies / Emerging Display Technologies Workshop, IPT/EDT 2008, p. 4. ACM, New York (2008) 10. Santos, P., Acri, D., Gierlinger, T., Schmedt, H.: Supporting Outdoor Mixed Reality Applications for Architecture and Cultural Heritage. In: Khan, A. (ed.) The Society for Modeling and Simulation International: 2010 Proceedings of the Symposium on Simulation for Architecture and Urban Design, pp. 129–136 (2010) 11. Amend, B., Giera, R., Hammer, P., Schmedt, H.: LIS3D Laser Interaction System 3D, System Development project internal report, Dept. Computer Science, University of Applied Sciences Darmstadt, Germany (2008)
Olfactory Display Using Visual Feedback Based on Olfactory Sensory Map
Tomohiro Tanikawa1, Aiko Nambu1,2, Takuji Narumi1, Kunihiro Nishimura1, and Michitaka Hirose1
1 The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan
2 Japan Society for the Promotion of Science, 6 Ichibancho, Chiyoda-ku, Tokyo, 102-8471 Japan
{tani,aikonmb,narumi,kuni,hirose}@cyber.t.u-tokyo.ac.jp
Abstract. Olfactory sensation is based on chemical signals, whereas visual and auditory sensations are based on physical signals. Therefore, existing olfactory displays can only present the set of scents prepared beforehand, because a set of "primary odors" has not been found. In our study, we focus on the development of an olfactory display using cross-modality, which can represent more patterns of scents than the patterns prepared. We constructed an olfactory sensory map by asking subjects to smell various aroma chemicals and evaluate their similarity. Based on the map, we selected a few aroma chemicals and implemented a visual and olfactory display. We succeeded in generating various smell sensations from only a few aromas, and aromas can be substituted by pictures: the nearer an aroma is to the pictured content on the map, the more strongly it is drawn by the picture. Thus, using the olfactory map, we can reduce the number of aromas required in olfactory displays. Keywords: Olfactory display, Multimodal interface, Cross modality, Virtual Reality.
1 Introduction
Research on olfactory displays is evolving into a medium of VR, as research on visual and auditory displays has. However, there are some bottlenecks in olfactory information presentation. Visual, auditory and haptic senses come from physical signals, whereas olfactory and gustatory senses come from chemical signals. Therefore, research on olfactory and gustatory information is not as far along as that on visual, auditory and haptic information. Olfaction is the least understood of the five senses, and even the mechanism of reception and recognition of smell substances is unknown. Thus, "primary odors", which could represent all types of scents, have not been established. This means that no policy exists for mixing and presenting smell substances, and it is therefore difficult to present various scents using olfactory displays. In addition, olfaction is more unstable and variable than vision and audition. It is known that we can identify the scents of daily materials only fifty percent of the time. For example, only half of people can answer "apple" when they sniff apples. [1,2]
1.1 Olfactory Display
Olfactory displays can produce a highly realistic sensation which cannot be given by vision or audition. We illustrate the olfactory displays developed so far with two examples. "Let's cook curry" by Nakamoto et al. [3] is an olfactory display with interactive aroma contents, "a cooking game with smells"; it presents the smells of curry, meat, onion and so on under the player's control. The "Wearable olfactory display" by Yamada et al. [4] generates a pseudo olfactory field by changing the concentration of several kinds of aroma chemicals using position information. However, these preceding studies produce only combinations of the prepared element odors, i.e., the selected aroma chemicals. Element odors therefore cannot represent smells that do not belong to them, which means conventional olfactory displays are limited in the variety of smells they can represent. In order to implement practicable olfactory displays that can produce a wider variety of smells than before, it is necessary to reduce the number of element odors and to produce feelings of a wide range of smells from those few element odors. We focused on the instability and variability of olfaction. Within the band of olfactory fluctuation, there is a possibility that we can make people feel a smell different from the presented smell by using some techniques when a certain smell material is presented. If we are able to make people feel smells other than the actual element odors, we can treat the element odors in the same way as "primary odors" and generate various olfactory experiences from them.
2 Concept
2.1 Drawing Effect on Olfaction by Visual Stimuli
Olfaction is more ambiguous than vision or audition. For example, it is difficult to identify flowers or foods by scent alone, unlike by visual images. [2] Thus, olfaction is easily affected by knowledge of the smell and by other sensations. [5][6] In addition, olfactory sensation interacts with various other senses, especially visual sensation. That is, it is thought that olfaction can be used for information presentation more effectively through interaction with cues from senses other than olfaction. [7][8][9] In this paper, we tried to generate various "pseudo olfactory experiences" by using the cross-modal effect between vision and olfaction. When a visual stimulus that contradicts the presented olfactory stimulus is shown, the visual stimulus influences olfaction. It is possible to produce an olfactory sensation corresponding not to the smell actually generated but to the visual information, by presenting an image that conflicts with the produced smell. We defined this cross-modal effect between vision and olfaction as the "drawing effect" on olfaction by vision. This drawing effect allows visual images to give a pseudo olfactory sensation that is not actually presented. For example, Nambu et al. [10] suggest the possibility of producing the scent of melons from the aroma of lemons, which is unrelated to the scent of melons, by showing a picture of melons. In this case, the picture of melons
draws the aroma of lemons toward the pseudo scent of melons. To apply the drawing effect in an olfactory display, we need an index of which conditions intensify the drawing effect.
Fig. 1. Concept of Visual-olfactory Display
2.2 Olfactory Display Using Sensory Maps
It is thought that the strength of the drawing effect depends on the kind of scent presented. That is, the drawing effect is expected to be generated easily between smells with a high degree of similarity, while it is not generated easily between scents with a low degree of similarity. We therefore propose to use the degree of similarity of the smell, or the distance between scents, as an index of the drawing effect on the smell. Existing methods of evaluating the degree of similarity of scents include methods based on linguistic similarity and on similarity of chemical characteristics. [11] However, there has not yet been fundamental research on the degree of similarity of scents based on human olfactory sensation. We therefore tried to construct a new olfactory map based on smell evaluation, which is closer to olfactory sensation than past olfactory maps. The distance between scents can be evaluated more accurately and easily than before by making the olfactory map based on olfaction. It then becomes possible to exploit the drawing effect on olfaction by vision more efficiently, if we can prove that the closer the content of the olfactory source is to the content of the picture aimed at, the stronger the drawing effect that occurs.
3 Construction of an Olfactory Map
It is difficult to extract a common part of multiple people's sense of smell and turn it into a common map, because olfaction shows more individual variation than other senses. [12] Two kinds of approach can be considered for making an olfactory map. The first approach is making a personal olfactory map. It requires measuring the distances, making the map, and preparing appropriate aromas for each user, but an olfactory display completely suited to each user can be achieved.
The second approach is evaluating the distance between scents with many people, extracting a common part of olfaction not influenced by individual variation, and making the map from it. The advantage of this approach is that one map yields distance data between smells that has generality. Therefore, as long as the individual variation of the results is not extremely large, it is more advantageous for the development of an olfactory display suitable for practical use to take the approach of this common olfactory map. To make a common olfactory map using the sense of smell, two points must be considered: securing the reliability of the smell evaluation, and applicability to multiple people despite the individual variation of olfaction. We discuss a method for making an olfactory map that satisfies these points, and its evaluation.
3.1 Method for Constructing Olfactory Maps
The procedure for constructing an olfactory map is as follows. First, we prepared 18 kinds of fruit-flavored aroma chemicals: lemons, oranges, strawberries, melons, bananas, grapefruits, yuzu (Chinese lemons), grapes, peaches, pineapples, lychees, guavas, mangoes, apples, green apples, kiwis, apricots, and plums. We confined the aroma chemicals to fruit flavors because the similarity between two aromas in the same category (fruits, flowers, dishes, etc.) can be compared more easily than between two aromas in different categories. Then, we soaked test papers in each aroma chemical and used them as smell samples. Seven subjects evaluated the degree of similarity of two smell samples on a five-point scale. For example, if a subject feels that the scent of oranges and the scent of melons are very similar, the similarity is "4". We then calculated the smell distance between two smell samples as "5 minus similarity". The correspondence between similarity and distance is illustrated in Table 1. This trial was done for all combinations of the 18 smell samples.

Table 1. Relationship between score of similarity and distance of smell samples
Score of Similarity   Mention of questionnaire   Score of Distance
1                     Different                  4
2                     Less similar               3
3                     Fairly similar             2
4                     Very similar               1
5                     Hard to tell apart         0
The distances among the 18 smell samples form an 18 x 18 square matrix. We analyzed the distance matrix by Isometric Multidimensional Scaling (isoMDS) and mapped the result onto a two-dimensional olfactory map.
3.2 Evaluation of Reliability
The reliability of the smell evaluation should be ensured, because the human sense of smell is unstable. In order to check whether people can identify the same aromas as the same scents, we
prepared pairs of test papers soaked in the same aroma chemical and asked five subjects to evaluate the degree of similarity between the two identical aromas. We ran this experiment for two kinds of aromas: lemon and lychee. We then compared the results for the identical aromas with the average degree of similarity over all combinations of the 18 kinds of smells.
Fig. 2. The degree of similarity between identical aromas. The mean degree of similarity between two identical lemon aromas is 4.6 and that between two identical lychee aromas is 4.2. Both results differ significantly from the mean value over all 18 kinds of aroma chemicals (2.07) (p < 0.01).
The similarity measurement based on the sense of smell is therefore reliable, because the mean degree of similarity between identical aromas is remarkably high compared with the overall mean value.
3.3 Making the Common Olfactory Map
Our policy was that a common olfactory map can be constructed by integrating the results of multiple people's olfactory evaluations and extracting their common part. In this section, we constructed a common olfactory map by averaging the results of 7 subjects' olfactory similarity evaluations. We tried two ways of averaging the subjects' results: the simple average and the binarized average of the similarity values. The simple average method calculates the average smell distance between each pair of aromas and maps the aroma chemicals by isoMDS. The binarized average method binarizes the smell distances before calculating the arithmetic average and mapping the results, in order to prevent the map from being blurred by fluctuations in the subjects' answers. First we set a threshold value between 0 and 5 and converted the similarity values into a smell distance of 0 if the similarity value is above the threshold, and a distance of 1 if it is below the threshold. We set the threshold between 3 and 4 because the similarity of two identical aromas was no less than 4 in 9 trials out of 10 in the evaluation of Section 3.2.
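The construction of the two map variants can be sketched as follows. The array layout, the threshold value of 3.5, and the use of scikit-learn's non-metric MDS as a stand-in for isoMDS are our assumptions, not details given in the paper.

# Similarity scores (1-5) -> distances, optional binarization, subject averaging,
# and a 2-D non-metric MDS embedding of the resulting distance matrix.
import numpy as np
from sklearn.manifold import MDS

def mean_distance_matrix(similarity_ratings, binarize=True, threshold=3.5):
    """similarity_ratings: array (subjects, aromas, aromas) of 1-5 similarity scores."""
    sims = np.asarray(similarity_ratings, dtype=float)
    if binarize:
        dist = (sims < threshold).astype(float)   # similarity above the threshold -> distance 0
    else:
        dist = 5.0 - sims                         # simple average method: distance = 5 - similarity
    mean_dist = dist.mean(axis=0)                 # average over subjects
    mean_dist = (mean_dist + mean_dist.T) / 2.0   # enforce symmetry
    np.fill_diagonal(mean_dist, 0.0)
    return mean_dist

def olfactory_map(mean_dist, seed=0):
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(mean_dist)           # (aromas, 2) map coordinates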
Fig. 3. (A) simple average (left), (B) binarized average (right)
On the simple average map (Fig. 3A), the scent "yuzu" was mapped between "lemon" and "lemon2" (the same as lemon), and "guava" was mapped between "lychee" and "lychee2" (the same as lychee). This means that the distance between different aromas can be smaller than the distance between identical aromas. In contrast, on the binarized average map (Fig. 3B), "lemon2" is the closest of all aromas to "lemon", and "lychee2" is the closest of all aromas to "lychee". The binarized average method thus prevents blurring within and between subjects better than the simple average, and is more suitable for olfactory map generation. Furthermore, the generated olfactory map makes it possible to categorize the aroma chemicals based on the rough character of each smell, such as citrus fruits, apples, etc. This raises the possibility of implementing an olfactory display in which a small number of representative aroma chemicals can present various smells, by selecting representative aromas from each category.
4 Olfactory Display Using Olfactory Maps
If we can prove that "the more similar the content of the picture and the content of the aroma chemical are, the more likely the drawing effect is to happen", we can implement a brand-new olfactory display that can render more kinds and a wider range of smells than the number and range of the few prepared aroma chemicals. In this chapter, we describe the prototype visual-olfactory display system and the experiments to evaluate the effect of the smell distance on the map on the drawing effect.
4.1 Implementation of Visual-Olfactory Display
The visual-olfactory system consists of an olfactory display and a notebook PC for showing pictures and for control (Fig. 4).
The olfactory display consists of the scent generator, the controller, the showing interface and a PC monitor. The scent generator has four air pumps. Each pump is connected to a scent filter filled with an aroma chemical. The controller drives the air pumps in the scent generator according to commands from the PC. The scent filters add scents to the air from the pumps, and the showing interface then ejects the air near the user's nose.
Fig. 4. Prototype system of visual-olfactory display
4.2 Evaluation of the Visual-Olfactory Display
We conducted experiments to evaluate the visual-olfactory display with 7 subjects. These 7 subjects are different from the subjects of the evaluation in Chapter 3, in order to verify the validity of the olfactory map for people who did not participate in building the map. We showed each subject a picture from 18 kinds of pictures of fruits and an aroma from 4 kinds of element aromas. The pictures of fruits correspond one by one to the 18 aroma flavors used in Chapter 3. Then, we asked them, "What kind of smell do you feel when sniffing the olfactory display?" We conducted the experiment in a well-ventilated large room to avoid mixing different aromas and olfactory adaptation. The four kinds of element aromas were selected from the 18 kinds of fruit used in Chapter 3. First, we categorized the 18 kinds of aromas into four groups by features of the scents (Fig. 5). Then we selected one scent from each group (apple, peach, lemon, lychee) so as to minimize the distance between the key aroma and each other aroma in the same category as the key aroma. Each picture was shown with the nearest key aroma. If subjects answer that they feel the smell corresponding to the shown picture when the content of the picture and the content of the aroma are different, the drawing effect is considered to occur. Thus, we used the rate of answering with the smell of the shown picture as an index of the drawing effect. In order to prove that "the more similar the content of the picture and the content of the aroma chemical are, the more likely the drawing effect is to happen", we also conducted another experiment to evaluate the drawing effect between the picture and an aroma which is not the closest to the content of the picture. We asked subjects to answer what smell they felt when we showed them a picture and the aroma second closest to the picture. These trials were done for 9 kinds of pictures. We compared the drawing effect between trials using the closest aroma and trials using the second closest aroma.
Fig. 5. Categorization of aroma chemicals
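The pairing of each picture with its nearest key aroma on the map might be implemented along the following lines; the names and the coordinate format are illustrative.

# Choose, for a given picture, the key aroma closest to it on the 2-D olfactory map.
import math

KEY_AROMAS = ["apple", "peach", "lemon", "lychee"]

def nearest_key_aroma(picture_label, map_coords):
    """map_coords: dict mapping aroma name -> (x, y) position on the olfactory map."""
    px, py = map_coords[picture_label]   # position of the aroma corresponding to the picture
    return min(KEY_AROMAS,
               key=lambda k: math.hypot(map_coords[k][0] - px, map_coords[k][1] - py))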
Each subject answered with the scent of the picture in an average of 13 of 27 trials (36%) per person in which a picture and an aroma were shown. This is statistically higher than the rate of answering with the scent of the aroma, 11% (p < 0.01). The number of different answers over the 27 trials per person was as many as 13 kinds on average, although we used only four kinds of aromas. Moreover, we compared the rate of answering that the smell was like the content of the picture by olfactory distance on the map. The rate was 44% when the picture and the aroma were close, and 27% when the picture and the aroma were distant. This means that a close pair helped subjects, to a statistically significant degree, to answer with the smell influenced by the picture (p < 0.01) (Fig. 6).
Fig. 6. Comparison by Distance
5 Conclusion
There were many more answers that the smell corresponded to the intended image than answers that it corresponded to the aroma chemical actually presented. Moreover, the free answers included many kinds of smells. These results confirmed that we can generate several times more kinds of pseudo smells than the number of prepared aroma chemicals. In addition, the fact that a similar pairing of a picture and an aroma increases the rate of the drawing effect of the picture on the aroma supported the hypothesis that the closer the picture and the aroma are on the olfactory map, the stronger the drawing effect. As well as the smell distance, the positional relationship of smells on the map can serve as a criterion for selecting element odors. However, the rate of answering with the smell of the intended picture was 44%, which is not high enough to actually use the drawing effect in olfactory displays. This is attributed to the difficulty of identifying the picture and naming it in a free answer. For example, guavas are not familiar to Japanese subjects, so they are thought to be unable to recall the name of guavas from the visual cue, and it is difficult to tell pictures of lemons and grapefruits apart. It proved possible to construct an olfactory map based on olfactory sensation from the sensory evaluation of smell similarity by two or more people. The common olfactory map, which suits people with various olfactory sensation patterns, can be used in the same way as language-based olfactory maps. Using the common olfactory map, we achieved an olfactory display presenting various smells virtually from a few aroma chemicals. It becomes possible to achieve olfactory virtual reality with a simpler system by reducing the number of aroma sources. Moreover, it should be possible to make olfactory maps for other kinds of smells, such as flowers or dishes, as long as the smells have visual cues corresponding to the olfactory cues. The technique of changing the smell sensation by visual stimuli, without changing the aroma chemicals themselves, makes it possible to achieve high-quality olfactory VR more easily.
References 1. Cain, W.S.: To know with the nose: Keys to odor identification. Science 203, 467–470 (1979) 2. Sugiyama, H., Kanamura, S.A., Kikuchi, T.: Are olfactory images sensory in nature? Perception 35, 1699–1708 (2006) 3. Nakamoto, T., Otaguro, S., Kinoshita, M., Nagahama, M., Ohinishi, K., Ishida, T.: Cooking Up an Interactive Olfactory Game Display. IEEE Computer Graphics and Applications 28(1), 75–78 (2008) 4. Yamada, T., Yokoyama, S., Tanikawa, T., Hirota, K., Hirose, M.: Wearable Olfactory Display: Using Odor in Outdoor Environment. In: Proceedings IEEE VR 2006, pp. 199– 206 (2006) 5. Herz, R.S., von Clef, J.: The influence of verbal labeling on the perception of odors: evidence for olfactory illusion? Perception 30, 381–391 (2001) 6. Gottfried, J., Dolan, R.: The Nose Smells What the Eye Sees: Crossmodal Visual Facilitation of Human Olfactory Perception. Neuron 39(2), 375–386 (2003)
7. Zellner, D.A., Kautz, M.A.: Color affects perceived odor intensity. Journal of Experimental Psychology: Human Perception and Performance 16, 391–397 (1990) 8. Grigor, J., Van Toller, S., Behan, J., Richardson, A.: The effect of odour priming on long latency visual evoked potentials of matching and mismatching objects. Chemical Senses 24, 137–144 (1999) 9. Sakai, N., Imada, S., Saito, S., Kobayakawa, T., Deguchi, Y.: The Effect of Visual Images on Perception of Odors. Chemical Senses 30(suppl. 1) (2005) 10. Nambu, A., Narumi, T., Nishimura, K., Tanikawa, T., Hirose, M.: A Study of Providing Colors to Change Olfactory Perception - Using ”flavor of color”. In: ASIAGRAPH in Tokyo 2008, vol. 2(2), pp. 265–268 (2008) 11. Bensafi, M., Rouby, C.: Individual Differences in Odor Imaging Ability Reflect Differences in Olfactory and Emotional Perception. Chemical Senses 32, 237–244 (2007) 12. Lawless, H.T.: Exploration of fragrance categories and ambiguous odors using multidimensional scaling and cluster analysis. Chemical Senses (1989) 13. Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol. Biol. 147, 195–197 (1981)
Towards Noninvasive Brain-Computer Interfaces during Standing for VR Interactions
Hideaki Touyama
Toyama Prefectural University, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
[email protected]
Abstract. In this study, we propose a portable Brain-Computer Interface (BCI) aiming to realize a novel interaction with VR objects during standing. The ElectroEncephaloGram (EEG) was recorded under two experimental conditions: I) while the subject was sitting at rest and II) during simulated walking in an indoor environment. In both conditions, the Steady-State Visual Evoked Potential (SSVEP) was successfully detected by using computer-generated visual stimuli. This result suggested that EEG signals with portable BCI systems would provide a useful interface for performing VR interactions during standing in indoor environments such as immersive virtual spaces. Keywords: Brain-Computer Interface (BCI), Electroencephalogram (EEG), Steady-State Visual Evoked Potential (SSVEP), standing, immersive virtual environment.
1 Introduction
Brain-Computer Interfaces (BCIs) are communication channels with which a computer or machine can be operated by human brain activities alone [1]. In recent years, useful applications using virtual reality technology have been demonstrated, and the feasibility of BCI in immersive virtual environments has been presented [2]-[4]. The steady-state visual evoked potential (SSVEP) can be used in order to determine the user's eye-gaze direction [5]. For example, Cheng et al. investigated a virtual phone [6]. Furthermore, Trejo et al. developed a realistic demonstration to control a moving map display [7]. However, most of the previous work on BCI applications based on SSVEP has been performed on a standard computer monitor. In an immersive virtual environment, based on the SSVEP, the author demonstrated the control of a 3D object in real time according to the eye-gaze directions of the user [8]. One of the problems in BCI applications lies in the motivation of the user. Usually, BCIs have been used with the user sitting in a chair in order to avoid additional artifacts from muscle activities. This has made the user unable to use the system for a long time. Therefore, aiming to use BCI applications in immersive virtual environments, the author investigated the feasibility of virtual reality (VR) interactions with the subject in physically moving conditions.
This paper is organized as follows. In Section 2, the experimental settings are explained. In Section 3, the results of our experiments are shown. Discussion and conclusions are presented in the following sections.
2 Experiments
A healthy male participated in the experiments as the subject. He was naive to EEG experiments. In Experiment I, the subject sat comfortably in an armchair facing visual stimuli of computer-generated images on a monitor. There were two flickering visual stimuli in the visual field, as shown in Figure 1. In Experiment II, the subject was instructed to perform simulated walking during the measurements. Here, the position of the visual stimuli was adjusted to the height of the subject's eyes.
Fig. 1. Flickering visual stimuli (left: 4 Hz, right: 6 Hz)
Fig. 2. The experimental setup: an EEG electrode, the flickering visual stimuli, a portable amplifier, a subject during simulated walking, and a laptop PC for data acquisition and signal processing
During both experiments, the scalp electrodes were applied in order to perform EEG recordings. In this study, one-channel EEG signals were investigated from Oz
according to the international 10/20 system [9]. A body-earth electrode and a reference electrode were placed on the forehead and on the left ear lobe, respectively. The analogue EEG signals were amplified by a multi-channel bio-signal amplifier (Polymate II (AP216), TEAC Corp., Japan), which is compact and portable. The subject wore a small bag on his belt containing the compact amplifier. The amplified signals were sampled at 200 Hz. The digitized EEG data were stored in a laptop computer (placed apart from the subject in this study). The experimental setup is shown in Figure 2. Experiments I and II were performed alternately, with rest intervals of about 1 minute. In Experiment I, one session was for gazing at the left visual stimulus (flickering at 4 Hz) and the following session was for the right one (6 Hz); the same applied to Experiment II. Here, one session consisted of 30 sec of EEG measurements.
3 Results
In order to extract the features of the SSVEP induced by the flickering stimuli, we applied frequency analysis to the collected EEG data. Figure 3 shows the power spectral density obtained by Fast Fourier Transform (FFT) analysis in the simulated walking condition. Clear SSVEPs (fundamental and harmonic signals) were observed at the frequencies corresponding to the flickering stimuli.
Fig. 3. The results of frequency analysis, plotted as power spectral density (dB) versus frequency (Hz) from 0 to 20 Hz (upper: gazing at the 4 Hz flickering stimulus; lower: gazing at the 6 Hz one)
Furthermore, to assess the quality of the EEG signals, we investigated single-trial data. EEG segments of 3 sec were taken from all collected sessions. For all segments, FFT analysis was applied to obtain feature vectors. After that, a pattern recognition
algorithm was applied in order to classify the two brain states (EEG signals in the 4 Hz and 6 Hz flickering conditions). The number of feature dimensions was reduced by Principal Component Analysis, followed by classification with Linear Discriminant Analysis. To estimate the classification performance, a leave-one-out method was adopted, in which one data sample was used for testing and the others for training. The pattern recognition performance was 88.3% for Experiment I (sitting condition), and we found a similar performance for Experiment II (simulated walking condition). Thus, in both conditions, we could decode the subject's eye-gaze direction with more than 85% accuracy.
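A sketch of this single-trial analysis pipeline is given below; the frequency band, number of principal components, and library choice (NumPy, scikit-learn) are our assumptions rather than details reported in the paper.

# FFT power features from 3 s segments, PCA for dimensionality reduction, LDA
# classification, and leave-one-out evaluation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline

FS = 200          # sampling rate (Hz), as in the recordings described above
SEG_LEN = 3 * FS  # 3 s segments

def fft_features(segments):
    """segments: array (n_trials, SEG_LEN) of single-channel EEG; returns spectral power."""
    spectra = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    freqs = np.fft.rfftfreq(SEG_LEN, d=1.0 / FS)
    band = (freqs >= 2) & (freqs <= 20)   # band containing 4 Hz, 6 Hz and their harmonics
    return spectra[:, band]

def loo_accuracy(segments, labels, n_components=10):
    X, y = fft_features(segments), np.asarray(labels)
    clf = make_pipeline(PCA(n_components=n_components), LinearDiscriminantAnalysis())
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)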
4 Discussion
The results of the SSVEP classification show the feasibility of BCI applications for VR interactions while standing or physically moving. The author's group tested the BCI performance by developing an online CG control system; in this preliminary study, cursor control was possible even with the subject standing. In immersive virtual environments users usually interact with virtual objects while standing, and thus the result in this paper encourages the development of BCI applications in CAVE-like display systems [10]. One of the problems in BCI applications has been the posture of the user. BCIs have been used with the user sitting in a chair in order to avoid additional artifacts from muscle activities. This has made the user less motivated and unable to use the system for a long time. In our studies, the subjects did not report being less motivated in the standing condition. Note that the author's group has reported that, in an ambulatory context, the P300 evoked potential could be detected by using auditory stimuli in indoor [11] and even in outdoor environments [12].
5 Conclusions In this study, we proposed a portable BCI aiming to realize a novel interaction with VR objects during standing. The EEG was recorded during simulated walking conditions in indoor environment. The SSVEP was successfully detected by using computer generated visual stimuli. This result suggested that the EEG signals with portable BCI systems would provide a useful interface in performing VR interactions during standing in indoor environment such as CAVE-like system. Acknowledgment. This work is partly supported by the Telecommunication Advancement Foundation.
References 1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain computer interfaces for communication and control. Clinical Neurophysiology 113(6), 767–791 (2002) 2. Bayliss, J.D.: The use of the evoked potentials P3 Component for Control in a virtual apartment. IEEE Transaction on Neural Syatems and Rehabilitation Engineering 11(2) (2003)
3. Pfurtscheller, G., Leeb, R., Keinrath, C., Friedman, D., Neuper, C., Guger, C., Slater, M.: Walking from thought. Brain Research 1071, 145–152 (2006) 4. Fujisawa, J., Touyama, H., Hirose, M.: EEG-based navigation of immersing virtual environments using common spatial patterns. In: Proc. of IEEE Virtual Reality Conference (2008) (to appear) 5. Middendorf, M., McMillan, G., Calhoun, G., Jones, K.S.: Brain-Computer Interfaces Based on the Steady-State Visual-Evoked Response. IEEE Transactions on Rehabilitation Engineering 8(2), 211–214 (2000) 6. Cheng, M., Gao, X., Gao, S., Xu, D.: Design and Implementation of a Brain-Computer Interface With High Transfer Rates. IEEE Transactions on Biomedical Engineering 49(10), 1181–1186 (2002) 7. Trejo, L.J., Rosipal, R., Matthews, B.: Brain-computer interfaces for 1-D and 2-D cursor control: designs using volitional control of the EEG spectrum or steady-state visual evoked potentials. IEEE Trans. Neural. Syst. Rehabil. Eng. 14(2), 225–229 (2006) 8. Touyama, H., Hirose, M.: Steady-State VEPs in CAVE for Walking Around the Virtual World. In: Stephanidis, C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 715–717. Springer, Heidelberg (2007) 9. Jasper, H.H.: The ten-twenty electrode system of the international federation. Electroenceph. Clin. Neurophysiol. 10, 370–375 (1958) 10. Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-screen projection-based virtual reality: The design and implementation of the CAVE. In: Proc. ACM SIGGRAPH 1993, pp. 135–142 (1993) 11. Lotte, F., Fujisawa, J., Touyama, H., Ito, R., Hirose, M., Lécuyer, A.: Towards Ambulatory Brain-Computer Interfaces: A Pilot Study with P300 Signals. In: 5th Advances in Computer Entertainment Technology Conference (ACE), pp. 336–339 (2009) 12. Maeda, K., Touyama, H.: in preparation (2011)
Stereoscopic Vision Induced by Parallax Images on HMD and Its Influence on Visual Functions
Satoshi Hasegawa1, Akira Hasegawa1,2, Masako Omori3, Hiromu Ishio2, Hiroki Takada4, and Masaru Miyao2
1 Nagoya Bunri University, 365 Maeda Inazawa Aichi Japan
[email protected]
2 Nagoya University, Furo-cho Chikusa-ku Nagoya Japan
3 Kobe Women's University, Suma Kobe Japan
4 University of Fukui, Bunkyo Fukui Japan
[email protected]
Abstract. Visual function of lens accommodation was measured while subjects used stereoscopic vision in a head mounted display (HMD). Eyesight with stereoscopic Landolt ring images displayed on the HMD was also studied. In addition, the recognized size of virtual stereoscopic images was estimated using the HMD. Accommodation to virtual objects was observed when subjects viewed stereoscopic images of 3D computer graphics, but not when the images were displayed without appropriate binocular parallax. This suggests that stereoscopic moving images on the HMD induced visual accommodation. Accommodation should be adjusted to the position of the virtual stereoscopic image induced by parallax. The difference between the distances of the focused display and the stereoscopic image may cause visual load. However, an experiment showed that Landolt rings of almost the same size were distinguished regardless of the virtual distance of the 3D images if the parallax was not larger than the fusional upper limit. On the other hand, congruent figures that were simply shifted to cause parallax were seen as larger as the distance to the virtual image became longer. The results of this study suggest that stereoscopic moving images on the HMD induced visual accommodation by expansion and contraction of the ciliary muscle, synchronized with convergence. Appropriate parallax of stereoscopic vision should not reduce the visibility of stereoscopic virtual objects. The recognized size of the stereoscopic images was influenced by the distance of the virtual image from the display. Keywords: 3-D Vision, Lens Accommodation, Eyesight, Landolt ring, Size Constancy.
1 Introduction Stereoscopic vision (3D) technology using binocular parallax images has become popular and is used for movies, television, cameras, and mobile displays. 3D vision enables the display of realistic and exciting images carrying information about stereoscopic space. However, 3D viewing may cause asthenopia more often than watching 2D images or
natural vision. The influences of 3D viewing on visual functions should therefore be studied. It is necessary to understand the mechanisms of recognition and the effects of 3D vision in order to make safe and natural 3D images. Three experiments were conducted to study the effects of 3D vision on visual functions. First, we measured the recognized size of a figure displayed stereoscopically (Experiment 1). Figures of the same size can be perceived as if their sizes were different (Fig. 1) because of size constancy. How are the sizes of stereoscopic figures recognized?
Fig. 1. Example of illusion by size constancy. The right sphere looks as if it is larger than the left one, although the sizes are the same.
Another experiment (Experiment 2) measured binocular visual acuity while viewing 3D Landolt rings (Fig. 2). The focus is not fixed on the surface of the display, but moves near and far synchronously with the movement of the 3D images being viewed, as we previously reported [1-5]. Accommodation agrees with the convergence distance of the fused image, which differs from the position of the display [6]. Does the visibility of the stereoscopic image deteriorate from this lack of focus on the screen?
Fig. 2. Landolt ring and visual acuity measurement
In Experiment 3, lens accommodation was measured while watching 3D images. Ordinary 3D (cross-point camera images) and Power3D™ (Olympus Visual
Communications Co., Ltd.) were used for this experiment. A method to produce natural 3D vision is suggested. Details of these experiments are described below.
2 Methods 2.1 Method of Experiment 1: Recognized Size Estimation The HMD (Vuzix Corp. iWear AV230XL+, 320×240 pixels) displayed a 44-inch virtual screen at a viewing distance of approximately 300 cm (270 cm). The center circle of three circles was shifted horizontally, without changing its size, to create parallax for 9 different 3D virtual distances of 100, 150, 200, 250, 300 (2D), 350, 400, 450 and 500 cm from the eye to the fusion image (Fig. 3a). Subjects viewed these images (Fig. 3b) and recorded the recognized size with a pencil on the paper sheet shown in Fig. 3c.
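The on-screen shift that corresponds to each fusion distance follows from similar triangles between the two eyes, the virtual screen, and the fused point. The sketch below reproduces that geometry; the interpupillary distance of 6.5 cm is an assumed value, and the function is an illustration rather than the authors' implementation.

```python
def on_screen_parallax(fusion_cm, screen_cm=300.0, ipd_cm=6.5):
    """Signed horizontal separation (cm) between the left- and right-eye
    images on the screen that places the fused image at fusion_cm.

    Positive = crossed parallax (image pops toward the viewer),
    negative = uncrossed parallax (image pops away).
    ipd_cm is an assumed interpupillary distance.
    """
    return ipd_cm * (screen_cm / fusion_cm - 1.0)

for d in (100, 150, 200, 250, 300, 350, 400, 450, 500):
    print(d, round(on_screen_parallax(d), 2))
# 100 cm -> 13.0 cm crossed shift; 300 cm -> 0; 500 cm -> -2.6 cm uncrossed.
```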
Fig. 3. Experiment 1: Recognized size estimation. (a) Examples of 3D images (left-eye/right-eye pairs) displayed on the HMD: Plain 2D (fusion distance 300 cm), Pop Toward (100 cm), Pop Away (500 cm); (b) viewing 3D images on the HMD; (c) recognized size recording sheet.
2.2 Method of Experiment 2: Visual Acuity for 3D Landolt Ring An HMD (the same apparatus as in Exp. 1) was used. Still parallax images in a side-by-side format were prepared to display Landolt rings of 12 sizes (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.2, 1.5 and 2.0 in visual acuity at 300 cm). Landolt rings
were shifted horizontally, without size change, to create parallax for 9 different stereoscopic virtual distances of 100, 150, 200, 250, 300 (2D), 350, 400, 450 and 500 cm to the fusion image (Fig. 4). Subjects wearing the HMD first adjusted the focus of both the left and right eyepieces using the dial on the HMD while viewing the 300 cm images that had no parallax. They then watched stereoscopic images at the 9 distances without changing the focus dial positions. The smallest Landolt ring size resolved by each subject was recorded as a value of 0.2-2.0 (the acuity values shown in Fig. 2). Eighteen subjects (24.6 ± 7.9 years) with uncorrected vision or wearing contact lenses or glasses were studied; data were used only when they could fuse the stereoscopic images.
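The decimal acuity values used here map onto physical Landolt ring sizes in the usual way: a decimal acuity V corresponds to a gap subtending 1/V arc minutes at the viewing distance, and the ring's outer diameter is five times the gap. The sketch below computes these sizes for the 300 cm virtual screen; it is an illustration based on the standard Landolt convention, not the authors' rendering code.

```python
import math

def landolt_gap_mm(acuity, distance_cm=300.0):
    """Physical gap width (mm) of a Landolt C corresponding to a given
    decimal acuity when viewed from distance_cm.
    A decimal acuity V resolves a gap subtending 1/V arc minutes."""
    gap_arcmin = 1.0 / acuity
    gap_rad = math.radians(gap_arcmin / 60.0)
    return 10.0 * distance_cm * math.tan(gap_rad)  # cm -> mm

for v in (0.2, 0.5, 1.0, 2.0):
    gap = landolt_gap_mm(v)
    print(v, round(gap, 2), round(5 * gap, 2))
# acuity 1.0 at 300 cm -> ~0.87 mm gap, ~4.4 mm outer diameter (5x the gap).
```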
Fig. 4. Examples of 3D images (left-eye/right-eye pairs) used in Experiment 2 (Visual Acuity for 3D): (a) Plain 2D; (b) Pop Toward; (c) Pop Away (fusion distance in cm).
2.3 Method of Experiment 3: Accommodation Measurement The image used in Experiment 3 was a moving 3D-CG sphere displayed stereoscopically. The sphere moved virtually in a reciprocating motion toward and away from the observer with a cycle of 10 seconds (Fig. 5).
Fig. 5. Moving 3D image used in Experiment 3 (the sphere moves Far, Middle, Near, Middle, Far over one 10-second cycle)
Moving images (Fig. 5) were prepared in four types: 2D (Fig. 6a), Pseudo 3D (Fig. 6b), Cross Point 3D (Fig. 6c with Fig. 7a) and POWER3D™ (Fig. 6c with Fig. 7b). A modified version of an original apparatus [3] for measuring lens accommodation was used in Experiment 3 (Fig. 8). Accommodation was measured for 40 seconds under natural viewing conditions with binocular vision while a 3D image (Fig. 5) moved virtually toward and away from the subject on an HMD (Fig. 8). For the accommodation measurements, the visual distance from the HMD to the subjects'
Fig. 6. Three parallax modes used in Experiment 3, shown as left-eye/right-eye image pairs at the Far, Middle, and Near positions: (a) 2D (no parallax); (b) Pseudo 3D (fixed parallax); (c) 3D (stereoscopic)
Fig. 7. Two 3D photography modes used in Experiment 3: (a) Cross Point 3D (multi-camera setup with a cross point, background, and far/near objects); (b) POWER3D™ (multi-screen setup with a virtual camera and virtual screens for far and near objects)
eyes was 3 cm. The refraction of the right eye was measured with an accommodo-refractometer (Nidek AR-1100) while the subjects gazed at the presented image via a small mirror with both eyes. The HMD (Vuzix Corp. iWear AV920, 640×480 pixels) was positioned so that it appeared in the upper portion of a dichroic mirror placed in front of the subject's eyes (Fig. 8). The 3D image was observed
through the mirror. The stereoscopic image displayed in the HMD could be observed with natural binocular vision through reflection in the dichroic mirror, and refraction could be measured at the same time by transmitting infrared rays.
Fig. 8. Lens accommodation measurement while watching a 3D movie on the HMD (the subject gazes with both eyes at the HMD image via a dichroic mirror; the accommodo-refractometer, Nidek AR-1100, measures through the mirror)
The subjects were instructed to gaze at the center of the sphere with both eyes. All subjects viewed the four types of images: 2D, Pseudo 3D, Cross Point 3D and POWER3D™ (Fig. 6, Fig. 7). While both eyes were gazing at the stereoscopic image, the lens accommodation of the right eye was measured and recorded.
3 Results 3.1 Result of Experiment 1: Recognized Size Estimation
Fig. 9 shows the results of Experiment 1. Four subjects recorded the recognized size of the 3D circle on the sheet shown in Fig. 3c. The ratio to the size at 2D (fusion distance: 300 cm) was plotted, except when subjects could not fuse the 3D images. In addition, the theoretical line mentioned below is shown in the same graph.
Fig. 9. Result of Experiment 1: recognized size of the circle (%) plotted against the distance from the eyes to the fusion image of the circle (cm), for Subjects A-D together with the theoretical line
Fig. 10. Result of Experiment 2: (a) lower limit of recognition; (b) rate of subjects able and unable to fuse
3.2 Result of Experiment 2: Visual Acuity for 3D Landolt Ring The result of Experiment 2 is shown in Fig. 10. Fig. 10a shows the smallest Landolt ring size, expressed as the visual acuity value at a 300 cm distance, averaged over 15 subjects (excluding 3 who could not view fusion images for either parallax direction). In this graph, ● shows the average of the visual acuity points (the acuity values shown in Fig. 2) in which non-fusion cases were counted as 0.0, and △ shows the average over only the cases in which fusion was successfully achieved. The number and percentage of subjects who could and could not view fusion images are shown in Fig. 10b. Many subjects exceeded the fusional upper limit at the 100 cm and 150 cm distances.
Fig. 11. Result of Experiment 3: accommodation (diopters) plotted against time (sec) over the 40-second recording for the 2D, Pseudo 3D, Cross Point 3D, and POWER3D conditions. (a) Accommodation of Subject E; (b) accommodation of Subject F.
The fusional limit differs depending on individual variation and on the characteristics of the HMD. However, Fig. 10a shows that Landolt rings of almost the same size were distinguished regardless of the virtual distance of the 3D images as long as the parallax was not larger than the fusional upper limit for each subject.
3.3 Result of Experiment 3: Accommodation Measurement The presented image was a 3D-CG sphere that moved in a reciprocating motion toward and away from the observer with a cycle of 10 sec (Fig. 5). The subjects gazed at the sphere and accommodation was measured for 40 seconds (Fig. 8). The results for 2D, Pseudo 3D, Cross Point 3D and POWER3D (Fig. 6, Fig. 7) are shown in Fig. 11 for two subjects. Figure 11 (a) shows results for Subject E (age: 24, male), and (b) for Subject F (age: 39, female). The results showed that a large amplitude of accommodation synchronizing with convergence appeared only in the two 3D modes, Cross Point 3D and POWER3D, and in neither 2D nor Pseudo 3D. Individual differences among subjects were large, however. POWER3D induced larger-amplitude accommodation than Cross Point 3D in both subjects.
4 Discussion Stereoscopic images induced the illusion that pop-away figures were recognized as being larger than their size on the display screen (Fig. 9), although the increase saturated in all but one subject in Fig. 9. The theoretical line shown in Fig. 9 is the calculated size ratio according to the principle shown in Fig. 12. The saturation of the expansion of the pop-away image size might be caused by size constancy (Fig. 1). The reason why the pop-toward image was recognized as being smaller and the pop-away image as larger (Experiment 1, Fig. 9) may be that the images on the subjects' retinas agreed with the screen images, while the recognized distances were at the fusion point of the parallax images (Fig. 12). Hori, Miyao et al. [6] have reported that accommodation agreed with the fusion distance while
watching 3D images. Some scholars have said that 3D images might appear unfocused because accommodation focuses on the pop-toward/away images even though the images are displayed on the screen. However, the result of Experiment 2 (Fig. 10) showed that eyesight did not deteriorate regardless of the pop-toward/away distance, provided subjects could successfully fuse the images within the depth of field.
Fig. 12. The principle of expansion and contraction of recognized 3D images (eyes, screen, pop-up image, and pop-away image)
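A possible reading of the theoretical line in Fig. 9 and the principle in Fig. 12 is that, since the retinal image is fixed by the on-screen figure, the perceived linear size scales in proportion to the perceived (fusion) distance. The sketch below illustrates this proportionality with the 300 cm screen distance of Experiment 1; it is our interpretation, not code from the paper.

```python
def theoretical_size_ratio(fusion_cm, screen_cm=300.0):
    """Perceived size relative to the 2D (on-screen) case, assuming the
    retinal image is fixed and perceived size scales with perceived distance."""
    return fusion_cm / screen_cm

for d in (100, 200, 300, 400, 500):
    print(d, f"{100 * theoretical_size_ratio(d):.0f}%")
# pop-toward (100 cm) -> 33%; 2D (300 cm) -> 100%; pop-away (500 cm) -> 167%
```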
Accommodation was induced by the movement of the stereoscopic image in the 3D modes on the HMD (Fig. 11), and its amplitude was larger with POWER3D than with ordinary Cross Point 3D. One of the reasons why some people perceive artificiality in stereoscopic viewing may be the fixed camera angles used in photography (Fig. 7). POWER3D induced larger accommodation without fusion failure, and may be more natural for 3D viewers than the conventional Cross Point 3D method (Experiment 3, Fig. 11). Acknowledgement. This study is partly supported by Olympus Visual Communications Corporation (OVC), Japan. Parts of the experiments shown in this paper were done with the help of Mr. H. Saito and Mr. H. Kishimoto, students of Nagoya Bunri University.
References 1. Hasegawa, S., Omori, M., Watanabe, T., Fujikake, K., Miyao, M.: Lens Accommodation to the Stereoscopic Vision on HMD. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp. 439–444. Springer, Heidelberg (2009) 2. Omori, M., Hasegawa, S., Watanabe, T., Fujikake, K., Miyao, M.: Comparison of measurement of accommodation between LCD and CRT at the stereoscopic vision gaze. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp. 90–96. Springer, Heidelberg (2009) 3. Miyao, M., Otake, Y., Ishihara, S.: A newly developed device to measure objective amplitude of accommodation and papillary response in both binocular and natural viewing conditions. Jpn. J. Ind. Health 34, 148–149 (1992) 4. Miyao, M., Ishihara, S., Saito, S., Kondo, T., Sakakibara, H., Toyoshima, H.: Visual accommodation and subject performance during a stereographic object task using liquid crystal shutters. Ergonomics 39(11), 1294–1309 (1996) 5. Omori, M., Hasegawa, S., Ishigaki, H., Watanabe, T., Miyao, M., Tahara, H.: Accommodative load for stereoscopic displays. In: Proc. SPIE, vol. 5664, p. 64 (2005) 6. Hori, H., Shiomi, T., Kanda, T., Hasegawa, A., Ishio, H., Matsuura, Y., Omori, M., Takada, H., Hasegawa, H., Miyao, M.: Comparison of accommodation and convergence by simultaneous measurements during 2D and 3D vision gaze. In: HCII 2011, in this Proc. (2011)
Comparison of Accommodation and Convergence by Simultaneous Measurements during 2D and 3D Vision Gaze
Hiroki Hori1, Tomoki Shiomi1, Tetsuya Kanda1, Akira Hasegawa1, Hiromu Ishio1, Yasuyuki Matsuura1, Masako Omori2, Hiroki Takada3, Satoshi Hasegawa4, and Masaru Miyao1
1 Nagoya University, Japan
2 Kobe Women's University, Japan
3 Fukui University, Japan
4 Nagoya Bunri University, Japan
[email protected]
Abstract. Accommodation and convergence were measured simultaneously while subjects viewed 2D and 3D images. The aim was to compare fixation distances between accommodation and convergence in young subjects while they viewed 2D and 3D images. Measurements were made using an original machine that combined the WAM-5500 and EMR-9, and the 2D and 3D images were presented using a liquid crystal shutter system. The results showed that subjects' accommodation and convergence changed periodically in diopter value when viewing 3D images. The mean values of accommodation and convergence among the 6 subjects were almost equal when viewing the 2D and 3D images, respectively. These findings suggest that the ocular functions when viewing 3D images are very similar to those during natural viewing. When subjects are young, accommodative power while viewing 3D images is similar to the distance of convergence, and the two focusing distances are synchronized with each other. Keywords: Stereoscopic Vision, Simultaneous Measurement, Accommodation and Convergence, Visual Fatigue.
1 Introduction Recently, stereoscopic vision technology has been advancing. Today, stereoscopic vision is not used only in movie theaters: home appliance makers have started to sell 3D TVs and 3D cameras. Lately, mobile devices such as cellular phones and portable video game machines have been converted to 3D, and the general public has also started to become very comfortable with stereoscopic vision. Various stereoscopic display methods have been proposed. The most general method is to present images with binocular disparity. For 3D TVs, the following two methods are mainly used. One is polarized display systems that present two different images with binocular disparity to the right and left eyes using polarized filters. The
other method is frame sequential systems that present two different images with binocular disparity by time-sharing using liquid crystal shutters. Special 3D glasses are needed to watch 3D images with these two methods. In contrast, the following two methods are mainly used for mobile devices such as cellular phones. One is parallax barrier systems that separate the two different images presented to the right and left eyes with a parallax barrier on the display. The other is lenticular systems that separate the two different images and present them to the left and right eyes using a hog-backed lens called a lenticular lens. With these two methods there is no need for the special 3D glasses required by polarized display and frame sequential systems. Another method is HMD (Head Mounted Display) systems that separate and present the images with glasses. However, despite the progress in stereoscopic vision technology, the effects on the human body of continuously watching 3D images (such as visual fatigue and motion sickness) have not been elucidated. Lens accommodation (Fig. 1) and binocular convergence (Fig. 2) may provide clues for understanding the causes of various symptoms. It is generally explained to the public that, "During stereoscopic vision, accommodation and convergence are mismatched and this is the main reason for the visual fatigue caused by stereoscopic vision" [1-4]. During natural vision, lens accommodation is consistent with convergence. During stereoscopic vision, while accommodation is fixed on the display that shows the 3D image, the convergence of the left and right eyes crosses at the location of the stereo image. According to the findings presented in our previous reports, however, such explanations are mistaken [5-7]. However, our research has not yet been widely recognized. This may be because the experimental evidence obtained in our previous studies, where we did not measure accommodation and convergence simultaneously, was not strong enough to convince people. We therefore developed a new device that can simultaneously measure accommodation and convergence. In this paper, we report experimental results obtained using this device, in order to compare the fixation distances during viewing of 2D and 3D images.
Fig. 1. Lens Accommodation
Fig. 2. Convergence
2 Method The subjects in this study were 6 healthy young students in their twenties (2 with uncorrected vision, 4 who used soft contact lenses). The aim was to compare fixation distances between accommodation and convergence in young subjects while they viewed 2D and 3D images. We obtained informed consent from all subjects, and approval for the study from the Ethical Review Board of the Graduate School of Information Science at Nagoya University. The details of the experimental setup were as follows. We set an LCD monitor 1 m in front of the subjects and presented 2D or 3D images in which a spherical object moved forward and back with a cycle of 10 seconds (Fig. 3). In theory, the spherical object appears as a 3D image at 1 m (i.e., the location of the LCD monitor) and moves toward the subjects to a distance of 0.35 m in front of them. We asked them to gaze at the center of the spherical object for 40 seconds, and measured their lens accommodation and convergence distance during that time. The 3D and 2D images were presented using a liquid crystal shutter system. Measurements were made three times each. For the measurements, we made an original machine by combining the WAM-5500 and EMR-9.
Fig. 3. Spherical Object Movies (Power 3D™ : Olympus Visual Communications, Corp.)
WAM-5500 is an auto refractometer (Grand Seiko Co., Ltd.) that can measure accommodative power with both eyes open under natural conditions (Fig. 4). It
enables continuous recording at a rate of 5 Hz for reliable and accurate measurement of accommodation. The WAM-5500 has two measurement modes: a static mode and a dynamic mode. We used the dynamic mode. The instrument was connected to a PC running the WCS-1 software via an RS-232 cable, with the WAM-5500 set to Hi-Speed (continuous recording) mode. During dynamic data collection, we simply depressed the WAM-5500 joystick button once to start recording and once to stop at the end of the desired time frame. EMR-9 is an eye mark recorder (NAC Image Technology Inc.) that can measure convergence distance (Fig. 5) using the pupillary/corneal reflex method. The specifications are a resolution of eye movement of 0.1 degrees, a measurement range of 40 degrees, and a measurement rate of 60 Hz. Small optical devices 10 mm wide and 30 mm long for irradiation and measurement of infrared light are supported by a bar attached to a cap mounted on the subject's face.
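Because the WAM-5500 records at 5 Hz and the EMR-9 at 60 Hz, comparing the two signals requires converting the measured vergence angle into a convergence distance (or diopters) and resampling onto a common time base. The following sketch shows one plausible way to do this; the interpupillary distance and the synthetic vergence trace are assumptions for illustration, not the authors' processing pipeline.

```python
import numpy as np

def convergence_diopters(vergence_deg, ipd_cm=6.3):
    """Convert a binocular vergence angle (degrees) to diopters (1/m),
    assuming symmetric convergence: distance = (ipd/2) / tan(angle/2)."""
    angle = np.radians(np.asarray(vergence_deg, dtype=float))
    distance_cm = (ipd_cm / 2.0) / np.tan(angle / 2.0)
    return 100.0 / distance_cm

def align_to_accommodation(t_acc, t_eye, conv_d):
    """Resample 60 Hz convergence samples onto the 5 Hz accommodation
    timestamps by linear interpolation."""
    return np.interp(t_acc, t_eye, conv_d)

# Example: a 10 s cycle between ~1 m (about 3.6 deg) and ~0.35 m (about 10.3 deg).
t_eye = np.arange(0, 40, 1 / 60.0)
verg = 7.0 + 3.3 * np.sin(2 * np.pi * t_eye / 10.0)
t_acc = np.arange(0, 40, 1 / 5.0)
conv_at_acc_times = align_to_accommodation(t_acc, t_eye, convergence_diopters(verg))
print(conv_at_acc_times[:5])
```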
Fig. 4 - 5. Auto Refractometer WAM-5500 (Fig.4: left) and Eye Mark Recorder EMR-9 (Fig.5: right)
We used a liquid crystal shutter system combined with the respective binocular vision systems for 2D and 3D (Fig. 6). The experimental environment is shown in Fig. 7 and Table 1. Here, we note that brightness (cd/m2) is a value measured through the liquid crystal shutter, and that the angular size of the spherical objects (deg) is not equal because the binocular vision systems for 2D and 3D have different display sizes. The images we used in the experiment are from Power 3D™ (Olympus Visual Communications, Corp.). This is an image creation technique that combines near and far views in a virtual space. It has multiple sets of virtual displays, the positions of which can be adjusted. When subjects view a close target (crossed view), the far view cannot be fused. When they look at the far view, the close target (crossed view) is split and two targets are seen. Therefore, Power 3D presents an image that is extremely similar to natural vision.
Fig. 6. Uniting WAM-5500 (Fig.4) and EMR-9 (Fig.5)
Fig. 7. Experimental Environment

Table 1. Experimental Environment
Brightness of Spherical Object (cd/m2): Far 3.6, Near 3
Illuminance (lx): 126
Size of Spherical Object (deg): Far 0.2, Near 7.7
3 Results The measurements for the 6 subjects showed roughly similar results. For 3D vision, results for Subjects A and B are shown in Fig. 8 and Fig. 9 as examples. When Subject A (23 years old, male, soft contact lenses) viewed the 3D image (Fig. 8), accommodation changed between about 1.0 Diopter (100 cm) and 2.5 Diopters (40 cm), while convergence changed between about 1.0 Diopter (100 cm) and 2.7 Diopters (37 cm). The changes in the respective diopter values have almost the same amplitude and are in phase, fluctuating synchronously with a cycle of 10 seconds corresponding to that of the 3D image movement. Similarly, when Subject B (29 years old, male, soft contact lenses) viewed the 3D image (Fig. 9), both accommodation and convergence changed in almost the same way between about 0.8 Diopters (125 cm) and 2.0 Diopters (50 cm). The changes in the respective diopter values have almost the same amplitude and are in phase, fluctuating synchronously with a cycle of 10 seconds corresponding to that of the 3D image movement. For 2D vision, the results for Subject A are shown in Fig. 10 as an example. As stated above (Fig. 8), when he viewed the 3D image, his accommodation and convergence changed between about 1.0 Diopter (100 cm) and 2.5 Diopters (40 cm). They had almost the same amplitude and were in phase, fluctuating synchronously with a cycle of 10 seconds corresponding to that of the 3D image movement. In contrast, when viewing the 2D image (Fig. 10), the diopter values for both accommodation and convergence were almost constant at around 1 Diopter (1 m).
Fig. 8. Subject A (3D image)
Fig. 9. Subject B (3D image)
Fig. 10. Subject A (2D image)
Finally, Table 2 shows mean values of accommodation and convergence in the 6 subjects when they viewed 2D and 3D images. The mean values of accommodation and convergence for the 6 subjects when they viewed the 2D image were both 0.96 Diopters. The difference was negligible. When viewing the 3D image, the values of accommodation and convergence were 1.29 Diopters and 1.32 Diopters, respectively.
The difference was about 0.03 Diopters, which is also negligible. Therefore, we can say that there is not much quantitative difference in the results between accommodation and convergence when viewing either the 2D or 3D images. In this experiment, there were also a few subjects who could recognize the stereoscopic view but complained that it was not easy to see with stereoscopic vision at the point where the 3D image was closest. Table 2. Mean value of accommodation and convergence
         Accommodation        Convergence          Difference
2D       0.96 D (104.2 cm)    0.96 D (104.2 cm)    0 D (0 cm)
3D       1.29 D (77.5 cm)     1.32 D (75.8 cm)     0.03 D (1.7 cm)
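Since diopters are reciprocal metres, the centimetre figures in Table 2 follow from a one-line conversion, shown here purely as an illustrative check:

```python
def diopters_to_cm(d):
    """Distance (cm) corresponding to a refractive/vergence demand in diopters."""
    return 100.0 / d

for d in (0.96, 1.29, 1.32):
    print(d, round(diopters_to_cm(d), 1))
# 0.96 D -> 104.2 cm, 1.29 D -> 77.5 cm, 1.32 D -> 75.8 cm, matching Table 2.
```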
4 Discussion In this experiment, we simultaneously measured accommodation and convergence while subjects viewed 2D and 3D images for comparison, since it is said that accommodation and convergence are mismatched during stereoscopic vision. Wann et al. (1995) said that within a VR system the eyes must maintain accommodation on the fixed LCD screens, despite the presence of disparity cues that necessitate convergence eye movements in the virtual scene [1]. Moreover, Hong et al. (2010) said that the natural coupling of eye accommodation and convergence in viewing a real-world scene is broken in stereoscopic displays [4]. From the results in Fig. 8 and Fig. 9, we see that when young subjects view 3D images presented with the liquid crystal shutter system, accommodative power is consistent with the distance of convergence, and the two focusing distances are synchronized with each other. In addition, the results in Fig. 10 and Table 2 suggest that the ocular functions when viewing 3D images are very close to those during natural viewing. In general, it is said that there is a slight difference between accommodation and convergence even during natural viewing, with accommodation focused on a position slightly farther than that of real objects and convergence focused on the position of the real objects. This is said to originate in the fact that the target is still seen even if the focus is not exact, because of the depth of field [8]. In our 3D vision experiments, the mean values of accommodation and convergence were found to be 1.29 Diopters and 1.32 Diopters, respectively. This means that accommodation focuses on a position slightly farther than that of convergence, by about 0.03 Diopters. Hence, our findings suggest that eye movement when viewing 3D images is similar to that during natural viewing. In the light of the above, the conventional theory stating that within a VR system our eyes must maintain accommodation on the fixed LCD screen may need to be corrected. We can also say that the kind of results presented herein could be obtained because the 3D images used in the experiments were produced not by conventional means but with Power 3D, whose images are extremely close to natural viewing. Therefore, we consider that as long as 3D images are made using a proper method,
accommodation and convergence should almost always coincide, even for an image that projects out significantly, and that we can view such images more easily and naturally. Conventional 3D and Power 3D on an HMD have been compared in previous experiments [6-7]. These previous works also found that Power 3D is superior to conventional 3D.
5 Conclusion In this experimental investigation, we simultaneously measured accommodation and convergence while subjects viewed 2D and 3D images for comparison. The results suggest that the difference in eye movement between accommodation and convergence is equally small when viewing 2D and 3D images. This suggests that the difference between accommodation and convergence is probably not the main reason for visual fatigue, motion sickness, and other problems. The number of subjects in this experiment was only 6, which may still be too small for our findings to be completely convincing. In the near future, we would like to repeat this study with a larger number of subjects. We would also like to simultaneously measure and compare both accommodation and convergence in subjects viewing real objects (natural vision) and 3D images (stereoscopic vision) of those objects made with 3D cameras.
References 1. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural Problems for Stereoscopic Depth Perception in Virtual Environments. Vision Res. 35(19), 2731–2736 (1995) 2. Simon, J.W., Kurt, A., Marc, O.E., Martin, S.B.: Focus Cues Affect Perceived Depth. Journal of Vision 5, 834–862 (2005) 3. David, M.H., Ahna, R.G., Kurt, A., Martin, S.B.: Vergence-accommodation Conflicts Hinder Visual Performance and Cause Visual Fatigue. Journal of Vision 8(33), 1–30 (2008) 4. Hong, H., Sheng, L.: Correct Focus Cues in Stereoscopic Displays Improve 3D Depth Perception. SPIE, Newsroom (2010) 5. Miyao, M., Ishihara, S., Saito, S., Kondo, T., Sakakibara, H., Toyoshima, H.: Visual Accommodation and Subject Performance during a Stereographic Object Task Using Liquid Crystal Shutters. Ergonomics 39(11), 1294–1309 (1996) 6. Miyao, M., Hasegawa, S., Omori, M., Takada, H., Fujikake, K., Watanabe, T., Ichikawa, T.: Lens Accommodation in Response to 3D Images on an HMD. In: IWUVR (2009) 7. Hasegawa, S., Omori, M., Watanabe, T., Fujikake, K., Miyao, M.: Lens Accommodation to the Stereoscopic Vision on HMD. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp. 439–444. Springer, Heidelberg (2009) 8. Miyao, M., Otake, Y., Ishihara, S., Kashiwamata, M., Kondo, T., Sakakibara, H., Yamada, S.: An Experimental Study on the Objective Measurement of Accommodation Amplitude under Binocular and Natural Viewing Conditions. Tohoku, Exp., Med. 170, 93–102 (1993)
Tracking the UFO’s Paths: Using Eye-Tracking for the Evaluation of Serious Games Michael D. Kickmeier-Rust, Eva Hillemann, and Dietrich Albert Graz University of Technology, Brueckenkopfgasse 1/6, 8020 Graz, Austria {michael.kickmeier-rust,eva.hillemann,dietrich.albert}@tugraz.at
Abstract. Computer games are undoubtedly an enormously successful genre. Over the past years, a continuously growing community of researchers and practitioners has made the idea of using the potential of computer games for serious, primarily educational purposes equally popular. However, the present hype over serious games is not reflected in sound evidence for the effectiveness and efficiency of such games, and indicators for the quality of learner-game interaction are also lacking. In this paper we look into those questions, investigating a geography learning game prototype. A strong focus of the investigation was on relating the assessed variables with gaze data, in particular gaze paths and interaction strategies in specific game situations. The results show that there are distinct gender differences in the interaction style with different game elements, depending on the demands on spatial abilities (navigating in the three-dimensional spaces versus controlling rather two-dimensional features of the game), as well as distinct differences between high and low performers. Keywords: Game-based learning, serious games, learning performance, eye tracking.
1 Introduction Over the past years, digital educational games (DEG) have come into the focus of educational research and development. Several commercial online platforms distribute (semi-)educational mini games (e.g., www.funbrain.com or www.primarygames.com) for young children, Nintendo DS' "Dr Kawashima's Brain Training: How Old Is Your Brain?" is a best seller, and a growing number of companies concentrate on educational simulation and game software. Today's spectrum of learning games is broad, ranging from using commercial off-the-shelf (COTS) games in educational settings to specifically designed curriculum-related learning games. A classification on the basis of the psycho-pedagogical and technical level of games is proposed by [1, 2, 3]. As rich as the number of games is the number of initiatives and projects in this area. Historically, among the founders of the recent hype over game-based learning are doubtlessly Mark Prensky and Jim Gee. Mark Prensky published his groundbreaking book "Digital Game-based Learning" in 2001 [4]. His idea of game-based learning focuses on the concept of digital natives. He argues that the
omnipresence of “twitch speed” media such as MTV and computer games has emphasized specific cognitive aspects and de-emphasized others, which, in turn, has changed the educational demands of this generation. Jim Gee focused on learning principles in video games and how these principles can be applied to K-12 education [5, 6]. His thematic origin is the idea that (well-designed) computer games are very good at challenging the players, at keeping them engaged, and at teaching them how to play. On this basis he identified several principles for successful (learning) game design (e.g., learners must be enabled to be active agents or producers and not just passive recipients or consumers; Gee, 2005, p. 6, [7]). Another pioneer was, for example, David Shaffer in the context of applying regular entertainment computer games (COTS) in education [8]. In the USA a strong focus of research and development has a military background (this background is, for example, mirrored by governmental institutions such as the Department of Defence Game Developers’ Community; www.dodgamecommunity.com), resulting in famous games such as America’s Army. In Europe a more civilian approach is widely pursued, strongly driven by the European Commission. Leading-edge projects are, for example, ELEKTRA (www.elektra-project.org), 80Days (www.eightdays.eu), TARGET (www.reachyourtarget.org), mGBL (www.mg-bl.com), Engagelearning (www.engagelearning.eu), LUDUS (www.ludus-project.eu), or the special interest group SIG-GLUE (www.sig-glue.net). A noteworthy initiative comes from the Network of Excellence GALA (Games and Learning Alliance; www.galanoe.eu), an alliance where Europe’s most important players in the serious games sector attempt to streamline the fragmented field and to increase scientific and economic impact.
Major challenges for research, design, and development are seen, for example, in finding an appropriate balance between gaming and learning activities [3] or finding an appropriate balance between challenges through the game and abilities of the learner (e.g., [14]). One of the most important challenges for research concerns
the core strength of games, which can be summarized with their enormous intrinsic motivational potential. On the one hand, maintaining a high level of motivation requires an intelligent and continuous real-time adaptation of the game to the individual learner, for example, a continuous balancing of challenge and ability and of problems and learning progress. Essentially, this corresponds to the concept of flow – a highly immersed experience when a person is engaged in a mental and/or physical activity to a level where this person loses track of time and the outside world and when performance in this activity is optimal [15].
2 Evaluating Serious Games An ultimate challenge in the context of serious games, however, is a scientifically sound formative and summative assessment and evaluation. On the one hand the gaming aspect must be addressed, on the other hand the learning aspects. Both, unfortunately, must be considered more than the sum of their components and, more importantly, raise in part contradicting demands on the metrics of “good” and “educational” games. The criteria of “good” software in general and conventional learning/teaching software in particular are heavily dominated by the idea of performance with the software (in terms of effectiveness and efficiency); the ISO standards, for example, promote this kind of view. Accordingly, the metrics and heuristics also focus on performance aspects (e.g., error prevention, minimization of task time, minimization of cognitive load, and so on; cf. [16]). In contrast, criteria of “good” games focus on aspects of fun, immersion, pleasure, or entertainment. Heuristics concern, for example, the visual quality, the quality of the story, fairness, curiosity, and also a certain level of challenge (in the sense of task load or cognitive load); Atari founder Nolan Bushnell states in a famous quote, “a good game is easy to learn but hard to master”. Evaluating serious games must consider both performance aspects and recreational aspects. On the one hand, such a game is supposed to accomplish a specific goal, that is, teaching a defined set of knowledge/skills/competencies. This goes along with context conditions such as the justifiability of development and usage costs, comparability with conventional learning material in terms of effectiveness and efficiency, or societal concerns. On the other hand, the great advantages of the medium game only exist if the game character is in the foreground, that is, immersion, having fun, experiencing some sort of flow. In the past, several approaches to measure serious games were published. De Freitas and Oliver [17] proposed an evaluation framework which focuses on (i) the application context of a serious game, (ii) learner characteristics, (iii) didactical/pedagogical aspects, and (iv) the concept of “diegesis”, the extent and quality of the game story’s world. Besides such general frameworks, more specific scales were also introduced. An example is the eGameFlow approach [12], which attempts to measure flow experience in educational games by criteria such as challenge, amount of concentration, level of feedback, and so forth. A more recent and complete approach to gauging “good” serious games comes from [18]. The EVADEG framework concentrates on the aspects of (i) learning performance, (ii) gaming experience, (iii) game usability, and (iv) the evaluation of adaptive features.
The latter is a highly important yet oftentimes neglected factor. In the tradition of adaptive, intelligent tutoring systems, some modern serious games adapt autonomously to the needs, preferences, abilities, goals, and the individual progress of the players [14]. As emphasized by Weibelzahl [19], a scientifically correct evaluation of adaptive systems is difficult because, in essence, personalization and adaptivity mean that each user/player potentially receives different information, in different ways, in a different sequence, and in a different format. The introduced frameworks and approaches offer a valuable basis for evaluating serious games. A key problem, however, is how to measure all those aspects in a possibly unobtrusive, reliable, valid, and methodologically correct way. In the present paper, we took up the ideas of EVADEG and introduce eye tracking as a means of studying (i) the usability of a game, (ii) the extent of learner satisfaction, and, most importantly, (iii) the learning efficacy. More concretely, we utilized these considerations and methods to evaluate a learning game prototype which was developed in the course of a European research project (80Days). 2.1 Eye Tracking Observing eye movements has a long tradition in psychology in general and the field of HCI/usability in particular. El-Nasr and Yan [20] describe how perceptive (i.e., bottom-up) and cognitive (i.e., top-down) processes interact in the context of 3D videogames. Accordingly, while the saliency of objects can grab players’ attention, goal-orientation (top-down) in games is more effective for attracting attention. The big challenge for the authors was to develop a new methodology to analyze eye tracking data in a complex 3D environment, which differed considerably from the stimuli used in eye-tracking experiments conducted until then [21]. Basic variables in eye tracking studies are fixations (processing of attended information with stationary eyes) and saccades (quick eye movements occurring between fixations without information processing) [22]. The sequence of fixations establishes scan paths through a visual field. Although such eye tracking measures are commonly used, their interpretations remain malleable. To give an example, an important indicator for the depth of processing is fixation duration; the longer a stimulus is attended, the higher is the cognitive load and the deeper is the processing. In immersive games such a relationship may not be quite as stable and valid as it might be for other visual processing tasks. In the work of Jennett [23], for example, who investigated immersion in a game using eye tracking, a decrease of fixations per second was found in the immersive condition as compared to an increase in a non-immersive control condition. Jennett argued that in an immersive game the attention of the players becomes more focused on game-related visual components and therefore less “vulnerable” to distracting stimuli.
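Fixations and saccades of the kind analyzed below are usually extracted from raw gaze samples with a dispersion- or velocity-based algorithm. The following sketch shows a simple dispersion-threshold (I-DT) variant; the duration and dispersion thresholds are assumed values, not the settings actually used with the Tobii 1750.

```python
def detect_fixations(samples, fs=50.0, min_dur=0.1, max_disp=30.0):
    """Dispersion-threshold fixation detection (I-DT).

    samples: list of (x, y) gaze points in screen pixels.
    fs: sampling rate in Hz (50 Hz assumed for the Tobii 1750).
    min_dur: minimum fixation duration in seconds (assumed threshold).
    max_disp: maximum dispersion in pixels (assumed threshold).
    Returns a list of (start_index, end_index, duration_seconds).
    """
    def dispersion(pts):
        xs, ys = zip(*pts)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    win = int(min_dur * fs)
    fixations, i = [], 0
    while i + win <= len(samples):
        j = i + win
        if dispersion(samples[i:j]) <= max_disp:
            # Grow the window while the points stay tightly clustered.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_disp:
                j += 1
            fixations.append((i, j - 1, (j - i) / fs))
            i = j
        else:
            i += 1
    return fixations
```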
3 An Eye Tracking Study on Learning and Gaming 3.1 The 80Days Game Prototype The investigated game prototype was developed in the context of the European 80Days project (www.eightydays.eu). The game teaches geography for a target
audience of 12 to 14 year olds and follows European curricula in geography. In concrete terms, an adventure game was realized within which the learner takes the role of an Earth kid at the age of 14. The game starts when a UFO lands in the backyard and an alien named Feon contacts the player. Feon is an alien scout who has to collect information about Earth. The player wants to have fun by flying a UFO and in the story pretends to be an expert on the planet Earth. He or she assists the alien in exploring the planet and creating a report about the Earth and its geographical features. This is accomplished by the player by flying to different destinations on Earth, exploring them, and collecting and acquiring geographical knowledge. The goal is to send the Earth report as a sort of travelogue about Earth to Feon’s mother ship. In the course of the game, the player discovers the aliens’ real intentions – preparing the conquest of the Earth – and reveals the “real” goal of the game: the player has to save the planet, and the only way to do it is to draw the right conclusions from the traitorous Earth report. Therefore the game play has two main goals: (1) to help the alien complete the geographical Earth report, and (2) to save the planet, which is revealed in the course of the story, when the player realizes the true intention of the alien. Figure 1 gives some impressions of the game. Details are given in [14].
Fig. 1. Screenshots of the 80Days demonstrator game; an action adventure – on the basis of an Alien story – to learn geography according to European curricula
3.2 Study Design The study presented in this paper is only one of a long sequence of experiments in several European countries to evaluate the demonstrator game and to conduct in-depth research on the relationships and mechanisms involved in using computer games for learning. Due to the vast complexity of this research battery, we must focus on a rather concise snapshot of this work only. The present results are based on data of 9
Austrian children, 4 girls and 5 boys. The participants' ages ranged between 11 and 16 years with an average of 13 years (SD = 1.61). 3.3 Material and Apparatus To record gaze information, the Tobii 1750 eye tracker was used, a device that works with infrared cameras and therefore enables a fully unobtrusive recording of gaze information (Figure 2a). For the pre and post assessments of knowledge, we utilized a paper-pencil knowledge test; in addition, motivational, usability-related, and attention-related scales were issued. For the analyses, we selected three individual scenes: flying to Budapest, the instructive cockpit scene in Budapest, and the terraforming simulation (cf. Figure 2b).
Fig. 2. Panel a (left) shows an image of the eye tracking setup. Panel b (right) shows a screenshot of the game's terraforming simulation. The colored rectangles indicate predefined areas of interest (AOI) for gaze data analyses.
4 Results 4.1 Learning Performance The average score of the knowledge test prior to the gaming session was 32.33 (SD=9.45) and that of the posttest 39.00 (SD=10.22). The difference is statistically significant (T=-3.814, df=8, p=0.005). For girls the average scores were 26.25 (SD=11.09) and 31.25 (SD=10.63), respectively. There is no significant difference between pre and posttest. For boys the average score in the pretest was 37.20 (SD=4.44) and that in the posttest 45.20 (SD=4.02), which is a significant difference between these two test scores (T=-6.136, df=4, p=0.004). The results are illustrated in Figure 3. 4.2 Eye Movements For this investigation it was important to get information on how much time participants spent in the three different situations while playing the game and on which parts their eyes fixated. In this way, the relative fixation numbers, the duration of each situation as well as the total duration, and the saccade lengths were analyzed.
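The pre/post comparison above is a paired t-test over the knowledge scores of the same children; a minimal sketch of that computation on hypothetical scores (the individual scores are not reported in the paper):

```python
from scipy import stats

# Hypothetical pre/post knowledge scores for 9 children (illustrative only).
pre  = [30, 41, 25, 38, 19, 35, 40, 28, 35]
post = [36, 46, 30, 44, 27, 43, 45, 34, 46]

t, p = stats.ttest_rel(pre, post)
print(f"T={t:.3f}, df={len(pre) - 1}, p={p:.3f}")
```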
Fig. 3. Results of the knowledge tests before and after playing the learning game
These results imply that females spent more time playing the game. Females' total duration was M=1018.86 (SD=102.58) in contrast to males' total duration of M=868.80 (SD=280.72). Especially in the simulation situation, females spent more time, with M=913.20 (SD=142.28), than males, with M=726.41 (SD=278.41). A key question of this investigation is whether good and low performers in terms of learning have distinct gaze patterns/scan paths. Participants who learned more spent more time playing the game, with M=940.10 (SD=23.52), in contrast to participants who learned less, with M=781.79 (SD=344.46). These results were present in the different situations as well. For participants who learned more, the duration of the flying situation was higher, with M=100.09 (SD=79.33), than for the other group, with M=59.71 (SD=31.75). The duration of the instruction was M=70.97 (SD=29.33) for high learning effectiveness and M=45.53 (SD=20.28) for low learning effectiveness. In the simulation situation the duration was M=769.04 (SD=77.21) for better learners and M=676.55 (SD=362.57) for persons who learned less. Regarding the fixation number per second for the three different situations, participants who learned more had smaller values than participants who learned less. The total fixation number/sec for better learners was M=0.72 (SD=0.04); in the flying situation the fixation number/sec was M=2.08 (SD=0.41), in the instruction situation M=1.94 (SD=0.85), and in the simulation situation M=2.22 (SD=0.34). For the other group the total fixation number/sec was M=0.77 (SD=0.12), the flying fixation number/sec M=2.40 (SD=0.33), the instruction fixation number/sec M=2.22 (SD=0.50), and the simulation fixation number/sec M=4.48 (SD=4.14). Regarding the first situation, participants who learned more had a longer average fixation length, with M=0.45 (SD=0.10), than persons who learned less, with M=0.40 (SD=0.04). Regarding the second situation, better learners also had a longer average fixation length, with M=0.57 (SD=0.37), than the others, whose average fixation length was M=0.42 (SD=0.06). Regarding the third situation, a longer average fixation length of M=0.45 (SD=0.06) was found for good learners. Participants who learned less had an average fixation length of M=0.35 (SD=0.21) in the simulation situation. In the flying situation the saccade lengths were higher for participants who showed a higher learning
effectiveness, with M=73.44 (SD=29.64), in contrast to the other group, where M=53.05 (SD=27.13). In the second and third situations both groups had nearly the same average saccade lengths. Although the descriptive data show some differences, MANOVA results showed no significant differences for gender and attention for duration of playing, fixation rate, and saccade lengths. Only learning effectiveness had a significant effect on the duration of the simulation situation, with F(1)=186.652, p=0.047. Regarding the fixation lengths in the different situations, significant differences could be found especially in the simulation part. On the one hand, gender had a significant effect on the fixation length, with F(1)=195.77, p=0.045. On the other hand, learning effectiveness had a significant effect, with F(1)=372.982, p=0.033. 4.3 Areas of Interest Areas of Interest (AOI) are particular display elements which are predefined by the researcher. AOI analysis is used to quantify gaze data within a defined region of the visual stimulus. The number of fixations on such a particular display element should reflect the importance of that element. More important display elements will be fixated more frequently. When aiming at evaluating serious games, such information is crucially important since it provides very clear indications of which regions on the screen are attended (sufficiently) and, therefore, whether all instructional aspects are attended. The most distinct result of these analyses is that players with high learning performance have in total a larger number of fixations (Figure 4). Taking the prototypical example of AOI 6, which was the most often attended one, children who learned more spent 38.67% (SD=0.65) of their fixations on it and children who were not such good learners spent 32.23% (SD=4.32) on it (Figure 4). Persons with a higher attention value spent 36.24% (SD=2.77) on AOI 6, in contrast to persons with lower attention, who spent more time on it, with 38.53% (SD=0.86). Similarly, we found that high performers showed significantly longer saccades in the flying scene of the game. This is an indication that children who learned well in general exhibited a more "calm" and smooth distribution of fixations on the screen.
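The AOI analysis described above reduces to a point-in-rectangle test over fixation positions; the sketch below computes the share of fixations falling in each predefined AOI. The AOI coordinates and fixation points are hypothetical.

```python
def aoi_fixation_shares(fixations, aois):
    """Percentage of fixations whose centre falls inside each AOI.

    fixations: list of (x, y) fixation centres in screen pixels.
    aois: dict mapping AOI name -> (left, top, right, bottom) rectangle.
    """
    counts = {name: 0 for name in aois}
    for x, y in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                counts[name] += 1
    total = max(len(fixations), 1)
    return {name: 100.0 * c / total for name, c in counts.items()}

# Hypothetical AOIs on a 1280x1024 screen (e.g., AOI6 = simulation control panel).
aois = {"AOI6": (900, 600, 1280, 1024), "AOI1": (0, 0, 400, 300)}
print(aoi_fixation_shares([(950, 700), (100, 100), (1000, 800)], aois))
```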
Fig. 4. Average fixation duration and saccade length
5 Summary The empirical results indicate that children can benefit from playing computer games for learning purposes. The most distinct finding we presented here is the fact that extreme groups such as high and low performers exhibit different visual patterns: the good learners scan the visual field evenly, with longer saccades, and attend relevant areas on the screen more frequently and in a more stable fashion. The results of our investigation also show that there are distinct gender differences in the interaction style with different game elements, depending on the demands on spatial abilities (navigating in the three-dimensional spaces versus controlling rather two-dimensional features of the game), as well as distinct differences between high and low performers in terms of learning. In addition to the comparisons on the level of participants, on the basis of gaze density maps aggregated from the data in combination with qualitative interviews with the subjects, we identified design recommendations for further improvements of the game prototype in particular as well as of games in general. Finally, our study showed that eye tracking can be successfully applied to measure critical aspects with regard to the quality of serious games. Acknowledgements. The research and development introduced in this work is funded by the European Commission under the seventh framework programme in the ICT research priority, contract number 215918 (80Days, www.eightydays.eu).
The Online Gait Measurement for Characteristic Gait Animation Synthesis Yasushi Makihara1, Mayu Okumura1, Yasushi Yagi1, and Shigeo Morishima2 1
The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan, 2 Department of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan, {makihara,okumura,yagi}@am.sanken.osaka-u.ac.jp,
[email protected]
Abstract. This paper presents a method to measure gait features online from gait silhouette images and to synthesize characteristic gait animation for audience-participant digital entertainment. First, both static and dynamic gait features are extracted from the silhouette images captured by an online gait measurement system. Then, key motion data for various gaits are captured, and new motion data are synthesized by blending the key motion data. Finally, blend ratios of the key motion data are estimated so as to minimize the gait feature errors between the blended model and the online measurement. In experiments, the effectiveness of the gait feature extraction was confirmed using 100 subjects from the OU-ISIR Gait Database, and characteristic gait animations were created based on the measured gait features.
1 Introduction
Recently, audience-participant digital entertainment, in which the individual features of participants or users are reflected in computer games and computer graphics (CG) cinema, has gained more attention. In EXPO 2005 AICHI JAPAN [1], the Future Cast System (FCS) [2] in the Mitsui-Toshiba pavilion [3] was presented as one of the large-scale audience-participant digital entertainments. The system captures an audience member's facial shape and texture online, and a CG character's face in the digital cinema is replaced by the captured face. In addition, as an evolutionary version of the FCS, the Dive Into the Movie (DIM) project [4] tries to reflect not only the individual features of face shape and texture but also those of voice, facial expression, facial skin, body type, and gait. Among these, body type and gait are expected to attract more attention of the audience because they are reflected in the whole body of the CG character. For the purpose of gait measurement, acceleration sensors [5][6][7][8] and Motion Capture (MoCap) systems [9] have been widely used. These systems are, however, unsuitable for an online gait measurement system because it takes much time for the audience to wear the acceleration sensors or to attach MoCap markers.
In the computer vision-based gait analysis area, both model-based approaches [10][11] and appearance-based approaches [12][13] have been proposed, which can measure gait features without any wearable sensors or attached markers. In the model-based methods, a human body is expressed as articulated links or generic cylinders and is fit to the captured image to obtain both static features, such as link lengths, and dynamic features, such as joint angles, separately. Although these features can be used for gait feature measurement, such methods are unsuitable for online measurement because of the high computational cost and the difficulty of model fitting. The appearance-based methods extract gait features directly from the captured images without troublesome model fitting. In general, the extracted features are composites of both static and dynamic components. Although such composite features are still useful for gait-based person identification [10][11][12][13], they are unsuitable for separate measurement of the static and dynamic components. Therefore, we propose a method for online measurement of intuitive static and dynamic gait features from silhouette sequences, as well as a method of characteristic gait animation synthesis for the digital cinema. Side-view and front-view cameras capture image sequences of the target subject's straight walk, and a Gait Silhouette Volume (GSV) is constructed via silhouette extraction, silhouette size normalization, and registration. Then, the static and dynamic components are measured separately from the GSV. Because the proposed method extracts the gait features directly from the silhouette sequence without model fitting, its computational cost is much lower than that of the model-based methods, which enables online measurement of the audience's gait.
2 Gait Feature Measurement
2.1 GSV Construction
The first step in gait feature measurement is the construction of a spatio-temporal Gait Silhouette Volume (GSV). First, gait silhouettes are extracted by background subtraction, and a silhouette image is defined as a binary image whose pixel value is 1 if the pixel is inside the silhouette and 0 otherwise. Second, the height and center of the silhouette region are computed for each frame. Third, the silhouette is scaled so that its height matches a pre-determined size, whilst maintaining the aspect ratio. In this paper, the size is set to a height of hg = 60 pixels and a width of wg = 40 pixels. Fourth, each silhouette is registered such that its center corresponds to the image center. Finally, a spatio-temporal GSV is produced by stacking the silhouettes on the temporal axis. Let f(x, y, n) be the silhouette value of the GSV at position (x, y) in the n-th frame.
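A minimal sketch of the GSV construction described above is given below, assuming grayscale frames and a static background image; the threshold value, the nearest-neighbour resampling, and the function name are illustrative choices, not the authors' implementation.

```python
import numpy as np

def build_gsv(frames, background, h_g=60, w_g=40, thresh=30):
    """Stack size-normalized, centered gait silhouettes into a GSV f(x, y, n).

    frames, background: grayscale images as 2-D numpy arrays. The threshold
    value and nearest-neighbour resampling are illustrative choices.
    """
    silhouettes = []
    for frame in frames:
        # 1) background subtraction -> binary silhouette
        sil = np.abs(frame.astype(int) - background.astype(int)) > thresh
        ys, xs = np.nonzero(sil)
        if ys.size == 0:
            silhouettes.append(np.zeros((h_g, w_g), np.uint8))
            continue
        # 2) crop the silhouette region, 3) scale to height h_g keeping aspect
        crop = sil[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        new_w = max(1, int(round(crop.shape[1] * h_g / crop.shape[0])))
        rows = np.linspace(0, crop.shape[0] - 1, h_g).astype(int)
        cols = np.linspace(0, crop.shape[1] - 1, new_w).astype(int)
        resized = crop[np.ix_(rows, cols)].astype(np.uint8)
        # 4) register: place the silhouette's horizontal center at the image center
        canvas = np.zeros((h_g, w_g), np.uint8)
        xc = int(round(np.nonzero(resized)[1].mean()))
        for x in range(resized.shape[1]):
            tx = x + w_g // 2 - xc
            if 0 <= tx < w_g:
                canvas[:, tx] = np.maximum(canvas[:, tx], resized[:, x])
        silhouettes.append(canvas)
    # 5) stack on the temporal axis: f[y, x, n]
    return np.stack(silhouettes, axis=-1)
```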
2.2 Estimation of Key Gait Phases
The next step is the estimation of two types of key gait phases: the Single Support Phase (SSP) and the Double Support Phase (DSP), where the subject's legs and arms are closest together and most spread apart, respectively. The SSP and DSP are estimated as the local
minimum and maximum, respectively, of the second-order moment around the central vertical axis of the GSV within each half gait cycle. The second-order moment at the n-th frame within the vertical range [ht, hb] is defined as
(1)
where xc is the horizontal center of the GSV. Because the SSP and DSP each occur once per half gait cycle, alternately, those at the i-th half gait cycle are obtained as follows:
(2)
where the initial values are set to zero and the gait period gp is detected by maximizing the normalized autocorrelation of the GSV along the temporal axis [14]. In the following sections, NSSP and NDSP denote the numbers of SSPs and DSPs, respectively. Note that SSP and DSP focused on the arms, the legs, and the entire body are computed by defining the vertical range [ht, hb] appropriately. In this paper, the ranges for the arms and legs are defined as [0.33hg, 0.55hg] and [0.55hg, hg], respectively. Figure 2 shows the result of the estimation of SSP and DSP.
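The following sketch illustrates the key-phase estimation. Since the body of Eq. (1) was lost in extraction, the second-order moment is written here as the sum of (x − xc)² f(x, y, n) over the chosen vertical band, which is an assumption consistent with the surrounding definitions; the lag search range in the gait-period estimate is also illustrative.

```python
import numpy as np

def second_order_moment(gsv, h_top, h_bot):
    """Per-frame second-order moment of f(x, y, n) about the central vertical
    axis, restricted to rows [h_top, h_bot). gsv has shape (H, W, N)."""
    H, W, N = gsv.shape
    xc = (W - 1) / 2.0                        # horizontal center of the GSV
    x2 = (np.arange(W) - xc) ** 2
    band = gsv[h_top:h_bot].astype(float)     # shape (h_bot - h_top, W, N)
    return np.einsum('ywn,w->n', band, x2)    # assumed form of Eq. (1)

def gait_period(moment, lo=10, hi=60):
    """Gait period gp by maximizing the normalized autocorrelation of the
    moment signal; the lag search range [lo, hi] is illustrative."""
    m = moment - moment.mean()
    ac = np.correlate(m, m, mode='full')[len(m) - 1:]
    ac = ac / (ac[0] + 1e-12)
    hi = min(hi, len(ac) - 1)
    if hi <= lo:
        return max(1, len(ac) // 2)
    return lo + int(np.argmax(ac[lo:hi]))

def key_phases(moment, gp):
    """SSPs and DSPs as local minima/maxima of the moment in each half cycle."""
    ssp, dsp, half = [], [], max(1, gp // 2)
    for start in range(0, len(moment) - half + 1, half):
        seg = moment[start:start + half]
        ssp.append(start + int(np.argmin(seg)))
        dsp.append(start + int(np.argmax(seg)))
    return ssp, dsp
```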
Fig. 1. Example of GSV
Fig. 2. SSP (left) and DSP (right)
Fig. 3. Measurement of the waist size
2.3 Measurement of the Static Feature
The static features comprise the height and the waist size in terms of width and bulge. When the static features are measured, the GSV at an SSP is used to reduce the influence of arm swing and leg bend. First, the number of silhouette pixels at height y in the side and front GSV is defined as the width w(y) and the bulge b(y), respectively, as shown in Fig. 3. Then, their averages WGSV and BGSV within the waist ranges of heights [yWt, yWb] and [yBt, yBb] are calculated, respectively.
(3)
(4)
(5) Then, given the height Hp on the original-size image, the waist size in terms of the width and the bulge on the image, Wp and Bp, are computed as
(6)
Finally, given the distance l from the camera to the subject and the focal length f of the camera, the real height H and the waist sizes in terms of the width W and the bulge B are
(7)
For the statistical analysis of the static features, 100 subjects in the OU-ISIR Gait Database [15] were chosen at random, and front- and left-side-view cameras were used for measurement. Figure 4(a) shows the relation between the measured height and the height range from the questionnaire. The measured heights almost all lie within the questionnaire range. The measured heights of some short subjects (children) are, however, below the lower bound of the questionnaire result. These errors may result from self-report errors due to the rapid growth of the children.
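Because the bodies of Eqs. (3)-(7) were lost in extraction, the sketch below only illustrates the kind of computation they describe: per-row pixel counts averaged over the waist range, back-projection to the original image scale, and a simple pinhole-camera conversion to real-world size. The exact scaling used in the paper may differ, and all names are illustrative.

```python
import numpy as np

def static_features(side_sil, front_sil, waist_rows_w, waist_rows_b,
                    H_p, h_g=60, distance_l=1.0, focal_f=1.0):
    """Sketch of the static-feature measurement (Eqs. 3-7, bodies lost).

    side_sil, front_sil: binary GSV slices at an SSP, shape (h_g, w_g).
    waist_rows_w, waist_rows_b: (top, bottom) row ranges of the waist.
    H_p: person height in pixels on the original image.
    distance_l, focal_f: camera distance and focal length (pinhole model assumed).
    """
    w = side_sil.sum(axis=1)                   # width w(y), per the text's pairing
    b = front_sil.sum(axis=1)                  # bulge b(y)
    W_gsv = w[waist_rows_w[0]:waist_rows_w[1]].mean()
    B_gsv = b[waist_rows_b[0]:waist_rows_b[1]].mean()
    # back-project from the normalized GSV (height h_g) to original image pixels
    W_p = W_gsv * H_p / h_g
    B_p = B_gsv * H_p / h_g
    # pinhole camera: real size = (distance / focal length) * image size (assumed)
    H = distance_l / focal_f * H_p
    W = distance_l / focal_f * W_p
    B = distance_l / focal_f * B_p
    return H, W, B
```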
Fig. 4. The measurements and questionnaire results
Figure 4(b) shows the relation between the measured waist size in terms of bulge and the weight from the questionnaire, and it indicates that the waist size correlates with the weight to some extent. For example, the waist sizes of light subject A and heavy subject B are small and large, respectively, as shown in Fig. 4(b). As an exceptional case, the waist size of light subject C is erroneously large because he/she wears a down jacket.
Fig. 5. Arm swing areas. In (a), front and back lines are depicted as red and blue lines, respectively. In (b), front and back arm swing areas are painted in red and blue, respectively.
2.4 Measurement of the Dynamic Feature
Step: Side-view silhouettes are used for step estimation. First, the walking speed v is computed from the distance between the silhouette positions in the first and last frames and the elapsed time. Then, the average step length is computed by multiplying the walking speed v by the half gait cycle gp/2.
Arm swing: The side-view GSV from an SSP to a DSP is used for arm swing measurement. First, the body front and back boundary lines (denoted lf and lb, respectively) are extracted from the gait silhouette image at the SSP, and then the front and back arm swing candidate areas RAf and RAb are set, respectively, as shown in Fig. 5(a). Next, the silhouette sweep image Fi(x, y) is calculated for the i-th interval from an SSP to the next DSP as
(8)
(9)
where sign(·) is the sign function. Finally, the front and back arm swing areas are computed as the areas of the swept pixels of Fi(x, y) within RAf and RAb, respectively. (10)
(11)
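A sketch of the arm-swing measurement is shown below. The sweep image is interpreted here as the set of pixels covered by the silhouette in any frame of the SSP-to-DSP interval, one reading of the sign-function formulation in Eqs. (8)-(9), whose bodies were lost; the construction of the candidate regions RAf and RAb from the boundary lines lf and lb is omitted.

```python
import numpy as np

def arm_swing_areas(gsv, n_ssp, n_dsp, r_front, r_back):
    """Front/back arm swing areas from the SSP-to-DSP interval (sketch).

    gsv: binary volume f(x, y, n), shape (H, W, N).
    r_front, r_back: boolean masks (H, W) for the candidate areas RAf and RAb;
    how they are built from the boundary lines l_f and l_b is omitted here.
    """
    # sweep image: a pixel counts as swept if the silhouette covers it in any
    # frame of the interval (one reading of the sign function in Eqs. 8-9)
    sweep = gsv[:, :, n_ssp:n_dsp + 1].any(axis=2)
    area_front = int(np.count_nonzero(sweep & r_front))   # Eq. (10)
    area_back = int(np.count_nonzero(sweep & r_back))     # Eq. (11)
    return area_front, area_back
```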
For the statistical analysis of the arm swing, the same 100 subjects as in the static feature analysis are used. Figure 6 shows the measured front arm swing areas. We can see that the measured arm swings are widely distributed, that is, they are useful cues for synthesizing characteristic gait animation. For example, the arm swings of subject A (small), B (middle), and C (large) can be confirmed in the corresponding gait silhouette images at the DSP on the graph. In addition, though the asymmetry of the arm swing is not generally pronounced, some subjects, such as Subject D, exhibit asymmetry.
Stoop: We propose two methods of stoop measurement: a slope-based and a curvature-based method. In both methods, a side-view gait silhouette image at an SSP is used to reduce the influence of the arm swing, and a back contour is extracted from the image. Then, the slope of the back line is computed by fitting the line lb to the back contour, and the curvature is obtained as the maximum k-curvature of the back contour (k is set to 8 empirically). For the statistical analysis of the stoop, the same 100 subjects are used. Figure 7 shows the measured stoop. By measuring both the slope and the curvature of the back contour, various kinds of stoop are captured, such as large slope and small curvature (e.g., Subject A), large slope and large curvature (e.g., Subject B), and small slope and large curvature (e.g., Subject C).
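The sketch below illustrates the two stoop measures. The line-fit convention (x as a function of y) and the particular k-curvature definition (the turning angle between the k-th neighbours on either side of a contour point) are assumptions; the paper only states that a line is fitted to the back contour and that the maximum k-curvature with k = 8 is used.

```python
import numpy as np

def stoop_features(back_contour, k=8):
    """Slope- and curvature-based stoop measures (sketch).

    back_contour: array of (x, y) points along the back, ordered top to bottom,
    extracted from a side-view silhouette at an SSP.
    """
    pts = np.asarray(back_contour, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # slope of the back line: least-squares fit x = a*y + b
    a, _ = np.polyfit(y, x, 1)
    slope = a
    # k-curvature: turning angle between vectors to the k-th neighbours
    curvatures = []
    for i in range(k, len(pts) - k):
        v1 = pts[i - k] - pts[i]
        v2 = pts[i + k] - pts[i]
        cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        curvatures.append(np.pi - np.arccos(np.clip(cosang, -1.0, 1.0)))
    curvature = max(curvatures) if curvatures else 0.0
    return slope, curvature
```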
Fig. 6. Distribution of front arm swing areas and its asymmetry
Fig. 7. Distribution of stoop with slope and curvature
3 Gait Animation Synthesis
3.1 Motion Blending
Basically, a new gait animation is synthesized by blending a small number of motion data called key motions, in the same way as [16]. First, a subject was asked to walk on a treadmill at a speed of 4 km/h, and his/her 3D motion data were captured using the motion capture system Vicon [9]. The motion data comprise n walking styles with variations in terms of step width, arm swing, and stoop. Second, a two-step sequence is clipped from each whole sequence to produce the key motions M = {mi}. Third, because all the motion data M need to be synchronized before blending, synchronized motions S = {si} are generated using time warping [16]. Finally, a blended motion is synthesized as
(12)
where αi is the blend ratio for the i-th key motion data and the set of blend ratios is denoted by α = {αi}. The remaining issue is how to estimate the blend ratios based on the appearance-based gait features measured in the proposed framework; this is described in the following section.
3.2 Blend Ratio Estimation
First, a texture-less CG image sequence for each key motion is rendered as shown in Fig. 8. Second, an m-dimensional appearance-based gait feature vector vi for the i-th key motion si is measured in the same way as described before. Then, the gait feature vector of the blended motion data is assumed to be approximated by a weighted linear sum of those of the key motions vi
(13)
Then, the blend ratio α is estimated so as to minimize the error between the gait features of the blended model and the online measured gait features v (called the input vector hereafter). The minimization problem is formulated as the following convex quadratic program.
(14)
The above minimization problem is solved with the active set method. Moreover, when the number of key motions n is larger than the dimension of the gait features m, the solution to Eq. (14) is indeterminate. On the other hand, Eq. (13) is only an approximation because the mapping from the motion data domain to the appearance-based gait feature domain is generally nonlinear. Therefore, it is desirable to choose gait features nearer to the input feature. Thus, another cost function is defined as the inner product of the blend ratio and a cost weight vector w of Euclidean distances from the features of each key motion to the input feature, w = [||v1 − v||, ..., ||vn − v||]T. Given a solution to Eq. (14), αCQP, and the resultant blended model vα = V αCQP, the minimization problem can be written as the following linear program.
(15)
Finally, this linear program is solved with the simplex method.
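A sketch of the two-stage estimation is given below. It assumes the usual simplex constraints on the blend ratios (αi ≥ 0, Σαi = 1), which match the ratios reported in Table 2 but are not stated explicitly in the text, and it substitutes SciPy's SLSQP and HiGHS solvers for the active-set and simplex methods named in the paper.

```python
import numpy as np
from scipy.optimize import minimize, linprog

def blend_ratios(V, v):
    """Two-stage blend-ratio estimation (sketch of Eqs. 14-15).

    V: (m, n) matrix whose columns are the key-motion gait feature vectors v_i.
    v: (m,) input (online measured) gait feature vector.
    Assumes alpha >= 0 and sum(alpha) = 1, and uses SLSQP / HiGHS instead of
    the active-set and simplex solvers named in the paper.
    """
    m, n = V.shape
    cons = [{'type': 'eq', 'fun': lambda a: a.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    # Stage 1 (Eq. 14): convex QP  min_a ||V a - v||^2
    obj = lambda a: float(np.sum((V @ a - v) ** 2))
    a0 = np.full(n, 1.0 / n)
    qp = minimize(obj, a0, bounds=bounds, constraints=cons, method='SLSQP')
    v_alpha = V @ qp.x
    # Stage 2 (Eq. 15): among ratios reproducing v_alpha, prefer key motions
    # whose features are close to the input: cost weights w_i = ||v_i - v||
    w = np.linalg.norm(V - v[:, None], axis=0)
    lp = linprog(c=w,
                 A_eq=np.vstack([V, np.ones((1, n))]),
                 b_eq=np.append(v_alpha, 1.0),
                 bounds=[(0.0, None)] * n,
                 method='highs')
    return lp.x if lp.success else qp.x
```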
3.3 Experimental Results
In this experiment, three gait features, namely arm swing, step, and stoop, are reflected in the CG characters, and seven key motions and three test subjects are used, as shown in Table 1. Note that the gait features are z-normalized so as to adjust the scales among them. The resultant blend ratios are shown in Table 2. For example, Subject A, with a large arm swing, gains the largest blend ratio for Arm swing L, and this results in a synthesized silhouette of the blended model with a large arm swing (Fig. 8). The gait features of Subject B (large step and small arm swing) and Subject C (large stoop and large arm swing) are also successfully reflected, both in terms of the blend ratios and of the synthesized silhouettes, as shown in Table 2 and Fig. 8. Moreover, the gait animation synthesis in conjunction with texture mapping was realized in the audience-participant digital movie as shown in Fig. 9, and it plays a certain role in identifying the audience in the digital movie.

Table 1. Gait features for key motions and inputs

Key motion     Arm swing   Step    Stoop
Arm swing L    3.50        0.51    -0.55
Arm swing S    -1.00       -1.44   -0.63
Step L         -0.34       3.00    -1.27
Step S         -0.30       -2.30   1.12
Stoop          1.11        -0.61   2.50
Recurvature    1.48        -0.68   -3.00
Average        0.30        -1.02   -1.49

Input          Arm swing   Step    Stoop
A              3.40        0.34    0.23
B              -0.84       1.13    -1.23
C              1.83        0.52    2.46
Fig. 8. The synthetic result
Table 2. The blending ratio of the key motion data

Key motion / Input   A       B       C
Arm swing L          0.83    0       0.16
Arm swing S          0       0.42    0
Step L               0       0.58    0.04
Step S               0       0       0
Stoop                0.17    0       0.80
Recurvature          0       0       0
Average              0       0       0
Fig. 9. Screen shot in digital movie
4 Conclusion
This paper has presented a method to measure the static and dynamic gait features online and separately from gait silhouette images. A sufficient distribution of the gait features was observed in the statistical analysis with a large-scale gait database. Moreover, a method of characteristic gait animation synthesis was proposed, together with blend ratio estimation for the key motion data. The experimental results show that gait features such as arm swing, step, and stoop are effectively reflected in the synthesized blended model.
Acknowledgement. This work is supported by the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology.
References 1. EXPO 2005 AICHI JAPAN, http://www.expo2005.or.jp/en/ 2. Morishima, S., Maejima, A., Wemler, S., Machida, T., Takebayashi, M.: Future cast system. In: ACM SIGGRAPH 2005 Sketches, SIGGRAPH 2005. ACM, New York (2005)
3. MITSUI-TOSHIBA Pavilion, http://www.expo2005.or.jp/en/venue/pavilionprivateg.html 4. Morishima, S.: Yasushi Yagi, S.N.: Instant movie casting with personality: Dive into the movie system. In: Proc. of Invited Workshop on Vision Based Human Modeling and Synthesis in Motion and Expression, Xian, China, pp. 1–10 (September 2009) 5. Gafurov, D., Helkala, K., Sondrol, T.: Biometric gait authentication using accelerometer sensor. Journal of Computer 1(7), 51–59 (2006) 6. Gafurov, D., Snekkenes, E., Bours, P.: Improved gait recognition performance using cycle matching. In: 2010 IEEE 24th Int. Conf. on Advanced Information Networking and Applications Workshops (WAINA), pp. 836–841 (2010) 7. Rong, L., Jianzhong, Z., Ming, L., Xiangfeng, H.: A wearable acceleration sensor system for gait recognition. In: 2nd IEEE Conf. on Industrial Electronics and Applications, pp. 2654–2659 (2007) 8. Rong, L., Zhiguo, D., Jianzhong, Z., Ming, L.: Identification of individual walking patterns using gait acceleration. In: The 1st Int. Conf. on Bioinformatics and Biomedical Engineering, pp. 543–546 (2007) 9. Motion Capture Systems Vicon, http://www.crescentvideo.co.jp/vicon/d 10. Yam, C., Nixon, M., Carter, J.: Extended model based automatic gait recognition of walking and running. In: Proc. of the 3rd Int. Conf. on Audio and Video-based Person Authentication, Halmstad, Sweden, pp. 278–283 (June 2001) 11. Cuntoor, N., Kale, A., Chellappa, R.: Combining multiple evidences for gait recognition. In: Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 33–36 (2003) 12. Sarkar, S., Phillips, J., Liu, Z., Vega, I., Grother, P., Bowyer, K.: The humanid gait challenge problem: Data sets, performance, and analysis. Trans. of Pattern Analysis and Machine Intelligence 27(2), 162–177 (2005) 13. Han, J., Bhanu, B.: Individual recognition using gait energy image. Trans. on Pattern Analysis and Machine Intelligence 28(2), 316–322 (2006) 14. Makihara, Y., Sagawa, R., Mukaigawa, Y., Echigo, T., Yagi, Y.: Gait recognition using a view transformation model in the frequency domain. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 151–163. Springer, Heidelberg (2006) 15. OU-ISIR Gait Database, http://www.am.sanken.osaka-u.ac.jp/gaitdb/index.html 16. Kovar, L., Gleicher, M.: Flexible automatic motion blending with registration curves. In: ACM SIGGRAPH 2003, pp. 214–224 (2003)
Measuring and Modeling of Multi-layered Subsurface Scattering for Human Skin Tomohiro Mashita1 , Yasuhiro Mukaigawa2, and Yasushi Yagi2 1
Cybermedia Center Toyonaka Educational Research Center, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 560-0043, Japan 2 The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
[email protected], {mukaigaw,yagi}@am.sanken.osaka-u.ac.jp
Abstract. This paper introduces a Multi-Layered Subsurface Scattering (MLSSS) model to reproduce an existing human’s skin in a virtual space. The MLSSS model consists of a three dimensional layer structure with each layer an aggregation of simple scattering particles. The MLSSS model expresses directionally dependent and inhomogeneous radiance distribution. We constructed a measurement system consisting of four projectors and one camera. The parameters of MLSSS were estimated using the measurement system and geometric and photometric analysis. Finally, we evaluated our method by comparing rendered images and real images.
1 Introduction
Dive into the Movie [1] is a system which, by scanning personal features such as the face, body shape, and gait motion, enables members of an audience to appear in a movie as human characters. Technologies to reproduce a person in virtual space are important for these types of systems. In particular, good reproducibility of the skin is necessary for the further expression of personal features because it includes several aspects such as transparency, fineness, color, wrinkles, and hairs. Simulation of subsurface scattering is important for increasing the quality of the reproduced skin because some of the characteristics of the skin are the result of optical behavior under the skin's surface. Subsurface scattering is a phenomenon where light incident on a translucent material is reflected multiple times in the material and radiated from a point other than the incident point. If the transparency and inner components of the skin are expressed by simulating subsurface scattering, the expression of personal features will be improved. Measurement and simulation of subsurface scattering are challenging problems. The Monte Carlo simulation method, as typified by MCML [2], is one approach to subsurface scattering. This approach aims to simulate photon behavior based on physics and requires considerable computational resources. When expressing complex media like human skin, it is also difficult to measure the parameters. Diffusion approximation [3] enables effective simulation of dense and
homogeneous media. However, there are limitations in its expressiveness for reproducing human skin because this approximation ignores the direction of incident and outgoing light and assumes a homogeneous medium. Jensen et al. improved the expressiveness using a combination of a single-scattering and a dipole diffusion model [4]. Tariq et al. [5] expressed the inhomogeneity as Spatially Varying Subsurface Scattering. They suggested that an inhomogeneous rendering method is important for the reproduction of real skin. An expression of inhomogeneous scattering media and of scattering dependent on the incoming and outgoing light direction is necessary to achieve a more expressive reproduction of real skin. In addition, the parameters of the scattering model must be measurable to reproduce real skin. In this paper we propose a Multi-Layered Subsurface Scattering model to achieve these requirements. We experimentally evaluated our system by comparing real images and images synthesized using the estimated parameters of the subsurface scattering.
Contribution
– The proposed model expresses a subsurface scattering model which is dependent on the direction of incoming and outgoing light. This directional dependency is achieved by the combination of simple scattering and a layered structure.
– The proposed model expresses an inhomogeneous scattering medium, which is necessary for the reproduction of human skin including personal features.
Related Work
Rendering Method of Sub-Surface Scattering. Jensen proposed photon mapping [6], which traces individual photons, for simulating volumetric subsurface scattering. Later, Jensen et al. proposed a dipole approximation [4] based on a diffusion approximation. The dipole approximation has been extended to a multipole model [7]. Ghosh et al. [8] proposed a practical method for modeling layered facial reflectance consisting of specular reflectance, single scattering, and shallow and deep subsurface scattering. These methods were able to provide positive results for rendering photorealistic skin. The above methods are based on diffusion approximation; however, the importance of directional dependency has been discussed. Donner et al. showed the disadvantage of the diffusion approximation and proposed a spatially- and directionally-dependent model [9]. Mukaigawa et al. analyzed the anisotropic distribution of the lower-order scattering in 2D homogeneous scattering media [10].
Method for Measuring Subsurface Scattering. Various methods have been proposed for measuring subsurface scattering in translucent objects directly, using a variety of lighting devices such as a point light source [11], a laser beam [4,12], a projector [5], and a fiber optic spectrometer [13].
Fig. 1. Skin structure (hair, skin surface lipid film, fine wrinkles, corneum, blood vessels, melanocytes, and collagen fibers within the epidermis, dermis, and hypodermis)
Fig. 2. SRD of a point-illuminated hand
2 Expressing Spatial Radiance Distribution by Multi-Layered Subsurface Scattering
2.1 Spatial Radiance Distribution
Human skin has a complicated multi-layered structure, and each layer consists of many components which have their own optical properties. The main components and the skin structure are shown in Fig. 1. This complicated structure makes it difficult to render a photorealistic human face or skin and to measure the optical properties of the components of human skin. The structure of human skin is inhomogeneous in three dimensions. This inhomogeneity is reflected in the optical behavior of incident light and is observed as a Spatial Radiance Distribution (SRD) on the surface of the skin. Figure 2 shows the SRD in a hand under point illumination with a green laser pointer. The left of Figure 2 shows the experimental condition under general lighting, and the right shows the SRD with only the light of the laser pointer. In the upper row, the laser light enters from the top of the hand; in the lower row, it enters from the left side. Obviously, the SRD in human skin is not homogeneous and depends on the direction of the incoming light. We focus on the SRD, and this paper describes its recreation.
2.2 Concept of a Multi-Layered Subsurface Scattering Model
A simulation model to express the inhomogeneous and directionally dependent SRD is necessary for reproducing human skin. To reproduce a complicated SRD, an approximation model of the three-dimensional and inhomogeneous optical behavior is required. Furthermore, the parameters of the approximation model must be estimable. The important factor is not the number of parameters but the number of estimable parameters. We propose the Multi-Layered Subsurface Scattering (MLSSS) model. The MLSSS model satisfies the above conditions when using a combination of the multi-layered structure and a simple scattering model. The three-dimensional
Fig. 3. Concept of MLSSS model
optical behavior is approximated by the layer structure. The observed SRD is expressed as a superposition of the scattering in each layer. We define the scattering in each layer as simple isotropic scattering. The inhomogeneity of the SRD is expressed by giving each scattering particle its own variables.
2.3 Details of the Multi-Layered Subsurface Scattering Model
The concept of this model is shown in Fig. 3. The right of Figure 3 shows incident light scattered by a translucent material, where the horizontal lines represent the layers of the MLSSS model and the arrows represent incident and outgoing light. The graph in the upper left shows the asymmetric SRD observed by the camera. The MLSSS model assumes that the asymmetric SRD is a mixture of simple distributions in each layer, as shown on the left of Fig. 3. The MLSSS model expresses an SRD that depends on the direction of the incident light and the viewpoint of the observer. The Bidirectional Scattering Surface Reflectance Distribution Function (BSSRDF) S expresses the relationship between the incident radiance L_I and the outgoing radiance L_O with incident and outgoing directions \omega_I, \omega_O and points x_I, x_O as given in the following equation:

L_O(x_O, \omega_O) = \int_{\Omega} \int_{A} S(x_I, \omega_I; x_O, \omega_O) \, L_I(x_I, \omega_I) \, (n(x_I) \cdot \omega_I) \, d\omega_I \, dA(x_I),   (1)

where \Omega is a sphere, A is the illuminated area, n(x_I) is the normal vector at the point x_I, and \omega_I, \omega_O, and n(x_I) are unit vectors. If an isotropic SRD is assumed, the direction of incoming and outgoing light can be ignored. The MLSSS model
proposed in this paper is an approximation of the BSSRDF that does not ignore the incoming and outgoing directions. We assume incident radiance L_I(x_I, \omega_I) at the point x_I from the angle \omega_I. In the MLSSS model, the observed point in each layer, x_{O,l}, shifts in proportion to the depth of the layer d_l, where l is the layer number. The observed outgoing radiance L_O is expressed as

L_O(x_O, \omega_O) = \sum_l L_{O,l}(x_{O,l}),   (2)

where

x_{O,l} = x_O + \frac{d_l}{n \cdot \omega_O} \omega_O.   (3)

Thus the dependence of the MLSSS on the outgoing direction is defined. The SRD in each layer is assumed to be an isotropic distribution, independent of the direction of the incident light, similar to the diffusion approximation. The BSSRDF function in each layer is expressed as D(x_O, x_I), where x_I is the incident point. Thus the relationship between the incident and outgoing radiance in each layer is

L_{O,l} = D(x_{O,l}, x_{I,l}) \, (n(x_I) \cdot \omega_I) \, L_{I,l},   (4)

where \omega_I is the direction of the incoming light, n(x_I) is the normal vector at x_I, and L_{I,l} is the part of the incident light scattered in layer l. The incident point in each layer also shifts in proportion to the depth of the layer d_l. The incident point in a layer, x_{I,l}, is obtained from the following equation:

x_{I,l} = x_I + \frac{d_l}{n \cdot \omega_I} \omega_I.   (5)

L_{I,l} is obtained by

L_{I,l} = \frac{w_l}{w_a + \sum_l w_l} L_I,   (6)

where w_l is the weight of light scattered in layer l and w_a is the weight of absorbed light. Finally, the relationship between the outgoing radiance L_O(x_O, \omega_O) and the incident radiance L_I(x_I, \omega) is expressed as

L_O(x_O, \omega_O) = \sum_l D(x_{O,l}, x_{I,l}) \, (n(x_I) \cdot \omega_I) \, L_I(x_I, \omega).   (7)

The case of general lighting is expressed as

L_O(x_O, \omega_O) = \int_{\Omega} \int_{A} \sum_l D(x_{O,l}, x_{I,l}) \, (n(x_I) \cdot \omega_I) \, L_I(x_I, \omega) \, d\omega \, dA(x_I).   (8)

The BSSRDF function in the MLSSS model is

S(x_I, \omega_I; x_O, \omega_O) = \sum_l D(x_{O,l}, x_{I,l}) \, (n(x_I) \cdot \omega_I) \, \frac{w_l}{w_a + \sum_l w_l}.   (9)
This BSSRDF function retains the dependence on direction and consists of isotropic scattering, layer distance, and weight parameters.
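The sketch below evaluates Eqs. (2)-(7) for a single incident ray. The Gaussian form of the per-layer distribution D and the parameter names (d, w, sigma, w_a) are assumptions for illustration; the paper only requires D to be isotropic.

```python
import numpy as np

def mlsss_radiance(x_o, omega_o, x_i, omega_i, n, L_i, layers, w_a=0.1):
    """Outgoing radiance at x_o for one incident ray (sketch of Eqs. 2-7).

    layers: list of dicts with keys 'd' (depth), 'w' (scattering weight) and
    'sigma' (std. of the assumed Gaussian used for the isotropic D).
    x_o, x_i, omega_o, omega_i, n: 3-vectors; omega_* and n are unit vectors.
    """
    w_sum = w_a + sum(layer['w'] for layer in layers)
    cos_i = float(np.dot(n, omega_i))
    cos_o = float(np.dot(n, omega_o))
    L_o = 0.0
    for layer in layers:
        # shifted observation and incidence points in this layer (Eqs. 3, 5)
        x_ol = x_o + layer['d'] / cos_o * omega_o
        x_il = x_i + layer['d'] / cos_i * omega_i
        # isotropic per-layer scattering D modelled as a Gaussian (assumption)
        r2 = float(np.sum((x_ol - x_il) ** 2))
        D = np.exp(-r2 / (2 * layer['sigma'] ** 2)) / (2 * np.pi * layer['sigma'] ** 2)
        L_il = layer['w'] / w_sum * L_i        # Eq. (6): portion scattered in layer l
        L_o += D * cos_i * L_il                # Eqs. (4) and (2)
    return L_o
```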
Fig. 4. MLSSS measurement system ((a) hardware and (b) configuration of camera and projectors) and samples of captured images ((c) structured light, (d) high frequency, (e) line sweeping)
Fig. 5. Schematic diagram of parameter estimation (observed intensity decomposed into Gaussians)
3 Measuring System
3.1 Hardware
We have constructed an apparatus for capturing facial images under pattern projection and a system for parameter estimation by geometric and photometric analysis. The measurement system is shown in Fig. 4(a) and (b). It consists of 4 LCD projectors and a camera. The inside of the measurement system is a black box with a hole for inserting a face and 4 holes for the projectors. The size of the box is 90cm × 90cm × 90cm. The camera is a Lumenera Lw160c with 1392 × 1042 pixels and 12 bits of valid data per pixel. The projectors are EPSON EMP-X5 units with 1024 × 768 pixels and 2200 lm. The positions of the camera and projectors are shown in Fig. 4(b).
3.2 Geometric and Photometric Analysis
We describe the parameter estimation for the MLSSS model. The geometric parameters estimated are ω O , ωI , n, and the shape of a face. The photometric
Fig. 6. Geometric photometric analysis
parameters estimated are the σ and wl of the scattering particles, because we use a Gaussian distribution for D(·). Figure 6 shows the flow of the geometric and photometric analysis. A structured light pattern, a slit pattern, and a high-frequency pattern are projected from each projector. Figures 4(c), (d), and (e) show samples of captured images with projected patterns. The shape of the face is reconstructed using the coded structured light projection method [14]. The captured images with high-frequency patterns are used for direct component separation [15]. The separated direct components are used for normal vector estimation using the four-light-source photometric stereo method [16]. We extract a one-dimensional SRD from this geometric information and the slit-line projected images. We take this one-dimensional radiance distribution as a mixture of the simple radiance distributions in each layer. Figure 5 shows a schematic diagram of the photometric parameter estimation. The scattering parameters are estimated by an EM algorithm.
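A sketch of the Gaussian-mixture decomposition step is shown below. Treating the normalized intensity profile as sample weights and the number of iterations are implementation assumptions; the paper only states that the one-dimensional radiance distribution is decomposed by an EM algorithm (20 mixtures in the experiments).

```python
import numpy as np

def decompose_radiance(xs, intensity, n_components=20, iters=100):
    """Decompose a 1D radiance profile into a Gaussian mixture with EM (sketch).

    xs: 1-D positions along the profile; intensity: non-negative radiance values.
    """
    w = intensity / intensity.sum()                       # per-position weights
    mu = np.linspace(xs.min(), xs.max(), n_components)    # initial means
    sigma = np.full(n_components, (xs.max() - xs.min()) / n_components)
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(iters):
        # E-step: responsibilities of each component for each position
        gauss = np.exp(-0.5 * ((xs[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = pi * gauss
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: weighted updates of mixing weights, means and variances
        nk = (w[:, None] * resp).sum(axis=0) + 1e-12
        mu = (w[:, None] * resp * xs[:, None]).sum(axis=0) / nk
        var = (w[:, None] * resp * (xs[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(var + 1e-9)
        pi = nk / nk.sum()
    return pi, mu, sigma
```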
4 Experiments
We conducted experiments to evaluate our model and measuring system by rendering. The conditions of rendering are as follows. The camera position was fixed and the same as when the image was captured. The light position was not the same as when the image was captured. To evaluate the MLSSS model, we
Fig. 7. Synthesized image with stripe projection: (a) MLSSS, (b) Direct, (c) MLSSS + Direct, (d) Real
Fig. 8. A profile of the red plane of Fig. 7 at y = 150 (curves: Real, Synthesized, MLSSS, and Direct)
used captured images with three light sources for parameter estimation, and the remaining unused images were used for comparison with the rendered images. We rendered the image of a face with the camera and light positions identical to those when the image was captured. The mixture distribution was decomposed by an EM algorithm with 20 mixture components.
4.1 Evaluation of Anisotropic Scattering
We rendered an image with a stripe light to evaluate the expression of anisotropic scattering. Figure 7(a) shows the rendered subsurface scattering. Figure 7(b) shows the separated direct component of a real image. Figure 7(c) shows the sum of Fig. 7(a) and Fig. 7(b). Figure 7(d) shows a real image under the same lighting conditions as the rendered image. Comparing Fig. 7(c) to Fig. 7(d), it is obvious that the subsurface scattering rendered using the MLSSS model reproduces the directionally dependent subsurface scattering.
Fig. 9. Rendered images with a point light source: (a) MLSSS, (b) Direct, (c) MLSSS + Direct, (d) Homogeneous MLSSS + Direct, (e) Real
The profile of the red channel from left to right across the image is shown in Fig. 8. In Fig. 8, the light source is positioned on the right side. The pixels with low values are not illuminated. We can see the effect of subsurface scattering at the boundaries between the illuminated and non-illuminated areas. There are differences in the intensity of the illuminated area between the real and synthesized images. We think that the estimated subsurface scattering parameters include the influence of the position of the light source.
4.2 Evaluation of Inhomogeneity
We rendered an image of subsurface scattering with a point light source to evaluate the expression of inhomogeneity. Figure 9 shows the result of rendering with a point light source. Figure 9(a) is the synthesized indirect component using the MLSSS model. Figure 9(b) is the direct component of the real image. Figure 9(c) is the sum of Fig. 9(a) and Fig. 9(b). Figure 9(d) is rendered using the same parameters for all the scattering particles in each layer; the parameters for Fig. 9(d) are the means of each parameter. Figure 9(e) is a real image with the same light source for comparison. We can see the blood vessels, facial hair roots, and inhomogeneous redness in Fig. 9(a) and Fig. 9(c). However, we cannot see these components of the inside of the skin in Fig. 9(d). It is obvious that the inhomogeneity of subsurface scattering is important for the realism of rendered skin. However, the direct component is also important for expressing the texture of real skin.
5 Conclusions
In this paper, we proposed the MLSSS model. This model expresses the SRD on the surface of a translucent material depending on the direction of the incident and outgoing light. The MLSSS model has pixel-level variation of the scattering parameters. We constructed a measurement system consisting of a camera and four projectors. The parameters of the MLSSS model are estimated from the captured images by geometric and photometric analysis. In the experiments, images of subsurface scattering were rendered and compared with real images. Future work includes investigating other scattering models and layer structures.
References 1. Morishima, S.: Dive into the Movie -Audience-driven Immersive Experience in the Story. IEICE Trans. Information and Systems E91-D(6), 1594–1603 (2008) 2. Wang, L., Jacques, S.L., Zheng, L.: Mcml–monte carlo modeling of light transport in multi-layered tissues. Computer Methods and Programs in Biomedicine 47(2), 131–146 (1995) 3. Stam, J.: Multiple scattering as a Diffusion Process. In: Eurographics Workshop in Rendering Techniques 1995, pp. 51–58 (1995) 4. Jensen, H.W., Marschner, S.R., Levoy, M., Hanrahan, P.: A practical model for subsurface light transport. In: SIGGRAPH 2001, pp. 511–518 (2001) 5. Tariq, S., Gardner, A., Llamas, I., Jones, A., Debevec, P., Turk, G.: Efficient estimation of spatially varying subsurface scattering parameters. In: VMV 2006, pp. 165–174 (2006) 6. Jensen, H.W.: Realistic Image Synthesis using Photon Mapping. AK Peters, Wellesley (2001) 7. Donner, C., Jensen, H.W.: Light diffusion in multi-layered translucent materials. In: SIGGRAPH 2005, pp. 1032–1039 (2005) 8. Ghosh, A., Hawkins, T., Peers, P., Frederiksen, S., Debevec, P.E.: Practical modeling and acquisition of layered facial reflectance. In: SIGGRAPH Asia 2008 (2008) 9. Donner, C., Lawrence, J., Ramamoothi, R., Hachisuka, T., Jensen, H.W., Nayar, S.K.: An Empirical BSSRDF Model. In: SIGGRAPH 2009 (2009) 10. Mukaigawa, Y., Yagi, Y., Raskar, R.: Analysis of Light Transport in Scattering Media. In: CVPR (2010) 11. Mukaigawa, Y., Suzuki, K., Yagi, Y.: Analysis of subsurface scattering under generic illumination. In: ICPR 2008 (2008) 12. Goesele, M., Lensch, H.P.A., Lang, J., Fuchs, C., Seidel, H.P.: Disco - acquisition of translucent objects. In: SIGGRAPH 2004, pp. 835–844 (2004) 13. Weyrich, T., Matusik, W., Pfister, H., Bickel, B., Donner, C., Tu, C., McAndless, J., Lee, J., Ngan, A., Jensen, H.W., Gross, M.: Analysis of human faces using a measurement-based skin reflectance model. In: SIGGRAPH 2006, pp. 1013–1024 (2006) 14. Inokuchi, S., Sato, K., Matsuda, F.: Range imaging system for 3-D object recognition. In: ICPR, pp. 806–808 (1984) 15. Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast separation of direct and global components of a scene using high frequency illumination. In: SIGGRAPH 2006, pp. 935–944 (2006) 16. Barsky, S., Petrou, M.: The 4-source photometric stereo technique for threedimensional surfaces in the presence of highlights and shadows. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1239–1252 (2003)
An Indirect Measure of the Implicit Level of Presence in Virtual Environments Steven Nunnally1 and Durell Bouchard2 1
University of Pittsburgh School of Information Science
[email protected] 2 Roanoke College Department of Math, Computer Science, and Physics
[email protected]
Abstract. Virtual Environments (VEs) are a common occurrence for many computer users. Considering their spreading usage and speedy development, it is ever more important to develop methods that capture and measure key aspects of a VE, like presence. One of the main problems with measuring the level of presence in VEs is that users may not be consciously aware of its effect. This is a problem especially for direct measures that rely on questionnaires and only measure the perceived level of presence explicitly. In this paper we develop and validate an indirect measure of the implicit level of presence of users, based on the physical reaction of users to events in the VE. The addition of an implicit measure will enable us to evaluate and compare VEs more effectively, especially with regard to their main function as immersive environments. Our approach is practical, cost-effective and delivers reliable results.
Keywords: Virtual Environments, Presence, Indirect Implicit Measure.
1 Introduction
VEs are becoming important to many applications as technology advances. Their immersive quality allows many different users to create situations that are either unavailable or impractical to create in real life. The uses of VEs range from simple spatial tasks, such as showing buildings to prospective buyers before a project is started, to creating the same emotional and physical reactions as a high-risk situation without the danger. Military, rescue, and medical personnel especially can utilize VEs to improve their skills without exposing themselves to risky situations while reducing cost at the same time, an important application since these personnel will soon be expected to act with precision in the worst of scenarios. Another important use of this emerging technology is treating psychological disorders. VEs are being used to confront an individual's worst phobia or to create a reaction, to help treat disorders like Post-Traumatic Stress Disorder. Unfortunately, this research is hindered because the tools necessary to measure the different aspects of VEs are limited. The key to all of the applications listed above is the immersive quality that VEs possess. The term for this quality is presence, which is the user's perceived level of authenticity of the displayed
environment. The above applications must create the emotional and physical reactions to the presented situation in order to accomplish their goal of increasing efficiency when the real situation occurs [5]. Direct measures, like questionnaires, have been used in the past to measure the direct effect of the environment on the user. Researchers have worked to validate numerous questionnaires that not only provide evidence that one VE has a higher level of presence than another, but can often give evidence to determine the specific trait that increases the level of presence [6]. These direct measures help researchers compare different features of different VEs to find which enables the greatest direct increase of presence in the user. However, some features may not consciously affect the user, making a direct measure insufficient. To advance research in this field, we must also be able to determine the indirect effects that VEs have on users by measuring the implicit levels of presence as well as the explicit levels. In related work in psychology, such implicit measures have proven to be better predictors of behavior than direct and explicit measures, which could prove even more important in applications where the emotional and physical reaction is the key to success [4]. Currently, no indirect measurement of the implicit level of presence is readily available for researchers. A few earlier studies have attempted to find an indirect measure of the implicit level of presence. One of these studies attempted to use a method named behavioral realism, which measures a reaction to an event within the environment. One attempt was with postural responses [2]. Freeman et al. tried this using a video of a car racing around a race track while measuring the subjects' responses to the hairpin turns. They used a questionnaire in order to compare the sway data with the questionnaire data. They did this using many different presence-altering features; one was stereoscopic versus monoscopic video and another was different screen sizes [1,2]. They concluded that their data showed only weak support for the use of this behavioral realism measurement in evaluating VE features: the indirect measure did not correlate with the questionnaire. Their conclusion rejected the hypothesis because it required the direct and indirect measures to correlate, yet any such correlation would only be coincidental, because the two measurements simply measure presence in different ways [3]. Several aspects of that experiment also limited its outcome. First, there was not a large enough difference between the features that were used to increase presence, given the number of participants. Tan showed that participants performed tasks much better on larger displays than on smaller ones, even when the visual angle was the same, supporting the idea that different fields of view (FOVs) should affect presence [5]. The experiment detailed in this paper uses the CAVE at Roanoke College to rework the screen-size experiment of Freeman. This CAVE encapsulates 170 degrees of the user's FOV for the larger display, as compared to Freeman's 50 degrees. Freeman also used a passive activity in the attempt to measure presence. The level at which the user is involved in the VE can also greatly affect presence [6]. Our measurement uses an active response to measure presence and might only show results with an active environment.
Finally, we hypothesize that this same measurement can be used to measure presence, because of the higher level of involvement and the greater difference between the features of the VEs. This will be supported by showing that the measurement rejects the null hypothesis that the CAVE, or immersive condition, yields a
level of presence less than or equal to that of the desktop display, or non-immersive condition. Further, the measurement will show greater reliability than the explicit measure of presence, with more consistent results and greater confidence.
2 Experimental Evaluation
2.1 Procedure
The design described here is meant to correct the problems in the experiment described above. This experiment is a 1-by-1 comparison, for a more direct and less complicated experiment. Every subject was administered the test in both the non-immersive and the immersive condition to set up a within-subject comparison; half started with the non-immersive condition. Participants actively navigated a virtual racetrack using a steering wheel and foot pedals to drive the racecar. The steering wheel was mounted so that it could not be adjusted, so that the device would not accidentally move during the test and create false movement in the postural sway data. Participants were allowed to adjust their chair at the beginning of the experiment for comfort, but were then asked to keep the chair's position fixed so that the data would be taken from the same distance and the FOV would be comparable between conditions. Participants wore suspenders and a headband that had infrared lights attached. With these lights, a Wiimote was used to track the person's head and shoulder positions to determine their sway. The character's physical location and direction in the virtual environment were logged about every 30 milliseconds, as were the positions of the 4 infrared lights. The participants were not told about the indirect measure of the implicit level of presence, so as not to bias the results. Each participant was then given a calibration sequence. First the steering wheel was turned in both directions with as little head and body movement as possible. Then the user turned both the steering wheel and their head simultaneously in both directions. This sequence was later used to minimize the effects of the head turning and shoulder movement needed to complete the task, so that only postural sway based on a higher implicit level of presence was measured, instead of a difference between the screen sizes, as users are more likely to turn their head in the immersive condition. The racetrack had many different types of curves and turns to force the participant to take turns at different speeds. The participants were allowed a short training period, which was the same course in the opposite direction. The subject could get used to the environment and the control aspects of the experiment during this training period. At this time the participant would also fill out the questionnaire to get used to the questions, so that knowledge of the questionnaire did not bias the results of the second direct measurement. The course was a circuit, so that the participants would not finish before the trial time expired. The participants were asked to stay on the track and warned that bumps and hills off the track were designed to slow them down. After each trial the subjects were given a presence questionnaire to obtain the direct measure of presence for the condition (shown in Fig. 1). There were four questions. The first three were used to directly measure the explicit level of presence, while the fourth was used to determine whether or not the subject should continue with the experiment.
Fig. 1. The questionnaire used for this experiment
Twenty-two Roanoke College undergraduate students, eighteen of them male, volunteered for this experiment. A $25 prize was awarded to the participant with the fastest lap time as an incentive to increase involvement, adding to the presence of both VEs in the experiment.
2.2 Apparatus
Most of the simulation is handled by Epic Games' Unreal Tournament game engine (2003). Epic Games created this engine for a first-person shooter video game, but the weapons and crosshairs were made invisible for the purposes of this experiment. The racetrack was developed using Epic Games' Unreal Editor. The movements were controlled using Gamebots, a command-based language that passes messages between the user and the game character. This allows a program to control the character's movement so that the character moves like a car, with acceleration and deceleration depending on the pedal positions. It could also log the input device variables for analysis. The input device was a Logitech steering wheel with foot pedals. A program calculated the speed and rotational direction based on the previous state of the character and the current state of the input device. Just like a normal car, the accelerator pedal would accelerate the car faster the further the pedal was pushed. The car would decelerate if no pedal was pushed and decelerate faster if the brake was pushed. The participants' motion was recorded using 2 Wii sensor bars, which are the infrared lights mentioned in the previous section. They were attached to the participant with Velcro, using suspenders to trace the shoulders and a headband to trace the head (shown in Fig. 2). The lights were recorded with a Wiimote, which passed the information in pixel coordinates to a program that recorded the coordinates of the 4 infrared lights. These coordinates were used to measure the subject's position.
Fig. 2. Apparatus used to measure the user's body position. The user is also using the steering wheel and foot pedals and the screen used for the non-immersive condition.
The small-screen display was a standard 18-inch CRT, which fills a FOV of about 28 degrees. The large screen was the CAVE, which fills a FOV of about 170 degrees. The CAVE uses 4 Epson projectors to project the image on three wall-sized screens. The middle screen is 12 ft. wide by 8 ft. tall, and the other two screens act as wings that are each 6 ft. wide by 8 ft. tall. The wings wrap around the user slightly to gain the 170-degree FOV advantage.
3 Analysis
The analysis for the indirect measure begins with two sets of data: the character's rotational position and the four infrared light coordinates. Both have synchronized timestamps for all values. First, the measurement must be derived from the raw data. The recorded character's rotational position is not useful for this measurement, but the difference between successive rotational values represents the position of the steering wheel, which is used to determine when the event of turning the vehicle occurs and to what extent. Next, some interpolation must be applied to the data. The infrared lights were recorded as pixel coordinates, but whenever a light was not detected, the x and y position of that light was recorded as zero. All of these zeros were replaced using interpolation from the last point before the light was dropped to the next available point. Next, the timestamps of both data sets must match to allow direct comparison. The steering wheel position started and ended with each trial, and was therefore the base data used for comparison. The infrared coordinates
were thus taken as a weighted average for each of the timestamps used alongside the steering wheel position. This gives an interpolated subject position for each record of the steering wheel position. The steering wheel data range from some number -n to n in the negative and positive directions, so that the center position of the steering wheel is zero. The subjects' physical position must be expressed in the same way for the results to be comparable. The method takes the average horizontal position, or x value, of all four infrared lights and averages these values over all samples in which the steering wheel data lie within the central 10% of the range between its minimum and maximum, in order to find the subject's resting location. It is assumed that if the steering wheel is near the center, then the vehicle is traveling in a nearly straight direction and no centripetal acceleration would be felt by the driver. Each value of the averaged horizontal position is then changed so that it represents the difference from the resting point, and therefore ranges from some negative number to some positive number with the resting position near zero, much like the steering wheel data. The data are then normalized so that both sets range from -1 to 1, using the minimum and maximum points of both sets.
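A sketch of this preprocessing is given below; the per-signal normalization by the absolute maximum and the interpretation of the central 10% band are assumptions consistent with the description above, and the variable and function names are illustrative.

```python
import numpy as np

def normalize_signals(steer, ir_x):
    """Re-center the head/shoulder x-position on the resting point found from
    near-center steering samples, then scale both signals to [-1, 1].

    steer, ir_x: per-timestamp arrays aligned in time; ir_x is the mean x of
    the four infrared lights.
    """
    lo, hi = steer.min(), steer.max()
    # resting location: average x whenever the wheel is within the central
    # 10% of its range (taken to mean straight-line driving)
    center_band = np.abs(steer - (lo + hi) / 2.0) <= 0.05 * (hi - lo)
    rest = ir_x[center_band].mean() if center_band.any() else ir_x.mean()
    sway = ir_x - rest
    # normalize each signal by its own absolute maximum (assumed convention)
    steer_n = steer / (np.abs(steer).max() + 1e-12)
    sway_n = sway / (np.abs(sway).max() + 1e-12)
    return steer_n, sway_n
```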
Fig. 3. A graphical representation of the data before the correlation is calculated. The lines represent the steering wheel position and the participant's position relative to the resting point.
Then, the calibration taken during the experiment was used to remove all motion not related to postural sway. The non-immersive condition used only the steering wheel calibration, since the screen is not large enough to require head turning. The immersive condition used the other calibration sequence described in the procedure. The
calibration data give the subject's expected horizontal position change as a function of the percentage the steering wheel is turned. This value is added to the horizontal position at every timestamp, according to the amount the wheel is turned and the condition being tested. To find the degree of correlation (illustrated in Fig. 3) between the two sets of data, the value with the smaller absolute value is divided by the value with the larger absolute value at each timestamp. These ratios are then averaged over the trial to find the presence value for that condition. The result lies between -1 and 1: -1 indicates a completely negative correlation, an unlikely outcome with this measurement; zero indicates no correlation and thus no implicit level of presence; and 1 indicates a completely positive correlation and thus a perfect implicit level of presence. The questionnaire data were simple to analyze. The first three answers are each a value between 0 and 100, and they are averaged because the questions ask about presence in different ways; averaging several answers should reduce deviation and give a more accurate presence value for the VE. A greater value indicates greater presence.
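A sketch of the per-timestamp ratio measure described above (smaller-magnitude value divided by larger-magnitude value, averaged over the trial) is shown below; the names are hypothetical, and the calibration correction is assumed to have been applied already.

```python
import numpy as np

# Hedged sketch of the per-timestamp ratio measure described above.
def presence_correlation(wheel_n, body_n, eps=1e-9):
    """Average, over the trial, of (smaller-magnitude value / larger-magnitude value)."""
    a, b = np.asarray(wheel_n, float), np.asarray(body_n, float)
    small = np.where(np.abs(a) <= np.abs(b), a, b)
    large = np.where(np.abs(a) <= np.abs(b), b, a)
    # Where the larger value is ~0, the ratio is undefined; skip those samples.
    ratio = small / np.where(np.abs(large) < eps, np.nan, large)
    return np.nanmean(ratio)  # in [-1, 1]; higher = stronger implicit presence
```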
4 Results

The immersive condition produced a 0.1903 average correlation, whereas the non-immersive condition produced only a 0.0338 average (as shown in Fig. 4). For 21 of the 24 participants, the presence value was greater for the immersive condition, confirming that the CAVE is more immersive. The difference between the conditions was significant for this measurement (p < 0.00001).
Fig. 4. A graph of the results for the indirect measure of the implicit level of presence
The direct measure of the explicit level of presence had an average value of 66.5909 for the immersive condition and 58.7121 for the non-immersive condition (as
shown in Fig. 5). Only 15 of the 24 participants' values confirmed that the CAVE is more immersive, but the difference was still significant for this measurement (p < 0.03).
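The paper reports significance levels without naming the test used; one plausible within-subject analysis of the per-participant presence values is sketched below with SciPy, purely as an illustration.

```python
from scipy import stats

# The test behind the reported p-values is not named in the text; a paired
# within-subject comparison such as a paired t-test (or a Wilcoxon signed-rank
# test, if normality is doubtful) is one plausible choice. `immersive` and
# `non_immersive` would hold one presence value per participant.
def compare_conditions(immersive, non_immersive):
    t, p_t = stats.ttest_rel(immersive, non_immersive)
    w, p_w = stats.wilcoxon(immersive, non_immersive)
    return {"paired_t_p": p_t, "wilcoxon_p": p_w}
```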
Fig. 5. A graph of the results for the direct measure of the explicit level of presence
5 Conclusions

The experiment produced evidence supporting the use of the indirect measure of the implicit level of presence for research with VEs. Furthermore, this measure distinguished the conditions more reliably and with a higher confidence level than the direct measure of the explicit level of presence used in this study. This suggests that direct measurement of the explicit level of presence fails to capture important aspects of a VE that relate to subconscious processes, which can have a great impact on behavior. The indirect measure therefore adds a powerful tool to VE researchers' arsenal for advancing the technology, and it offers insight into the differences between implicit and explicit levels of presence as captured by indirect and direct measures.
6 Future Work

Our results motivate further work in which our measure is used to predict the behavior of participants, particularly for events that aim to induce emotional responses or reflexes. The validated measurement should also be tested against different questionnaires to see how it compares with each and to discover the advantages of using one over another; this experiment used a very simple questionnaire from Freeman et al.'s experiments, but more complex questionnaires exist. Making this approach usable with different behavioral events should also be considered and tested. Our measurement only works with tasks that involve centripetal acceleration, like the driving task presented here. This is limiting because
it may not be suitable for all researchers to test their presence-enhancing features in a driving VE. The measure presented here is one example of how such a specific indirect measurement can work; for other, possibly more generic, tasks one should seek new implicit measures. Also, the method of analyzing the data uses the minimums and maximums within the data sets for normalization, which could minimize bias between subjects if that bias arises from predetermined movement extremes. Additional experiments could confirm or refute this idea, which would aid research efforts by making comparisons between experiments easier. If these steps are achieved, then indirect measures of presence can be used alongside direct measures to discover differences between VEs that could not be detected before. Testing could begin to decide whether different senses, such as sound or smell, have a significant effect on presence. A cost-benefit analysis could be completed so that each feature of a VE can be examined for its price and the presence value it adds. This could help schools and companies that currently have doubts by showing exactly what presence value they would obtain for a certain price tag. Schools, medical facilities, or the military could then consider whether it is affordable to introduce such an environment for training or exploration. Acknowledgments. At Roanoke College I would like to thank Dr. Bouchard and Dr. Childers for helping me with the project. At the University of Pittsburgh I would like to thank Dr. Lewis and Dr. Kolling for supporting the project. Special thanks to Dr. Hughes for starting my interest in this project.
References
1. Freeman, J., Avons, S.E., et al.: Effect of Stereoscopic Presentation, Image Motion, and Screen Size on Subjective and Objective Corroborative Measures of Presence. Presence 10(3), 298–311 (2001)
2. Freeman, J., Avons, S.E., et al.: Using Behavioural Realism to Estimate Presence: A Study of the Utility of Postural Responses to Motion-Stimuli. Presence 9(2), 149–164 (2000)
3. Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., Schmitt, M.: A Meta-Analysis on the Correlations Between the Implicit Association Test and Explicit Self-Report Measures. University of Trier, Germany (2004) (unpublished manuscript)
4. De Houwer, J.: What Are Implicit Measures and Why Are We Using Them. In: The Handbook of Implicit Cognition and Addiction, pp. 11–28. Sage Publishers, Thousand Oaks (2006)
5. Tan, D.S., Gergle, D., et al.: Physically Large Displays Improve Performance on Spatial Tasks. ACM Transactions on Computer-Human Interaction 13(1), 71–99 (2006)
6. Witmer, R.G., Singer, M.J.: Measuring Presence in Virtual Environments: A Presence Questionnaire. Presence 7(3), 225–240 (1998)
Effect of Weak Hyperopia on Stereoscopic Vision Masako Omori1, Asei Sugiyama2, Hiroki Hori3, Tomoki Shiomi3, Tetsuya Kanda3, Akira Hasegawa3, Hiromu Ishio4, Hiroki Takada5, Satoshi Hasegawa6, and Masaru Miyao4 1
Faculty of Home Economics, Kobe Women's University, 2-1 Aoyama, Higashisuma, Suma-ku, Kobe-city 654-8585, Japan 2 Department of Information Engineering, School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan 3 Department of Information Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan 4 Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan 5 Graduate School of Engineering, Human and Artificial Intelligent Systems, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan 6 Department of Information and Media Studies, Nagoya Bunri University, 365 Maeda, Inazawa-cho, Inazawa-city, Aichi 492-8520, Japan
[email protected]
Abstract. Convergence, accommodation and pupil diameter were measured simultaneously while subjects were watching 3D images. The subjects were middle-aged and had weak hyperopia. WAM-5500 and EMR-9 were combined to make an original apparatus for the measurements. It was confirmed that accommodation and pupil diameter changed synchronously with convergence. These findings suggest that with naked vision the pupil is constricted and the depth of field deepened, acting like a compensation system for weak accommodation power. This suggests that people in middle age can view 3D images more easily if positive (convex lens) correction is made. Keywords: convergence, accommodation, pupil diameter, middle age and 3D image.
1 Introduction

Recently, a wide variety of content that makes use of 3D displays and stereoscopic images is being developed. Previous studies have reported effects on visual functions after using HMDs. Sheehy and Wilkinson (1989) [1] observed binocular deficits in pilots following the use of night vision goggles, which have similarities to the HMDs used for VR systems. Mon-Williams et al. (1993) [2] showed that physiological changes in the visual system occur after periods of exposure of around 20 min. Howarth (1999) [3] discussed the oculomotor changes that might be expected to occur during immersion in a virtual environment while wearing an HMD.
Meanwhile, effects such as visual fatigue and motion sickness from continuously watching 3D images, and the influence of binocular vision on human visual function, remain insufficiently understood. Various studies have been performed on the influence of stereoscopic images on visual function [4][5][6]. Most prior studies discussed the effects of visual image quality and the extent of physical stress, employing bioinstrumentation or surveys of subjective symptoms [7]. To find ways to alleviate visual fatigue and motion sickness from watching 3D movies, further studies are needed. Under natural viewing conditions, the depths of convergence and accommodation agree in young subjects. However, when viewing a stereoscopic image that uses binocular parallax, it has been thought that convergence moves with the position of the reproduced stereoscopic image while accommodation remains fixed at the image display, resulting in contradictory depth information between convergence and accommodation, called discordance, in the visual system [8]. With the aim of qualitatively improving stereographic image systems, measurements under stereoscopic viewing conditions are needed. However, from objective measurements of the accommodation system, Miyao et al. [9] confirmed that there is a fluctuating link between accommodation and convergence in younger subjects during normal accommodation. For middle-aged and elderly people gazing at forward and backward movement for a long time, it is said that even during natural viewing there is a slight discordance between accommodation and convergence, with accommodation focused on a position slightly farther than the real objects and convergence focused on the position of the real objects. However, we obtained results indicating that discordance between accommodation and convergence does not occur in younger subjects gazing at a stereoscopic view for a short time. Weale (1975) [10] reported that the deterioration of near visual acuity in healthy people accelerates after 45 years of age, and we found a similar tendency in near vision in an earlier experiment [11]. As with presbyopia, cataract cloudiness gradually becomes more severe with age after middle age. Sun and Stark [12] also reported that middle-aged subjects have low accommodative power, that their vision should be properly corrected for VDT use, and that more care should be taken to assure they have appropriate displays than for their younger counterparts. In fact, it may be possible for middle-aged and elderly people with weak hyperopia to supplement accommodative power when watching 3D images through a deepened depth of field triggered by pupil contraction. However, this pupil contraction due to the near reaction reduces the light entering the eye and makes it a little harder to see. The possibility is therefore suggested that pupil contraction is alleviated by correction with soft contact lenses. The purpose of this experiment was to investigate pupil expansion by simultaneously measuring accommodation, convergence and pupil diameter.
2 Methods

2.1 Accommodation and Convergence Measurement and Stimulus

In this experiment, visual function was tested using a custom-made apparatus. We combined a WAM-5500 auto refractometer and an EMR-9 eye mark recorder to make an
original machine for the measurements. The WAM-5500 auto refractometer (Grand Seiko Co., Ltd.) can measure accommodation power with both eyes open under natural conditions, and the EMR-9 eye mark recorder (Nac Image Technology, Ltd.) can measure the convergence distance. 3D images were presented using a liquid crystal shutter system. In this experiment, the 3D image was shown on a display set 60 cm in front of the subjects. The distance between the subjects' eyes and the target on the screen was 60 cm (1.0/0.6 = 1.67 diopters (D)) (note: diopter (D) = 1/distance (m); MA (meter angle) = 1/distance (m)). The scene for the measurements and the measurement equipment are shown in Fig. 1. Convergence, accommodation and pupil diameter were measured simultaneously while subjects were watching 3D images.
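The distance-to-diopter conversion quoted in the note above is simple enough to capture in a one-line helper; this is only a restatement of the formula D = MA = 1/distance (m).

```python
# Distance-to-diopter (and meter-angle) conversion used in the text:
# D = MA = 1 / distance in meters, so 0.6 m corresponds to about 1.67 D.
def diopters(distance_m: float) -> float:
    return 1.0 / distance_m

assert abs(diopters(0.6) - 1.67) < 0.01
```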
Fig. 1. Experimental Environment
2.2 Experiment Procedure

The subjects were three healthy middle-aged people (37, 42, and 45 years old) with normal uncorrected vision, one healthy younger person (25 years old) with correction using soft contact lenses, and one healthy elderly person (59 years old) with normal uncorrected vision. The subjects were instructed to gaze at the center of the sphere, with the gaze time set at 40 seconds. All subjects had a subjective feeling of stereoscopic vision. While both eyes were gazing at the stereoscopic image, the lens accommodation of the right eye was measured and recorded. Informed consent was obtained from all subjects, and approval was received from the Ethical Review Board of the Graduate School of Information Science at Nagoya University. The concept of stereoscopic vision is generally explained to the public as follows: during natural vision, lens accommodation (Fig. 2) coincides with convergence (Fig. 3). The sphere moved virtually in a reciprocating motion over a range of 20 cm to 60 cm in front of the observer, with a cycle of 10 seconds (Fig. 4). The subjects gazed at this open-field stereoscopic target under binocular and natural viewing conditions.
Fig. 2. Lens Accommodation
Fig. 3. Convergence
Fig. 4. Spherical Object Movies (Power 3D™ : Olympus Visual Communications, Corp.)
Measurements were made three times under each of two conditions: (1) with the subjects using uncorrected vision, and (2) with the subjects wearing soft contact lenses (+1.0 D). Subjects used their naked eyes or wore soft contact lenses, and their refraction was corrected to within ±0.25 diopter. (The diopter is the refractive power of a lens and an index of accommodation power; it is the inverse of the distance in meters, for example, 0 stands for infinity, 0.5 stands for 2 m, 1 stands for 1 m, 1.5 stands for 0.67 m, 2 stands for 0.5 m, and 2.5 stands for 0.4 m.) Middle-aged and elderly subjects with normal naked-eye vision also wore the soft contact lenses (+1.0 D).
The experiment was conducted according to the following procedures (Table 1). Subjects' accommodation and convergence were measured as they gazed in binocular vision at a sphere presented in front of them. The illuminance of the experimental environment was about 36.1 lx, and the brightness of the sphere in this environment was 5.8 cd/m2.

Table 1. Experimental environment
Brightness of spherical object (cd/m2): 5.8
Illuminance (lx): 36.1
Size of spherical object (deg): Far 0.33, Near 12
3 Results

Convergence, accommodation and pupil diameter were measured simultaneously while subjects were watching 3D images. The following results were obtained in experiments in which subjects were measured with naked eyes or while wearing soft contact lenses, with their refraction corrected to within ±0.25 diopters (Figures 5, 6, and 7). Figure 5 shows the results for Subject A (25 years of age), Figure 6 shows the results for Subject B (45 years of age), and Figure 7 shows the results for Subject C (59 years of age). Figures 8 and 9 show the results for subjects who wore soft contact lenses for near sight: Figure 8 shows the measurement results for Subject B (45 years of age), and Figure 9 shows the results for Subject C (59 years of age). These figures show accommodation and convergence in diopters on the left vertical axis; the right vertical axis shows pupil diameter. Table 2 shows the average pupil diameter for middle-aged Subject B and elderly Subject C. Subjects' convergence was found to change between about one diopter (1 m) and five diopters (20 cm) regardless of whether they were wearing the soft contact lenses, and the diopter value fluctuated with a cycle of 10 seconds. In addition, we confirmed that accommodation and pupil diameter changed synchronously with convergence: the pupil diameter became small and the accommodation power became large when the convergence distance became small. The accommodation amplitude per 10-second cycle was from 2D to 2.5D with the naked eye (Figure 5), about 0.5D in Figure 6, and from 0.5D to 0.8D in Figure 7; when the subjects were wearing soft contact lenses (+1.0 D) for mild presbyopia, it was from 0.5D to 1D (Figure 8) and from 0.5D to 1.5D (Figure 9). From the results in Table 2, it is seen that the average pupil diameter with soft contact lens correction was larger than with naked eyes; the dilation was 0.4 mm for Subject B and 0.2 mm for Subject C.
Table 2. Average pupil diameter with uncorrected vision and with soft contact lenses (+1.0 D)
Uncorrected vision: middle-aged Subject B 3.69 mm; elderly Subject C 2.02 mm
Soft contact lenses (+1.0 D): middle-aged Subject B 4.05 mm; elderly Subject C 2.20 mm
[Figures 5–9 plot convergence, accommodation, the distance movement of the 3D image, and pupil diameter (mm) against time.]
Fig. 5. Subject A (25 years of age), who wore soft contact lenses for near sight
Fig. 6. Subject B (45 years of age) with naked eyes
Fig. 7. Subject C (59 years of age) with naked eyes
Fig. 8. Subject B (45 years of age) wearing soft contact lenses for near sight
Fig. 9. Subject C (59 years of age) wearing soft contact lenses for near sight
4 Discussion

It was shown that the focus moved to a distant point as the visual target virtually moved away from the subject. The change occurred at a constant cycle of 10 seconds, synchronously with the movement of the 3D image. By measuring the accommodation movement in response to the near and far movement of the 3D image, the viewing distance was shown to range from about 1D to 5D (1 meter to 0.2 meters). These results were consistent with the distance movement of the 3D image (0.6–0.2 meters). Thus, we were able to obtain measurements while subjects watched the 3D image with both eyes. Figure 5 shows large movements in both accommodation and convergence. Wann et al. [13] stated that within a virtual reality system, the eyes of a subject must maintain accommodation at the fixed LCD screen, despite the presence of disparity cues that necessitate convergence eye movements to capture the virtual scene. Moreover, Hong et al. [14] stated that the natural coupling of eye accommodation and convergence while viewing a real-world scene is broken when viewing stereoscopic displays. For the 3D image presented with the liquid crystal shutter system, the results of this study differed from those of a previous study in which accommodation was fixed on the LCD. Meanwhile, the change in accommodation was smaller than the large movement seen in convergence. These results show the influence of aging in the deterioration of accommodation. However, accommodation in Subject B and Subject C was fixed behind the display; accommodation was not fixed on the display in any case. This also did not match the previous study. At the closest distance, the difference between accommodation and convergence was about 4D in Figure 6, about 5–6D in Figure 7, about 2–3D in Figure 8 and about 4–5D in Figure 9. In Figures 5–7, the accommodation change gradually becomes smaller and more irregular, and the values become closer to 0. This is related to a lack of accommodation power due to presbyopia. In Figures 6–7, the pupil diameter becomes smaller in synchronization with the near vision effort of convergence. It is suggested that a near response occurs with 3D images similar to
that with real objects. It is reported that the near response occurs gradually from 0.3 m (3.3D), and that pupil contraction then reaches its maximum, with rapid contraction, at 0.2 m (5D). Figures 6–8 show similar results. However, contraction of the pupil diameter is not seen in Figure 5. The above suggests that the reason middle-aged subjects are able to view 3D images stereoscopically is that their weak accommodation power is supplemented by a deepened depth of field resulting from pupil contraction. Thus, it is thought that with contraction of the pupil diameter, images with left–right parallax can be perceived. On the other hand, pupil contraction implies that a decreased amount of light enters the retina. Therefore, it is suggested that elderly people perceive things as being darker than younger people do. In this study, rapid changes in pupil diameter were seen from 5 seconds before the start of measurements (Figures 5–8). It may be that a light reaction occurred because of the rapid change in the display when the 3D images were presented. It is reported that the amount of pupil contraction from the light reaction becomes progressively smaller with age, and the present experimental results were consistent with this. It takes about 1 s until the changes from the light reaction are over. Therefore, in this experiment, the average pupil diameter from 10 s to 20 s, when no such influence is seen, was compared. In middle-aged Subject B and elderly Subject C, the pupil diameter became about 10% larger when they wore soft contact lenses than with their normal near sight. Thus, it is suggested that pupil contraction is reduced as a result of near-sight compensation by soft contact lenses. In particular, Figures 8 and 9 show that accommodation follows convergence. This suggests that people in middle age can view 3D images more easily if positive (convex lens) correction is made.
5 Conclusions

In this study we used 3D images with a virtual stereoscopic view, and the influences of age and visual function on stereoscopic recognition were analyzed. We may summarize the present experiment as follows.
1. Accommodation and convergence changed at a constant cycle of 10 seconds, synchronously with the movement of the 3D image.
2. In the middle-aged and elderly subjects, accommodation showed less change than convergence.
3. The pupil diameter of the middle-aged and elderly subjects contracted in synchronization with the near vision effort of convergence.
4. Discordance between accommodation and convergence was alleviated by near-sight correction with soft contact lenses; contraction of the pupil diameter was also alleviated.
These findings suggest that with naked vision the pupil is constricted and the depth of field is deepened, acting like a compensation system for weak accommodation power. When doing visual near work, a person's ciliary muscle of accommodation constantly changes the focal depth of the lens of the eye to obtain a sharp image. Thus, when the viewing distance is short, the ciliary muscle must continually contract for accommodation and convergence. In contrast, when attention is allowed to wander over distant objects, the eyes are focused on infinity and the ciliary muscles remain
relaxed (Kroemer & Grandjean, 1997) [15]. Consequently, it is thought that easing the strain on the ciliary muscle caused by prolonged near work may prevent accommodative asthenopia. In addition, this study suggests that pupil contraction is reduced as a result of near-sight compensation by soft contact lenses, and therefore that people in middle age can view 3D images more easily if positive (convex lens) correction is made.
References
1. Sheehy, J.B., Wilkinson, M.: Depth perception after prolonged usage of night vision goggles. Aviat. Space Environ. Med. 60, 573–579 (1989)
2. Mon-Williams, M., Wann, J.P., Rushton, S.: Binocular vision in a virtual world: visual deficits following the wearing of a head-mounted display. Ophthalmic Physiol. Opt. 13(4), 387–391 (1993)
3. Howarth, P.A.: Oculomotor changes within virtual environments. Appl. Ergon. 30, 59–67 (1999)
4. Heron, G., Charman, W.N., Schor, C.M.: Age changes in the interactions between the accommodation and vergence systems. Optometry & Vision Science 78(10), 754–762 (2001)
5. Schor, C.: Fixation of disparity: a steady state error of disparity-induced vergence. American Journal of Optometry & Physiological Optics 57(9), 618–631 (1980)
6. Rosenfield, M., Ciuffreda, K.J., Gilmartin, B.: Factors influencing accommodative adaptation. Optometry & Vision Science 69(4), 270–275 (1992)
7. Iwasaki, T., Akiya, S., Inoue, T., Noro, K.: Surmised state of accommodation to stereoscopic three-dimensional images with binocular disparity. Ergonomics 39(11), 1268–1272 (1996)
8. Hoffman, D.M., Girshick, A.R., Akeley, K., Banks, M.S.: Vergence-accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision 8(3), 33.1–30 (2008)
9. Miyao, M., Ishihara, S.Y., Saito, S., Kondo, T.A., Sakakibara, H., Toyoshima, H.: Visual accommodation and subject performance during a stereographic object task using liquid crystal shutters. Ergonomics 39(11), 1294–1309 (1996)
10. Weale, R.A.: Senile changes in visual acuity. Transactions of the Ophthalmological Societies of the United Kingdom 95(1), 36–38 (1975)
11. Omori, M., Watanabe, T., Takai, J., Takada, H., Miyao, M.: An attempt at preventing asthenopia among VDT workers. International J. Occupational Safety and Ergonomics 9(4), 453–462 (2003)
12. Sun, F., Stark, L.: Static and dynamic changes in accommodation with age. In: Presbyopia: Recent Research and Reviews from the 3rd International Symposium, pp. 258–263. Professional Press Books/Fairchild Publications, New York (1987)
13. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural problems for stereoscopic depth perception in virtual environments. Vision Res. 35(19), 2731–2736 (1995)
14. Hong, H., Sheng, L.: Correct focus cues in stereoscopic displays improve 3D depth perception. SPIE, Newsroom (2010)
15. Kroemer, K.H.E., Grandjean, E.: Fitting the Task to the Human, 5th edn. Taylor & Francis, London (1997)
Simultaneous Measurement of Lens Accommodation and Convergence to Real Objects Tomoki Shiomi1, Hiromu Ishio1, Hiroki Hori1, Hiroki Takada2, Masako Omori3, Satoshi Hasegawa4, Shohei Matsunuma5, Akira Hasegawa1, Tetsuya Kanda1, and Masaru Miyao1 1
Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan 2 Graduate School of Engineering, Fukui University, 3-9-1 Bunkyo, Fukui, 910-8507, Japan 3 Faculty of Home Economics, Kobe Women's University, 2-1 Aoyama, Higashisuma, Suma-ku, Kobe 654-8585, Japan 4 Department of Information and Media Studies, Nagoya Bunri University, Maeda 365, Inazawa-cho, Inazawa, Nagoya 492-8620, Japan 5 Nagoya Industrial Science Research Institute, 2-10-19 Sakae, Naka-ku, Nagoya 460-0008, Japan
[email protected]
Abstract. Human beings can perceive that objects are three-dimensional (3D) as a result of simultaneous lens accommodation and convergence on objects, which is possible because the right and left eyes see with a parallax between them. Virtual images are perceived via the same mechanism, but the influence of binocular vision on human visual function is insufficiently understood. In this study, we developed a method to simultaneously measure accommodation and convergence in order to provide further support for our previous research findings. We also measured accommodation and convergence in natural vision to confirm that these measurements are correct. As a result, we found that both accommodation and convergence were consistent with the distance from the subject to the object. Therefore, it can be said that the present measurement method is an effective technique for the measurement of visual function, and that even during stereoscopic vision correct values can be obtained. Keywords: simultaneous measurement, eye movement, accommodation and convergence, natural vision.
1 Introduction

Recently, 3-dimensional images have been spreading rapidly, with many opportunities for the general population to come in contact with them, such as in 3D films and 3D televisions. Manufacturers of electric appliances, aiming at market expansion, are strengthening their lines of 3D-related digital products. Despite this increase in 3D products and the many studies that have been done on binocular vision, the influence of binocular vision on human visual function remains
insufficiently understood [1, 2, 3, 4]. In considering the safety of viewing virtual 3-dimensional objects, investigations of the influence of stereoscopic vision on the human body are important. Though various symptoms, such as eye fatigue and solid intoxication (3D sickness), are often seen when humans continue to view 3-dimensional images, neither of these symptoms is seen under the conditions in which we usually live, so-called natural vision. One of the reasons often given for this is that lens accommodation (Fig. 1) and convergence (Fig. 2) are inconsistent.
Fig. 1. Principle of lens accommodation
Fig. 2. Principle of convergence
Accommodation is a reaction that changes refractive power by changing the curvature of the lens with the action of the musculus ciliaris of the eye and the elasticity of the lens, so that an image of the external world is focused on the retina. Convergence is a movement where both eyes rotate internally, functioning to concentrate the eyes on one point to the front. There is a relationship between accommodation and convergence, and this is one factor that enables humans to see one object with both eyes. When an image is captured differently with right and left eyes (parallax), convergence is caused. At the same time, focus on the object is achieved by accommodation. Binocular vision using such mechanisms is the main method of presenting 3-dimensional images, and many improvements have been made [5, 6]. In explaining the inconsistencies above, it is said that accommodation is always fixed on the screen where the image is displayed, while convergence intersects at the position of the stereo images. As a result, eye fatigue, solid intoxication, and other symptoms occur. However, we obtained results that indicate inconsistency between accommodation and convergence does not occur [7]. Even so, it is still often explained that inconsistency is a cause of eye symptoms. One reason is that we could not simultaneously measure accommodation and convergence in our previous study, and the proof for the results was insufficient. To resolve this inconsistency, it was thought that measuring accommodation and convergence simultaneously was needed. We therefore developed a method to simultaneously measure accommodation and convergence. Comparison with measurements of natural vision is essential in investigating stereoscopic vision. For such comparisons, it is first necessary to make sure that the
measurements of natural vision are accurate. We therefore focused on whether we could accurately measure natural vision, and we report the results of those measurements.
2 Method

The experiment was done with six healthy young males (age: 20–37). Subjects were given a full explanation of the experiment in advance, and consent was obtained. Subjects used their naked eyes or wore soft contact lenses (one person with uncorrected vision, five who wore soft contact lenses), and their refraction was corrected to within ±0.25 diopter. (The diopter is the refractive power of a lens and an index of accommodation power; it is the inverse of the distance in meters, for example, 0 stands for infinity, 0.5 stands for 2 m, 1 stands for 1 m, 1.5 stands for 0.67 m, 2 stands for 0.5 m, and 2.5 stands for 0.4 m.) The devices used in this experiment were an auto ref/keratometer, the WAM-5500 (Grand Seiko Co. Ltd., Hiroshima, Japan), and an eye mark recorder, the EMR-9 (NAC Image Technology Inc., Tokyo, Japan).

2.1 WAM-5500

The WAM-5500 (Fig. 3) provides an open binocular field of view while a subject is looking at a distant fixation target, and has two measurement modes, static mode and dynamic mode. We used the dynamic mode in this experiment. The accuracy of the WAM-5500 in measuring refraction in the dynamic mode of operation was evaluated using the manufacturer's supplied model eye (of power -4.50 D). The WAM-5500, set to Hi-Speed (continuous recording) mode, was connected to a PC running the WCS-1 software via an RS-232 cable, which allows refractive data collection at a temporal resolution of 5 Hz. No special operation was needed during dynamic data collection; it was only necessary to depress the WAM-5500 joystick button once to start and again to stop recording at the beginning and end of the desired time frame. The software records dynamic results, including the time (in seconds) of each reading, pupil size, and MSE (mean spherical equivalent) refraction, in the form of an Excel Comma Separated Values (CSV) file [8, 9].
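Post-processing of such a dynamic-mode CSV export might look like the sketch below; the column names used here are placeholders, since the actual headers written by the WCS-1 software are not given in the text.

```python
import csv

# Hedged sketch of post-processing a WAM-5500 dynamic-mode CSV export.
# The column names ("time_s", "pupil_mm", "mse_d") are placeholders; the real
# headers depend on the WCS-1 software version and are not stated in the text.
def load_wam5500(path):
    times, pupil, refraction = [], [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times.append(float(row["time_s"]))       # seconds (5 Hz sampling)
            pupil.append(float(row["pupil_mm"]))     # pupil diameter, mm
            refraction.append(float(row["mse_d"]))   # mean spherical equivalent, D
    return times, pupil, refraction
```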
Fig. 3. Auto ref/keratometer WAM-5500 (Grand Seiko Co. Ltd., Hiroshima, Japan)
2.2 EMR-9

The EMR-9 (Fig. 4) measured eye movement using the pupillary/corneal reflex method. The horizontal measurement range was 40 degrees, the vertical range was 20 degrees, and the measurement rate was 60 Hz. The device consisted of two video cameras fixed to the left and right sides of the face, plus another camera (field-shooting unit) fixed to the top of the forehead. Infrared light sources were positioned in front of each lower eyelid. The side cameras recorded infrared light reflected from the cornea of each eye, while the camera on top of the forehead recorded the pictures shown on the screen. After a camera controller superimposed these three recordings with a 0.01 s electronic timer, the combined recording was saved on an SD card. A movement of more than 1 degree with a duration greater than 0.1 s was scored as an eye movement, and a gaze point was defined by a gaze time exceeding 0.1 s. This technique enabled us to determine eye fixation points. The wavelength of the infrared light was 850 nm. After the data were saved on the SD card, they were read into a personal computer [10, 11].
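The scoring rule quoted above (movements of more than 1 degree lasting longer than 0.1 s, and gaze points defined by dwells exceeding 0.1 s) could be applied to the 60 Hz gaze-angle samples roughly as follows; only the thresholds come from the text, while the dispersion-style segmentation itself is an assumption of this sketch.

```python
import numpy as np

# Hedged sketch of the scoring rule above, applied to 60 Hz horizontal/vertical
# gaze angles in degrees. Thresholds follow the text; the segmentation method
# (running-centroid dispersion) is an assumption of this sketch.
def find_gaze_points(h_deg, v_deg, fs=60.0, disp_thresh=1.0, min_dur=0.1):
    pts = np.column_stack([h_deg, v_deg]).astype(float)
    min_n = int(round(min_dur * fs))
    gaze_points, start = [], 0
    for i in range(1, len(pts)):
        # Distance of the current sample from the running fixation centroid.
        centroid = pts[start:i].mean(axis=0)
        if np.linalg.norm(pts[i] - centroid) > disp_thresh:
            if i - start >= min_n:                   # dwell long enough -> gaze point
                gaze_points.append(tuple(centroid))
            start = i                                # a new segment begins here
    if len(pts) - start >= min_n:
        gaze_points.append(tuple(pts[start:].mean(axis=0)))
    return gaze_points
```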
Fig. 4. EMR-9 (NAC Image Technology Inc., Tokyo, Japan)
2.3 Experiment

These two devices were combined, and we simultaneously measured the focal distances of accommodation and convergence while subjects were gazing at objects (Fig. 5). The experiment was conducted according to the following procedure. Subjects' accommodation and convergence were measured as they gazed in binocular vision at an object (a tennis ball, 7 cm in diameter) presented in front of them. The object moved over a range of 0.5 m to 1 m, with a cycle of 10 seconds. Measurements were made four times, each lasting 40 seconds. The illuminance of the experimental environment was about 103 lx, and the brightness of the object in this environment was 46.9 cd/m2.
Fig. 5. Pattern diagram of measurements
3 Results In this study, we simultaneously measured subjects’ accommodation and convergence while they were gazing at an object in binocular vision. The results of these measurements were comparable for all subjects. The results of the experiment for two subjects are shown as typical examples (Fig. 6, Fig. 7). In Fig. 6 and 7, “accommodation” stands for focal length of lens accommodation, and “convergence” stands for convergence focal length. These figures show that the accommodation and convergence of both subject A and B changed in agreement. Moreover, the change in the diopter value occurred with a cycle of about ten seconds. Maximum diopter values of accommodation and convergence of A and B were both about 2D, which is equal to 0.5 m. This was consistent with the distance from the subject to the object. On the other hand, their minimum values were accommodation distance of 1 D, equal to 1 m, and convergence distance of 0.7 D, equal to 1.43 m. Convergence was consistent with the distance to the object, but accommodation was focused a little beyond the object (about 0.3 D).
Fig. 6. Example of measurement: subject A
Fig. 7. Example of measurement: subject B
4 Discussion

In this experiment, we used the WAM-5500 and the EMR-9. A previous evaluation of the WAM-5500 examined its performance and showed that measurements with an accuracy of 0.01 D ± 0.38 D are possible, based on agreement with subjective findings within the range from -6.38 to +4.88 D [9]. Eyestrain and transient myopia were also investigated using the WAM-5500 [12, 13]. Experiments examining the accuracy of the DCC (dynamic cross cylinder) test have also been conducted; significant differences between test values and measured data were found, and the reliability of the DCC was questioned [14]. Queirós et al. [8] used the WAM-5500 to investigate the influence of fogging lenses and cycloplegia on refraction measurements. With respect to the eye mark recorder, Egami et al. [11] investigated differences according to age in tiredness and the learning effect by showing several kinds of pictures. Sakaki [10] tried forecasting people's movements from the gaze data obtained with an eye mark recorder and used this to improve the operation of a support robot. In addition, Nakashima et al. [15] examined the possibility of early diagnosis of dementia from senior citizens' eye movements with the eye mark recorder. The eye mark recorder has thus been used in various types of research. As mentioned above, much research has investigated the performance and characteristics of these instruments, and experiments using them have been conducted. In this experiment, we measured the accommodation distance and the convergence distance while subjects watched an object. We calculated the convergence distance from the coordinate data for both eyes and the pupil (interpupillary) distance. Our results showed that subjects' accommodation and convergence changed between a near and a far position while they were gazing at the object. Moreover, these changes occurred at a constant cycle, tuned to the movement of the object. Therefore, subjects viewed the object with binocular vision, and we could measure the results. The accommodation weakened by about 0.3 D when the
object was at its furthest position, the 1 m point. This indicates that the lens may not be accommodated strictly (a deviation of about 0.4 D), nearly in agreement with our previous findings [16]. While convergence was almost consistent with the distance from the subject to the object, accommodation was often focused a little beyond the object. This is thought to arise because the target can still be seen, owing to the depth of field, even if the focus is not exact. These measurements were done in healthy young males, and in this case it can be said that accommodation and convergence were consistent with the distance to the object when the subjects were gazing at it. Further investigation is needed to see whether the same results will be obtained under different conditions, such as when the subjects are women, not emmetropic, or older. In conclusion, it was possible to simultaneously measure both accommodation and convergence while subjects were gazing at an object. It can be said that the present measurement method is an effective technique for the measurement of visual function, and that correct values can be obtained even during stereoscopic vision. Additionally, in future studies, higher quality evaluation of 3-dimensional images will be possible by comparing subjects when they are viewing a 3-dimensional image and when they are viewing the actual object.
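The paper states only that convergence distance was calculated from the coordinate data of both eyes and the pupil distance; one standard geometric way to do this, assumed here for illustration, intersects the two lines of sight given each eye's inward rotation angle and the interpupillary distance.

```python
import math

# Hedged sketch: recovering a convergence distance from the two eyes'
# horizontal gaze angles and the interpupillary distance. This particular
# geometry is an assumption, not a description of the authors' software.
def convergence_distance(theta_left_deg, theta_right_deg, ipd_m=0.063):
    """theta_*: inward rotation of each eye from straight ahead, in degrees."""
    t = math.tan(math.radians(theta_left_deg)) + math.tan(math.radians(theta_right_deg))
    if t <= 0:
        return float("inf"), 0.0          # lines of sight parallel or diverging
    d = ipd_m / t                         # distance to the fixation point, metres
    return d, 1.0 / d                     # distance and its value in meter angles / diopters
```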
References
1. Donders, F.C.: On the Anomalies of Accommodation and Refraction of the Eye. New Sydenham Soc., London (1864); reprinted by Milford House, Boston (1972)
2. Fincham, E.F.: The Mechanism of Accommodation. Br. J. Ophthalmol. 21, monograph suppl. 8 (1937)
3. Krishman, V.V., Shirachi, D., Stark, L.: Dynamic measures of vergence accommodation. Am. J. Optom. Physiol. Opt. 54, 470–473 (1977)
4. Ukai, K., Tanemoto, Y., Ishikawa, S.: Direct recording of accommodative response versus accommodative stimulus. In: Breinin, G.M., Siegel, I.M. (eds.) Advances in Diagnostic Visual Optics, pp. 61–68. Springer, Berlin (1983)
5. Cho, A., Iwasaki, T., Noro, K.: A study on visual characteristics of binocular 3-D images. Ergonomics 39(11), 1285–1293 (1996)
6. Sierra, R., et al.: Improving 3D Imagery with Variable Convergence and Focus Accommodation for the Remote Assessment of Fruit Quality. In: SICE-ICASE International Joint Conference 2006, pp. 3553–3558 (2006)
7. Miyao, M., et al.: Visual accommodation and subject performance during a stereographic object task using liquid crystal shutters. Ergonomics 39(11), 1294–1309 (1996)
8. Queirós, A., González-Méijome, J., Jorge, J.: Influence of fogging lenses and cycloplegia on open-field automatic refraction. Ophthal. Physiol. Opt. 28, 387–392 (2008)
9. Sheppard, A.L., Davies, L.N.: Clinical evaluation of the Grand Seiko Auto Ref/Keratometer WAM-5500. Ophthal. Physiol. Opt. 30, 143–151 (2010)
10. Sakaki, T.: Estimation of Intention of User Arm Motion for the Proactive Motion of Upper Extremity Supporting Robot. In: 2009 IEEE 11th International Conference on Rehabilitation Robotics, Kyoto International Conference Center, Japan, June 23-26 (2009)
11. Egami, C., Morita, K., Ohya, T., Ishii, Y., Yamashita, Y., Matsuishi, T.: Developmental characteristics of visual cognitive function during childhood according to exploratory eye movements. Brain & Development 31, 750–757 (2009)
12. Tosha, C., Borsting, E., Ridder, W.H., Chase, C.: Accommodation response and visual discomfort. Ophthal. Physiol. Opt. 29, 625–633 (2009)
13. Borsting, E., Tosha, C., Chase, C., Ridder, W.H.: Measuring Near-Induced Transient Myopia in College Students with Visual Discomfort. American Academy of Optometry 87(10) (2010)
14. Benzoni, J.A., Collier, J.D., McHugh, K., Rosenfield, M., Portello, J.K.: Does the dynamic cross cylinder test measure the accommodative response accurately? Optometry 80, 630–634 (2009)
15. Nakashima, Y., Morita, K., Ishii, Y., Shouji, Y., Uchimura, N.: Characteristics of exploratory eye movements in elderly people: possibility of early diagnosis of dementia. Psychogeriatrics 10, 124–130 (2010)
16. Miyao, M., Otake, Y., Ishihara, S., Kashiwamata, M., Kondo, T., Sakakibara, H., Yamada, S.: An Experimental Study on the Objective Measurement of Accommodative Amplitude under Binocular and Natural Viewing Conditions. Tohoku J. Exp. Med. 170, 93–102 (1993)
Comparison in Degree of the Motion Sickness Induced by a 3-D Movie on an LCD and an HMD Hiroki Takada1,2, Yasuyuki Matsuura3, Masumi Takada2, and Masaru Miyao3 1
Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan 2 Aichi Medical University, 21 Iwasaku Karimata, Nagakute, Aichi 480-1195, Japan 3 Nagoya University, Furo-cho, Chikusa-Ku, Nagoya 464-8601, Japan
[email protected]
Abstract. Three-dimensional (3D) television sets are already on the market and are becoming increasingly popular among consumers. Watching stereoscopic 3D movies, though, can produce certain adverse effects such as asthenopia and motion sickness. Visually induced motion sickness (VIMS) is considered to be caused by an increase in visual-vestibular sensory conflict while viewing stereoscopic images. VIMS can be analyzed both psychologically and physiologically. According to our findings reported at the last HCI International conference, VIMS could be detected with the total locus length and sparse density, which were used as analytical indices of stabilograms. In the present study, we aim to analyze the severity of motion sickness induced by viewing conventional 3D movies on a liquid crystal display (LCD) compared to that induced by viewing these movies on a head-mounted display (HMD). We quantitatively measured the body sway in a resting state and during exposure to a conventional 3D movie on an LCD and HMD. Subjects maintained the Romberg posture during the recording of stabilograms at a sampling frequency of 20 Hz. The simulator sickness questionnaire (SSQ) was completed before and immediately after exposure. Statistical analyses were applied to the SSQ subscores and to the abovementioned indices (total locus length and sparse density) for the stabilograms. Friedman tests showed the main effects in the indices for the stabilograms. Multiple comparisons revealed that viewing the 3D movie on the HMD significantly affected the body sway, despite a large visual distance. Keywords: visually induced motion sickness, stabilometry, sparse density, liquid crystal displays (LCDs), head-mounted displays (HMDs).
1 Introduction

The human standing posture is maintained by the body's balance function, which is an involuntary physiological adjustment mechanism called the "righting reflex" [1]. This righting reflex, which is centered in the nucleus ruber, is essential to maintain the standing posture when locomotion is absent. The body's balance function utilizes sensory signals such as visual, auditory, and vestibular inputs, as well as proprioceptive
inputs from the skin, muscles, and joints [2]. The evaluation of this function is indispensable for diagnosing equilibrium disturbances like cerebellar degenerations, basal ganglia disorders, or Parkinson’s disease [3]. Stabilometry has been employed for a qualitative and quantitative evaluation of this equilibrium function. A projection of a subject’s center of gravity onto a detection stand is measured as an average of the center of pressure (COP) of both feet. The COP is traced for each time step, and the time series of the projections is traced on an x-y plane. By connecting the temporally vicinal points, a stabilogram is created, as shown in Fig. 1. Several parameters are widely used in clinical studies to quantify the degree of instability in the standing posture: for instance, the area of sway (A), total locus length (L), and locus length per unit area (L/A). It has been revealed that the last parameter is particularly related to the fine variations involved in posture control [1]. Thus, the L/A index is regarded as a gauge for evaluating the function of proprioceptive control of standing in human beings. However, it is difficult to clinically diagnose disorders of the balance function and identify the decline in equilibrium function by utilizing the abovementioned indices and measuring patterns in a stabilogram. Large interindividual differences might make it difficult to understand the results of such a comparison. Mathematically, the sway in the COP is described by a stochastic process [4]–[6]. We examined the adequacy of using a stochastic differential equation (SDE) and investigated the most adequate equation for our research. G(x), the distribution of the observed point x, is related in the following manner to V(x), the (temporally averaged) potential function, in the SDE, which has been considered to be a mathematical model of sway:
V(\vec{x}) = -\frac{1}{2} \ln G(\vec{x}) + \mathrm{const.} \qquad (1)
The nonlinear property of SDEs is important [7]. The potential function can have several minimal points; in the vicinity of these points, locally stable movement with a high-frequency component can be generated as a numerical solution to the SDE. We can therefore expect a high density of observed COP in these areas of the stabilogram. The analysis of stabilograms is useful not only for medical diagnoses but also for achieving upright standing control in two-legged robots and preventing falls in elderly people [8]. Recent studies have suggested that maintaining postural stability is one of the major goals of animals [9], and that they experience sickness symptoms in circumstances wherein they have not acquired strategies to maintain their balance [10]. Although the most widely known theory of motion sickness is based on the concept of sensory conflict [10]–[12], Riccio and Stoffregen [10] argued that motion sickness is instead caused by postural instability. Stoffregen and Smart (1999) reported that the onset of motion sickness may be preceded by significant increases in postural sway [13]. The equilibrium function in humans deteriorates when viewing 3-dimensional (3D) movies [14]. It has been considered that this visually induced motion sickness (VIMS) is caused by a disagreement between vergence and visual accommodation while viewing 3D images [15]. Thus, stereoscopic images have been devised to reduce this disagreement [16]–[17].
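Equation (1) can be applied directly to measured data: the sketch below estimates the time-averaged potential from a histogram of COP positions along one sway axis (the bin count and the one-dimensional simplification are choices of this sketch, not of the paper).

```python
import numpy as np

# Hedged sketch of Eq. (1): estimate the time-averaged potential from a
# histogram of observed COP positions along one sway axis.
def potential_from_histogram(x, bins=50):
    counts, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0                      # the log is only defined where G(x) > 0
    V = -0.5 * np.log(counts[mask])        # V(x) = -(1/2) ln G(x) + const.
    return centers[mask], V
```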
VIMS can be measured by psychological and physiological methods, and the simulator sickness questionnaire (SSQ) is a well-known psychological method for measuring the extent of motion sickness [18]. The SSQ is used in this study to verify the occurrence of VIMS. The following parameters of autonomic nervous activity are appropriate for the physiological method: heart rate variability, blood pressure, electrogastrography, and galvanic skin reaction [19]–[21]. It has been reported that a wide stance (with the midlines of the heels 17 or 30 cm apart) significantly increases the total locus length in the stabilograms of individuals with high SSQ scores, while the length for individuals with low scores is less affected by such a stance [22]. At the last HCI International conference, we reported that VIMS could be detected by the total locus length and sparse density, which were used as the analytical indices of stabilograms [23]. The objective of the present study is to compare the degree of motion sickness induced by viewing a conventional 3D movie on a liquid crystal display (LCD) with that from viewing a 3D movie on a head-mounted display (HMD). We quantitatively measured body sway during the resting state, during exposure to a 3D movie on an LCD, and during exposure to the same movie on an HMD.
2 Material and Methods

Ten healthy subjects (age, 23.6 ± 2.2 years) voluntarily participated in the study. All of them were Japanese and lived in Nagoya and its surrounding areas. They provided informed consent prior to participation. The following subjects were excluded from the study: subjects working night shifts, those dependent on alcohol, those who consumed alcohol and caffeine-containing beverages after waking up and less than 2 h after meals, those using prescribed drugs, and those who may have had any otorhinolaryngologic or neurological disease in the past (except for conductive hearing impairment, which is commonly found in the elderly). In addition, the subjects had to have experienced motion sickness at some time during their lives. We ensured that the body sway was not affected by environmental conditions. Using an air conditioner, we adjusted the temperature to 25 °C in the exercise room, which was kept dark. All the subjects were tested in this room from 10 a.m. to 5 p.m. Three kinds of stimuli were presented in random order: (I) a static circle with a diameter of 3 cm (resting state); (II) a conventional 3D movie that showed a sphere approaching and moving away from the subjects irregularly; and (III) the same motion picture as shown in (II). Stimuli (I) and (II) were presented on an LCD monitor (S1911-SABK, NANAO Co., Ltd.). The distance between the LCD and the subjects was 57 cm. On the other hand, the subjects wore an HMD (iWear AV920; Vuzix Co. Ltd.) during exposure to the last movie (III). This wearable display is equivalent to a 62-inch screen viewed at a distance of 2.7 m. The subjects stood without moving on the detection stand of a stabilometer (G5500; Anima Co. Ltd.) in the Romberg posture, with their feet together, for 1 min before the sway was recorded. The sway of the COP was then recorded at a sampling frequency of 20 Hz; the subjects were instructed to maintain the Romberg posture for the first 60 s. The subjects viewed one of the stimuli, that is, (I), (II), or (III), from beginning to end. They filled out an SSQ before and after the test.
We calculated several indices that are commonly used in the clinical field [24] for stabilograms, including the "area of sway," "total locus length," and "total locus length per unit area." In addition, new quantification indices, termed the sparse density (SPD) S2 and S3 and the total locus length of chain [25], were also estimated.
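Two of the standard indices named above can be computed from a COP trace as in the sketch below; "area of sway" is taken here as the convex-hull area, which is one common clinical choice and may differ from the exact definition used in the study.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hedged sketch of two standard stabilogram indices computed from a COP trace
# sampled at 20 Hz. The convex-hull area is one common choice for "area of sway".
def stabilogram_indices(x, y):
    pts = np.column_stack([x, y]).astype(float)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    total_locus_length = steps.sum()             # L: total path length of the COP
    area = ConvexHull(pts).volume                # A: in 2-D, .volume is the enclosed area
    return {"L": total_locus_length, "A": area, "L/A": total_locus_length / area}
```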
3 Results The SSQ results are shown in Table 1 and include the nausea (N), oculomotor discomfort (OD), and disorientation (D) subscale scores, along with the total score (TS) of the SSQ. No statistical differences were seen in these scores among the stimuli presented to the subjects.
Fig. 1. Typical stabilograms observed when subjects viewed the static circle (a), the conventional 3D movie on the LCD (b), and the same 3D movie on the HMD (c)

Table 1. Subscales of the SSQ after exposure to the 3D movies
Movie (II): N 14.3 ± 4.8, OD 16.7 ± 4.0, D 22.3 ± 9.3, TS 19.8 ± 5.8
Movie (III): N 11.4 ± 3.7, OD 18.2 ± 4.1, D 23.7 ± 8.8, TS 19.8 ± 5.3
Fig. 2. Typical results of the Nemenyi tests for the following indicators: total locus length (a) and SPD (b) (**p < 0.01, *p < 0.05, †p < 0.1)
However, there were increases in the scores after exposure to the conventional 3D movies. Although there were large individual differences, sickness symptoms seemed to appear more often with the 3D movies. Typical stabilograms are shown in Fig. 1. In these figures, the vertical axis shows the anterior and posterior movements of the COP, and the horizontal axis shows the right and left movements of the COP. The sway amplitudes observed during exposure to the movies (Fig. 1b–1c) tended to be larger than those of the control sway (Fig. 1a). Although a high COP density was observed in the control stabilogram (Fig. 1a), this density decreased during exposure to the movies (Fig. 1b–1c). According to the Friedman test, main effects were seen in all indices of the stabilograms except the chain (p < 0.01). Nemenyi tests were employed as a post-hoc procedure after the Friedman test (Fig. 2). Five of the six indices were enhanced significantly by exposure to the 3D movie on the HMD (p < 0.05). Except for the total locus length, there was no significant difference between the values of the indices measured during the resting state and during exposure to the 3D movie on the LCD (p < 0.05).
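A sketch of the within-subject analysis named above is given below, using SciPy's Friedman test over the three stimuli (one index value per subject and condition); the Nemenyi post-hoc comparisons could then be run, for example, with the scikit-posthocs package, which is an assumption of this sketch rather than the authors' software.

```python
from scipy.stats import friedmanchisquare

# Hedged sketch: Friedman test across the three conditions, with one index
# value (e.g. total locus length) per subject in each list, in matching order.
def friedman_over_conditions(index_rest, index_lcd, index_hmd):
    stat, p = friedmanchisquare(index_rest, index_lcd, index_hmd)
    return stat, p
```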
4 Discussion A theory has been proposed to obtain SDEs as a mathematical model of body sway on the basis of stabilograms. We questioned whether the random force vanished from the mathematical model of the body sway. Using our Double-Wayland algorithm [26]–[27], we evaluated the degree of visible determinism in the dynamics of the COP sway. Representative results of the Double-Wayland algorithm are shown in Fig. 3. We calculated translation errors Etrans derived from the time series x (Fig. 3a, 3c). The translation errors Etrans’ were also derived from their temporal differences (differenced time series). Regardless of whether a subject was exposed to the 3D movie on the HMD (III), the Etrans’ was approximately 1.
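For reference, a sketch of the Wayland translation-error statistic is given below; the embedding dimension, neighbour count, and the plain mean over reference points are choices of this sketch, since the authors' exact settings are not stated here. Applying the same function to the differenced series (np.diff(x)) gives the Etrans' used in the Double-Wayland comparison.

```python
import numpy as np

# Hedged sketch of the Wayland translation error: values near 0 suggest visible
# determinism, values near 1 are what a random walk produces.
def translation_error(x, dim=3, delay=1, n_ref=100, k=4, rng=None):
    rng = np.random.default_rng(rng)
    x = np.asarray(x, float)
    n = len(x) - (dim - 1) * delay - 1
    # Delay-embedded vectors and their one-step-ahead images.
    emb = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
    nxt = np.column_stack([x[i * delay + 1:i * delay + 1 + n] for i in range(dim)])
    errors = []
    for t in rng.choice(n, size=min(n_ref, n), replace=False):
        d = np.linalg.norm(emb - emb[t], axis=1)
        idx = np.argsort(d)[:k + 1]            # the point itself and its k neighbours
        v = nxt[idx] - emb[idx]                # one-step translation vectors
        v_mean = v.mean(axis=0)
        errors.append(np.mean(np.linalg.norm(v - v_mean, axis=1))
                      / np.linalg.norm(v_mean))
    return float(np.mean(errors))
```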
[Fig. 3, panels (a)–(d): translation error (Etrans) plotted against the dimension of the embedding space (1–10) for the time series and the differenced time series.]
Fig. 3. Mean translation error for each embedding dimension. The translation errors were estimated from the stabilograms observed when subjects viewed the static circle (a)–(b) and the conventional 3D movie on the HMD (c)–(d). Panels (b) and (d) show the values derived from the time series y.
The translation errors in each embedding dimension did not differ significantly between the time series x and y. Thus, Etrans > 0.5 was obtained using the Wayland algorithm, which implies that the time series could be generated by a stochastic process in accordance with a previous standard [28]. This 0.5 threshold is half of the translation error resulting from a random walk. Body sway has previously been described by stochastic processes [4]–[7], which was shown using the Double-Wayland algorithm [29]. Moreover, 0.8 < Etrans' < 1 exceeded the translation errors Etrans estimated by the Wayland algorithm, as shown in Fig. 3b. However, the translation errors estimated by the Wayland algorithm were similar to those obtained from the temporal differences, except for the case in Fig. 3b, which agrees with the abovementioned explanation of the dynamics for controlling a standing posture. Exposure to the 3D movies thus did not appear to change the dynamics into deterministic ones, and no mechanical variations were observed in the locomotion of the COP. We assumed that the COP was controlled by a stationary process, so that the sway during exposure to the static control image (I) could be compared with that when the subject viewed the
conventional 3D movie on the HMD. The indices for the stabilograms might reflect the coefficients in stochastic processes, although no significant difference in translation error was seen in a comparison of the stabilograms measured during exposure to (I) and (III). Regarding the system controlling the standing posture during exposure to the 3D movie on the LCD (II), similar results were obtained. The anterior-posterior direction y was considered to be independent of the medial-lateral direction x [30]. SDEs on the Euclidean space E2 ∋ (x, y),

∂x/∂t = -∂Ux(x)/∂x + wx(t),   (2)
∂y/∂t = -∂Uy(y)/∂y + wy(t),   (3)
have been proposed as mathematical models for generating stabilograms [4]–[7]. The white noise terms wx(t) and wy(t) were generated with pseudorandom numbers. Constructing nonlinear SDEs from the stabilograms (Fig. 1) in accordance with Eq. (1) revealed that their temporally averaged potential functions, Ux and Uy, have plural minimal points, and that fluctuations can be observed in the neighborhood of these points [7]. The variance of the stabilogram depends on the form of the potential function in the SDE; therefore, the SPD is regarded as an index for measuring it. Regardless of the display on which the 3D movies were presented, multiple comparisons indicated that the total locus length during exposure to the stereoscopic movies was significantly larger than that during the resting state (Fig. 2a). As shown in Fig. 1b and 1c, obvious changes occur in the form and coefficients of the potential function (1). Structural changes might occur in the time-averaged potential function (1) with exposure to stereoscopic images, which is assumed to reflect the sway of the center of gravity. We consider that the decrease in the gradient of the potential increased the total locus length of the stabilograms during exposure to the stereoscopic movies; the standing posture becomes unstable because of the effects of the stereoscopic movies. Most of the indices during exposure to the 3D movie on the HMD were significantly greater than those in the resting state, although there was no significant difference between the indices of the stabilograms during the resting state and those during exposure to the 3D movie on the LCD (Fig. 2). In this study, the apparent size of the LCD was greater than that of the HMD. Despite its smaller apparent size and greater apparent viewing distance, the 3D movie on the HMD affected the subjects' equilibrium function. Hence, by using the indicators derived from the stabilograms, we noted postural instability during exposure to the conventional stereoscopic images on the HMD. The next step will be an investigation aimed at proposing guidelines for the safe viewing of 3D movies on HMDs.
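To make the role of the potential gradient concrete, the sketch below integrates one such SDE with the Euler–Maruyama scheme for a hypothetical double-well potential (not the potential fitted in this study; all parameter values are illustrative). A weaker potential gradient yields a larger sway variance, in line with the statement above that the stabilogram variance depends on the form of the potential.

```python
import numpy as np

def simulate_cop(grad_u, noise_sd=0.3, dt=0.05, n_steps=1200, x0=0.0, rng=None):
    """Euler-Maruyama integration of  dx/dt = -dU/dx + w(t)  for one COP axis."""
    rng = np.random.default_rng(rng)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        drift = -grad_u(x[t - 1])
        x[t] = x[t - 1] + drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
    return x

# Hypothetical double-well potential U(x) = a*(x**4/4 - x**2/2); its gradient is a*(x**3 - x).
steep = simulate_cop(lambda x: 4.0 * (x**3 - x), rng=1)   # steeper potential
flat = simulate_cop(lambda x: 0.5 * (x**3 - x), rng=1)    # flatter potential

print(f"sway SD, steep potential: {np.std(steep):.2f}")
print(f"sway SD, flat potential:  {np.std(flat):.2f}")
```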
References 1. Okawa, T., Tokita, T., Shibata, Y., Ogawa, T., Miyata, H.: Stabilometry - Significance of locus length per unit area (L/A) in patients with equilibrium disturbances. Equilibrium Res. 55(3), 283–293 (1995) 2. Kaga, K., Memaino, K.: Structure of vertigo. Kanehara, Tokyo 23-26, 95–100 (1992)
3. Okawa, T., Tokita, T., Shibata, Y., Ogawa, T., Miyata, H.: Stabilometry - Significance of locus length per unit area (L/A). Equilibrium Res. 54(3), 296–306 (1996) 4. Collins, J.J., De Luca, C.J.: Open-loop and closed-loop control of posture: A random-walk analysis of center of pressure trajectories. Exp. Brain Res. 95, 308–318 (1993) 5. van Emmerik, R.E.A., Sprague, R.L., Newell, K.M.: Assessment of sway dynamics in tardive dyskinesia and developmental disability: Sway profile orientation and stereotypy. Movement Disorders 8, 305–314 (1993) 6. Newell, K.M., Slobounov, S.M., Slobounova, E.S., Molenaar, P.C.: Stochastic processes in postural center-of-pressure profiles. Exp. Brain Res. 113, 158–164 (1997) 7. Takada, H., Kitaoka, Y., Shimizu, Y.: Mathematical index and model in stabilometry. Forma 16(1), 17–46 (2001) 8. Fujiwara, K., Toyama, H.: Analysis of dynamic balance and its training effect - Focusing on fall problem of elder persons. Bulletin of the Physical Fitness Research Institute 83, 123–134 (1993) 9. Stoffregen, T.A., Hettinger, L.J., Haas, M.W., Roe, M.M., Smart, L.J.: Postural instability and motion sickness in a fixed-base flight simulator. Human Factors 42, 458–469 (2000) 10. Riccio, G.E., Stoffregen, T.A.: An ecological theory of motion sickness and postural instability. Ecological Psychology 3(3), 195–240 (1991) 11. Oman, C.: A heuristic mathematical model for the dynamics of sensory conflict and motion sickness. Acta Otolaryngologica Supplement 392, 1–44 (1982) 12. Reason, J.: Motion sickness adaptation: A neural mismatch model. J. Royal Soc. Med. 71, 819–829 (1978) 13. Stoffregen, T.A., Smart, L.J., Bardy, B.J., Pagulayan, R.J.: Postural stabilization of looking. Journal of Experimental Psychology: Human Perception and Performance 25, 1641–1658 (1999) 14. Takada, H., Fujikake, K., Miyao, M., Matsuura, Y.: Indices to detect visually induced motion sickness using stabilometry. In: Proc. VIMS 2007, pp. 178–183 (2007) 15. Hatada, T.: Nikkei Electronics, vol. 444, pp. 205–223 (1988) 16. Yasui, R., Matsuda, I., Kakeya, H.: Combining volumetric edge display and multiview display for expression of natural 3D images. In: Proc. SPIE, vol. 6055, pp. 0Y1–0Y9 (2006) 17. Kakeya, H.: MOEVision: Simple multiview display with clear floating image. In: Proc. SPIE, vol. 6490, p. 64900J (2007) 18. Kennedy, R.S., Lane, N.E., Berbaum, K.S., Lilienthal, M.G.: A simulator sickness questionnaire (SSQ): A new method for quantifying simulator sickness. International J. Aviation Psychology 3, 203–220 (1993) 19. Holmes, S.R., Griffin, M.J.: Correlation between heart rate and the severity of motion sickness caused by optokinetic stimulation. J. Psychophysiology 15, 35–42 (2001) 20. Himi, N., Koga, T., Nakamura, E., Kobashi, M., Yamane, M., Tsujioka, K.: Differences in autonomic responses between subjects with and without nausea while watching an irregularly oscillating video. Autonomic Neuroscience: Basic and Clinical 116, 46–53 (2004) 21. Yokota, Y., Aoki, M., Mizuta, K.: Motion sickness susceptibility associated with visually induced postural instability and cardiac autonomic responses in healthy subjects. Acta Otolaryngologica 125, 280–285 (2005) 22. Scibora, L.M., Villard, S., Bardy, B., Stoffregen, T.A.: Wider stance reduces body sway and motion sickness. In: Proc. VIMS 2007, pp. 18–23 (2007)
23. Fujikake, K., Miyao, M., Watanabe, T., Hasegawa, S., Omori, M., Takada, H.: Evaluation of body sway and the relevant dynamics while viewing a three-dimensional movie on a head-mounted display by using stabilograms. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp. 41–50. Springer, Heidelberg (2009) 24. Suzuki, J., Matsunaga, T., Tokumatsu, K., Taguchi, K., Watanabe, Y.: Q&A and a manual in stabilometry. Equilibrium Res. 55(1), 64–77 (1996) 25. Takada, H., Kitaoka, Y., Ichikawa, S., Miyao, M.: Physical meaning on geometrical index for stabilometry. Equilibrium Res. 62(3), 168–180 (2003) 26. Wayland, R., Bromley, D., Pickett, D., Passamante, A.: Recognizing determinism in a time series. Phys. Rev. Lett. 70, 580–582 (1993) 27. Takada, H., Morimoto, T., Tsunashima, H., Yamazaki, T., Hoshina, H., Miyao, M.: Applications of Double-Wayland algorithm to detect anomalous signals. FORMA 21(2), 159–167 (2006) 28. Matsumoto, T., Tokunaga, R., Miyano, T., Tokuda, I.: Chaos and time series, pp. 49–64. Baihukan, Tokyo (2002) (in Japanese) 29. Takada, H., Shimizu, Y., Hoshina, H., Shiozawa, Y.: Wayland tests for differenced time series could evaluate degrees of visible determinism. Bulletin of Society for Science on Form 17(3), 301–310 (2005) 30. Goldie, P.A., Bach, T.M., Evans, O.M.: Force platform measures for evaluating postural control: Reliability and validity. Arch. Phys. Med. Rehabil. 70, 510–517 (1989)
Evaluation of Human Performance Using Two Types of Navigation Interfaces in Virtual Reality Luís Teixeira1, Emília Duarte2, Júlia Teles3, and Francisco Rebelo1 1
Ergonomics Laboratory. FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada - Dafundo, Portugal 2 UNIDCOM/IADE – Superior School of Design, Av. D. Carlos I, no. 4, 1200-649 Lisbon, Portugal 3 Mathematics Unit. FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada - Dafundo, Portugal {lmteixeira,jteles,frebelo}@fmh.utl.pt,
[email protected]
Abstract. Most Virtual Reality studies use a hand-centric device as a navigation interface. Since this can be a problem when the user is also required to manipulate objects, and can even distract a participant from other tasks if he or she has to "think" about how to move, a more natural, leg-centric interface seems more appropriate. This study compares human performance variables (distance travelled, time spent and task success) when using a hand-centric device (Joystick) and a leg-centric interface (Nintendo Wii Balance Board) during a search task in a Virtual Environment. Forty university students (equally distributed in gender and number across the experimental conditions) participated in this study. Results show that participants were more efficient when performing navigation tasks with the Joystick than with the Balance Board. However, there were no significant differences in task success. Keywords: Virtual Reality, Navigation interfaces, Human performance.
1 Introduction The most common navigation interfaces used in Virtual Reality (VR) are hand-centric. This might pose a problem when, besides navigating in the Virtual Environment (VE), it is also required to interact further with it, for example, to manipulate objects. It can also be a problem if the hand-centric navigation distracts the participant in some way from other tasks in the VE, i.e., when the participant has to "think" about how to move. Also, since a hand-centric interface does not reproduce a natural navigation movement, it might not allow performance similar to that of other types of interface, such as one that uses leg or foot movement to represent motion. Because of the abovementioned limitations, navigation interfaces are an important issue for VR and, as such, several attempts to create new types of interface have been made (e.g. [1], [2], [3]). Slater, Usoh et al. [1] used a walk-in-place technique to navigate in the VE. Peterson, Wells et al. [2] presented a body-controlled interface called the Virtual Motion Controller that uses the body to generate motion commands.
Beckhaus, Blom and Haringer [3] proposed two different types of interface for navigation in VEs. One is based on a dance pad usually used for dance games; the other is a chair-based interface. Another possible solution is to use already existing interfaces (e.g., interfaces used for game consoles) in a new perspective, as navigation interfaces for VR. In recent years, new game interfaces have been created that are more entertaining and involving than previous ones (for example, the Nintendo Wiimote compared to a more traditional gamepad). This new type of interface can be adapted and used for VR navigation or for some other kind of interaction, giving it a new meaning. Although most of these new interfaces are hand-centric, one interface, the Nintendo® Wii Balance Board [4] (Balance Board hereafter), uses the weight of the person and is controlled with the lower body of the user, making it closer to a natural navigation interface. Hilsendeger, Brandauer, Tolksdorf and Fröhlich [5] used the Balance Board as a navigation interface for VEs. The solution they proposed defines a user vector of movement, based on the pressure applied to each of the four sensors, that is translated into movement within the VE. They presented two forms of navigation in the VE: direct control of speed and an acceleration mode. Direct control of speed uses the leaning of the participant on the platform to create the desired movement and speed in the VE. The acceleration mode only requires that the participant lean in the direction they intend to go, which creates the desired acceleration for the movement; after that, the participant can stand still and the velocity remains constant. However, there are few studies that compare users' performance in a VE when using a leg-centric and a hand-centric navigation interface. Therefore, the main objective of this study is to compare two types of navigation interface (Balance Board and Joystick) in VEs using performance variables such as time spent, distance travelled and task success. It was hypothesized that individuals who use the Balance Board as a navigation interface have better performance in the search task.
2 Method 2.1 Study's Design and Protocol To test the formulated hypothesis, an experimental study was developed with two conditions, Balance Board and Joystick. These experimental conditions were evaluated for a search task in a VE using the following performance variables: time spent, distance travelled and task success. The study used a between-subjects design. The selection criteria for this study only allowed participants who were university students (between 18 and 35 years old), were fluent in Portuguese, had no color vision deficiencies (tested with the Ishihara Test [6]), did not wear glasses (corrective lenses were allowed), because the Head-Mounted Display did not permit the use of glasses, and reported being in good physical and mental health.
The experimental session was divided into four stages: (1) signing of the consent form and introduction to the study; (2) training; (3) simulation; and (4) open-ended questions. (1) At the beginning of the session, participants signed a consent form and were advised that they could end the experiment at any time. In this part of the session they were also introduced to the study and to the equipment, in order to learn how to use it to interact with the simulation. Participants were told that we were testing new VR software that automatically captures human interaction data. This was done to reduce the possibility of bias from participants deliberately trying to perform better with a specific navigation interface. (2) Participants were placed in a training VE to familiarize themselves with the equipment. The environment contained a small room with a pillar in its center and a connection to a zigzag corridor. Participants were told that they could explore the area freely until they felt able to control the navigation interface. After that, the researcher asked them specifically to go around the pillar in both directions and to go to the end of the corridor. If the participant could achieve these small goals without difficulty, the researcher considered that the participant was able to do the simulation. (3) The scenario was an end-of-day routine check in which the participants' main task was to push six buttons in the VE. Messages on boards in the VE directed the participant to the buttons. They were also told that the first instruction was in the "Meeting Room". The total number of buttons was omitted from the instructions. The simulation ended after 20 minutes (the researcher would stop the simulation if the participant seemed lost in the environment) or when the participant reached a specific end point in the VE after activating a certain trigger. (4) After the simulation, participants were interviewed (open-ended questions) about the difficulties they experienced while immersed and about their overall opinion regarding the interaction quality. 2.2 Sample Forty university students (20 males and 20 females) participated and were equally distributed in gender and number across the two experimental conditions. The participants declared that they had not used the Balance Board before. In the Joystick condition, participants were between 19 and 34 years old (mean = 22.9, SD = 3.32), and in the Balance Board condition they were between 18 and 29 years old (mean = 21.10, SD = 3.11). 2.3 Virtual Environment The VE was designed with the aim of promoting the immersion of the participants and of creating a more natural interaction with the navigation interfaces under study. As such, the VE was an office building containing four symmetrical rooms (meeting room, laboratory, cafeteria and warehouse), each measuring 12 by 12 meters.
The rooms were separated by two perpendicular axes of corridors and surrounded by another corridor, each 2 meters wide. Six buttons were placed on walls distributed throughout the VE. The participants were directed to each button through messages with instructions placed on boards in each room. An orientation signage system was also designed to help participants find the respective rooms. These signs were wall-mounted directional panels with pictorials, arrows and verbal information. The VE was modeled in Autodesk 3dsMax v2009, exported through the plugin Ogremax v1.6.23 and presented by the ErgoVR software. 2.4 Equipment The equipment used in both experimental conditions was: (a) two magnetic motion trackers from Ascension-Tech®, model Flock of Birds, with 6DOF, used for motion detection of the head and arm; (b) a Head-Mounted Display from Sony®, model PLM-S700E; (c) wireless headphones from Sony®, model MDR-RF800RK; (d) a graphics workstation with an Intel® i7 processor, 8 Gigabytes of RAM and an nVIDIA® QuadroFX4600. In the Balance Board condition a Nintendo® Wii Balance Board was used as the navigation interface, and in the Joystick condition a Thrustmaster® USB Joystick was used. 2.5 Navigation The Balance Board (see Fig. 1) has four pressure sensors in its corners that are used to measure the user's center of balance and weight. The center of balance is the projection of the center of mass onto the Balance Board platform. That projection can be used as a reference for the corresponding movement in the VE.
Fig. 1. Images of the Nintendo Wii Balance Board. Seen from above on the Left and seen from below on the Right (images from Nintendo®’s official website and the manual).
The navigation solution used in this study for the Balance Board is similar to the direct control of speed described by Hilsendeger, Brandauer, Tolksdorf and Fröhlich [5]. That is, navigation is performed by leaning on the platform, i.e., by applying more pressure to different areas of it. If the participant wants to move
forward (or backward) in the VE, he/she just needs to apply more pressure to the front (or back) sensors of the platform. If the participant applies more pressure to the left or right sensors, the virtual body rotates about its own axis. Combining forward or backward pressure with leaning left or right has the expected result: the virtual body moves forward or backward while rotating left or right. The same leaning-movement principle was used for navigation with the Joystick.
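A minimal sketch of the kind of mapping described above, assuming access to the four corner load values of the board (e.g., via a generic Wii Balance Board driver). The function name, dead-zone and gain values are illustrative assumptions, not those used by the ErgoVR setup.

```python
def board_to_motion(top_left, top_right, bottom_left, bottom_right,
                    dead_zone=0.05, speed_gain=1.5, turn_gain=60.0):
    """Map the four corner loads (kg) of the balance board to a forward speed
    (m/s) and a yaw rate (deg/s) for the virtual body, following the
    direct-control-of-speed scheme: lean forward/backward to translate,
    lean left/right to rotate."""
    total = top_left + top_right + bottom_left + bottom_right
    if total <= 0.0:
        return 0.0, 0.0
    # Center of balance, each axis normalised to [-1, 1].
    cob_x = ((top_right + bottom_right) - (top_left + bottom_left)) / total
    cob_y = ((top_left + top_right) - (bottom_left + bottom_right)) / total
    # Small leans inside the dead zone are ignored so quiet standing does not drift.
    forward = 0.0 if abs(cob_y) < dead_zone else cob_y * speed_gain
    yaw_rate = 0.0 if abs(cob_x) < dead_zone else cob_x * turn_gain
    return forward, yaw_rate
```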
3 Results The variables automatically collected by the ErgoVR software [7] were time spent, distance travelled and task success. Time spent is the time, in seconds, and distance travelled is the distance, in meters, from the start of the simulation to the moment when the participant reached the trigger (or decided to stop the simulation). Task success is given by the number of buttons pressed by the end of the simulation. Due to the violation of normality assumptions, the Mann-Whitney test was used to compare the two conditions (Joystick and Balance Board interfaces) on the performance variables (total distance travelled, time spent in the simulation and the success rate of the search task). The statistical analysis, performed in IBM® SPSS® Statistics v19, was conducted at a 5% significance level. Results show that the time spent (see Fig. 2) by the Joystick users in the simulation (mdn = 382.25 s) was significantly lower than that of the Balance Board users (mdn = 605.05 s), with U = 82.0, z = -3.192 and p < 0.001.
Fig. 2. Boxplot regarding Time Spent, in seconds, by experimental condition
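A minimal sketch of the between-group comparison described above, using scipy's Mann-Whitney U test on hypothetical per-participant values (the study itself used SPSS v19).

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical time-spent values (s) for the 20 Joystick and 20 Balance Board participants.
rng = np.random.default_rng(0)
joystick = rng.normal(400, 120, 20)
balance_board = rng.normal(600, 150, 20)

u, p = mannwhitneyu(joystick, balance_board, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}, "
      f"medians: {np.median(joystick):.1f} s vs {np.median(balance_board):.1f} s")
```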
Regarding the distance travelled (see Fig. 3), Joystick users (mdn = 265.46 m) travelled significantly less distance than Balance Board users (mdn = 344.32 m), with U = 95.0, z = -2.840 and p = 0.002.
Fig. 3. Boxplot regarding Distance travelled, in meters, by experimental condition
For the task success (see Fig. 4), i.e. the number of pressed buttons, there were no significant differences between the Joystick users (mdn = 4) and the Balance Board users (mdn = 3.5), with U = 153.5, z = -1.294 and p = 0.199.
Fig. 4. Boxplot regarding Task success by experimental condition
4 Discussion and Conclusion To test whether users who use the Balance Board as a navigation interface have better performance in a search task, a comparison was made between two types of navigation interface (Balance Board and Joystick) on performance variables (time spent, distance travelled and task success). The results show that the hypothesis was not confirmed: participants were more efficient on some performance variables, namely time spent and distance travelled, when performing navigation tasks with the Joystick than with the Balance Board, while there were no significant differences in task success (number of pressed buttons). The higher times and distances when using the Balance Board may be connected with its somewhat unnatural rotational movement, but this did not affect task success.
Based on informal opinions given by the participants at the end of the test, we noticed higher enthusiasm among those who interacted with the Balance Board; yet this interface also received the harshest criticism, mainly because in some situations the participants seemed frustrated with the navigation. The somewhat unnatural rotational movement might explain the differences found. The Balance Board results could also have been affected by a shorter adaptation time with the navigation interface. With the results gathered, we see at least three paths for future work. The first is the improvement of the navigation control with the Balance Board, especially the rotational movement, which is not very natural and was the aspect that received the harshest criticism from the participants. Participants stated that, although after a while in the VE they did not have to "think" to make the forward or backward movement, when they had to change direction, especially in small spaces or when against a wall, they always had to "think" to make the appropriate movement. This could be the cause of the higher times and distances needed to perform the requested tasks. The second path is that, even with these higher values of time and distance and some difficulties with the rotational movement, the participants' sense of presence and immersion might be higher than with the Joystick, and this should be investigated. The third is to create a less subjective training stage, in which it would be possible to define different, automatically analyzed criteria for considering the participant able to control the navigation well enough before moving on to the simulation stage. The present research provides evidence that interfaces used for games can be a viable option for VR-based studies in which performance variables such as time spent and distance travelled are not important, but completion of the search task is.
References 1. Slater, M., Usoh, M., Steed, A.: Taking steps: the influence of a walking technique on presence in virtual reality. ACM Trans. Comput.-Hum. Interact. 2(3), 201–219 (1995) 2. Peterson, B., Wells, M., Furness III, T.A., Hunt, E.: The Effects of the Interface on Navigation in Virtual Environments. In: Proceedings of the Human Factors and Ergonomics Society 1998 Annual Meeting, pp. 1496–1505 (1998) 3. Beckhaus, S., Blom, K.J., Haringer, M.: Intuitive, Hands-free Travel Interfaces for Virtual Environments. In: New Directions in 3D User Interfaces Workshop of IEEE VR 2005, pp. 57–60. Shaker Verlag, Ithaca (2005) 4. Nintendo Wii Balance Board official website, http://www.nintendo.com/wii/console/accessories/balanceboard 5. Hilsendeger, A., Brandauer, S., Tolksdorf, J., Fröhlich, C.: Navigation in Virtual Reality with the Wii Balance Board. In: 6th Workshop on Virtual and Augmented Reality (2009) 6. Ishihara, S.: Test for Colour-Blindness, 38th edn. Kanehara & Co., Ltd., Tokyo (1988) 7. Teixeira, L., Rebelo, F., Filgueiras, E.: Human interaction data acquisition software for virtual reality: A user-centered design approach. In: Kaber, D.B., Boy, G. (eds.) Advances in Cognitive Ergonomics, pp. 793–801. CRC Press, Boca Raton (2010)
Use of Neurophysiological Metrics within a Real and Virtual Perceptual Skills Task to Determine Optimal Simulation Fidelity Requirements Jack Vice1, Anna Skinner1, Chris Berka3, Lauren Reinerman-Jones2, Daniel Barber2, Nicholas Pojman3, Veasna Tan3, Marc Sebrechts4, and Corinna Lathan1 1 AnthroTronix, Inc. 8737 Colesville Rd., L203 Silver Spring, MD 20910, USA 2 Institute for Simulation and Training, University of Central Florida 3100 Technology Parkway, Orlando, FL 32826, USA 3 Advanced Brain Monitoring, Inc. 2237 Faraday Ave., Ste 100 Carlsbad, CA 92008, USA 4 The Catholic University of America, Department of Psychology, 4001 Harewood Rd., NE Washington, DC 20064, USA {askinner,jvice,clathan}@atinc.com, {lreinerm,dbarber}@ist.ucf.edu, {chris,npojman,vtan}@b-alert.com,
[email protected]
Abstract. The military is increasingly looking to virtual environment (VE) developers and cognitive scientists to provide virtual training platforms to support optimal training effectiveness within significant time and cost constraints. However, current methods for determining the most effective levels of fidelity in these environments are limited. Neurophysiological metrics may provide a means for objectively assessing the impact of fidelity variations on training. The current experiment compared neurophysiological and performance data for a real-world perceptual discrimination task as well as a similarly-structured VE training task under systematically varied fidelity conditions. Visual discrimination and classification were required among two militarily relevant stimuli (M-16 and AK-47 rifles) and one neutral stimulus (umbrella), viewed through a real and a virtual Night Vision Device. Significant differences were found for task condition (real world versus virtual, as well as visual stimulus parameters within each condition) within both the performance and the physiological data.
1 Introduction The military is increasingly looking to VE developers and cognitive scientists to provide virtual training platforms to support optimal training effectiveness within significant time and cost constraints. However, validation of these environments and scenarios is largely limited to subjective reviews by warfighter subject matter experts (SMEs), who may not be fully aware of, or able to articulate, the cues they rely on during situation assessments and decision-making. Warfighters are trained to use a decision process referred to as the OODA loop: Observe, Orient, Decide, Act. In order to successfully execute appropriate actions, it is necessary to observe the
environment, using appropriate cues to develop accurate situational awareness and orient to contextual and circumstantial factors, before a decision can be made and acted upon. "Intuitive" decision-making relies on this process, but at a pace that is too rapid to be decomposed and assessed effectively using standard methods of during- and after-action review. The United States Marine Corps has recently developed a training program, known as Combat Hunter, which emphasizes observation skills in order to increase battlefield situational awareness and produce proactive small-unit leaders that possess a bias for action (Marine Corps Interim Publication, 2011). Although extensive theoretical and empirical research has been conducted examining the transfer of training from VEs to real world tasks (e.g., Lathan, Tracey, Sebrechts, Clawson, & Higgins, 2002; Sebrechts, Lathan, Clawson, Miller, & Trepagnier, 2003), objective metrics of transfer are limited and there is currently a lack of understanding of the scientific principles underlying the optimal interaction requirements a synthetic environment should satisfy to ensure effective training. Existing methods of transfer assessment are for the most part limited to indirect, performance-based comparisons and subjective assessments, as well as assessments of the degree to which aspects of the simulator match the real world task environment. The method of fidelity maximization assumes that increased fidelity relates to increased transfer; however, in some cases, lower-fidelity simulators have been shown to provide effective training as compared to more expensive and complex high-fidelity simulators, and while the approach of matching core components and logical structure is promising, methods of determining which aspects of fidelity are most critical to training transfer for a given task are limited. Performance-based assessments are typically compared before and after design iterations in which multiple fidelity improvements have been implemented, making it difficult or impossible to identify which fidelity improvements correlate to improved training. Thus, a need exists for more objective and efficient methods of identifying optimal fidelity and interaction characteristics of virtual simulations for military training. In 2007, a potential method for determining training simulation component fidelity requirements was proposed by Vice, Lathan, Lockerd, and Hitt. By comparing physiological response and behavior between real and VE training stimuli, Vice et al. hypothesized that such a comparison could potentially inform which types of fidelity will have the highest impact on transfer of training. Skinner et al. (2010) expanded on this approach in the context of high-risk military training. Physiologically-based assessment metrics, such as eye-tracking and electroencephalogram (EEG), have been shown to provide reliable measures of cognitive workload (e.g., Berka et al., 2004) and attention allocation (Carroll, Fuchs, Hale, Dargue, & Buck, 2010), as well as cognitive processing changes due to fidelity and stimulus variations within virtual training environments (Crosby & Ikehara, 2006; Skinner, Vice, Lathan, Fidopiastis, Berka, & Sebrechts, 2009; Skinner, Sebrechts, Fidopiastis, Berka, Vice, & Lathan, 2010; Skinner, Berka, O'Hara-Long, & Sebrechts, 2010).
Previous related research has demonstrated that event-related potentials (ERPs) are sensitive to even slight variations in virtual task environment fidelity, even in cases in which task performance does not significantly differ. A pilot study was conducted (Skinner et al., 2009) in which variations in the fidelity of the stimuli (high
versus low polygon count) in a visual search/identification task did not result in performance changes; however, consistent and distinguishable differences were detected in ERP early and late components. The results of a second study (Skinner et al., 2010) demonstrated that ERPs varied across four classes of vehicles and were sensitive to changes in the fidelity of the vehicles within the simulated task environment. While performance, measured by accuracy and reaction times, distinguished between the various stimulus resolution levels and between classes of vehicles, the ERPs further highlighted interactions between resolution and class of vehicle, revealing subtle but critical aspects affecting the perceptual discrimination for the vehicles within the training environment. The objective of the current study was to collect physiological and performance data for participants completing a real world perceptual skills task, as well as a similarly-structured VE training task in varied fidelity conditions, and to compare the data sets in an effort to identify the impact of the various task conditions on both behavioral and neurophysiological metrics.
2 Method Within the current study, visual discrimination and classification were required among 3 stimuli: positive (M-16), negative (AK-47), and neutral (umbrella), viewed through a real or virtual AN/PVS-14 Night Vision Device (NVD). The stimuli were partially occluded; only 6 inches of the front portion of each stimulus were visible, sticking out from a hallway, 20 feet from the seated observer. Stimuli were perpendicular to the hallway wall and parallel to the ground. The real world (RW) conditions used a hallway constructed from foam board; the layout of the hallway is shown in Figure 1. Within the VE condition, a virtual hallway and virtual target objects were developed that were matched to the RW task conditions and viewed through a virtual NVD model. This task design also reduced confounding variables such as field of view (FOV) in both the RW and VE conditions. Within the RW conditions, the FOV of the observer was restricted by the NVD. Participants were seated with their dominant eye up to the eyecup of the NVD, which was mounted on a tripod, and their non-dominant eye was covered by a patch. Within the VE conditions, subjects were seated 15 inches from a flat-screen 19-inch monitor on which stimuli were displayed, with their dominant eye up to an NVD eyecup and a plastic tube designed to match the FOV in the RW task environment, which was mounted on a tripod, and their non-dominant eye was covered by a patch. A shutter mechanism was used to show or hide the visual stimuli in both conditions, and was synced to an open source data logging and visualization tool to fuse data from the physiological sensors and the task environment. Stimulus viewing time was 3 seconds with an interstimulus interval (ISI) of 7 seconds, which allowed enough time to swap the stimuli in the RW condition. Two task conditions were completed by all participants within the RW setting: ambient light (RW Ambient) and infrared lighting (RW IR). The order of stimulus presentation was randomized by a computer program. A photograph of the RW task environment, taken through the NVD, and a screen shot of the VE are shown in Figures 1 and 2, respectively.
Fig. 1. Real world AK-47 stimulus
Fig. 2. Virtual Environment M-16
Based on previous studies and the specific VE task characteristics, two fidelity components (resolution and color depth) were identified that were expected to have the greatest impact on performance and neurophysiological response; these were selected to be systematically varied to assess concomitant physiological changes. All other fidelity components were kept constant at a standard, default level during the experiments so as not to affect the results. Three fidelity configurations were used in the VE condition: Low Resolution/High Color Depth (LoHi), High Resolution/Low Color Depth (HiLo), and High Resolution/High Color Depth (HiHi). The VE task scenarios were designed to match the RW scenes as closely as possible by using pictures taken (without magnification) from the perspective of a participant in the RW task condition. Lighting within the VE was designed to match the RW Ambient lighting condition. A total of 40 participants were recruited for this experiment. Pilot testing was conducted with 5 participants, and 35 participated in the formal study. Approximately half of the participants started in the RW task environment, and half started in the VE task condition. The order of conditions within RW and VE was randomized in a block design. A total of 25 trials were presented for each of the 9 unique images (3 stimuli x 3 fidelity conditions) in the VE condition. A total of 25 trials were also presented for each of the 6 unique images (3 stimuli x 2 fidelity conditions) in the RW condition. The order of stimulus presentation was randomized for each subject, and the order of conditions was balanced across subjects. Data collected included accuracy, reaction time (RT), and EEG using a 9-channel EEG cap.
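A sketch of how the randomized trial lists implied by this design might be generated (25 presentations of each unique stimulus × fidelity image, in a fresh random order per participant). The labels and the helper itself are illustrative, not the actual experiment-control code.

```python
import itertools
import random

STIMULI = ["M16", "AK47", "umbrella"]
VE_FIDELITIES = ["LoHi", "HiLo", "HiHi"]
RW_FIDELITIES = ["Ambient", "IR"]
REPS = 25  # presentations per unique image

def make_trial_list(fidelities, seed):
    """Return a randomized list of (stimulus, fidelity) trials for one block."""
    trials = list(itertools.product(STIMULI, fidelities)) * REPS
    random.Random(seed).shuffle(trials)
    return trials

ve_trials = make_trial_list(VE_FIDELITIES, seed=42)   # 3 x 3 x 25 = 225 trials
rw_trials = make_trial_list(RW_FIDELITIES, seed=42)   # 3 x 2 x 25 = 150 trials
print(len(ve_trials), len(rw_trials))
```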
3 Results 3.1 Performance Data Performance data were assessed in terms of both accuracy (percent correct) and reaction time. Effects were assessed both across and within task conditions (RW and VE) for each stimulus type using repeated measures analysis of variance (ANOVA). Thirty-four participants completed the experiment in total. Based on a screening criterion to eliminate participants who were not able to perform above chance, those
with accuracy scores below 33% for any condition were removed; thus, 23 participants were included in the performance and physiological analyses. The Greenhouse-Geisser adjustment in SPSS was used to correct for violations of sphericity. A 3 x 5 (stimulus x fidelity) repeated measures ANOVA for percent correct showed a main effect for stimulus (F(1.77,39.03) = 4.25, p = .025), such that participants correctly identified the AK47 (M = 98.2%) more often than both the umbrella (M = 96.4%, p = .038) and the M16 (M = 96.1%, p = .003), with no significant difference between performance on the umbrella and M16. The main effect for fidelity, as well as the stimulus x fidelity interaction, was not statistically significant (p > .05). A 3 x 5 (stimulus x fidelity) repeated measures ANOVA for response time (RT) for correct responses found a significant main effect for stimulus (F(1.36,29.87) = 37.94, p < .001), such that RTs were faster for the AK47 (M = 1.289s) than either the umbrella (M = 1.356s, p = .005) or the M16 (M = 1.602s, p < .001), with the umbrella also faster than the M16 (p < .001). Thus, no speed-accuracy trade-offs are evident; based on these results, the AK-47 appears to have been the easiest stimulus to identify across all fidelity conditions, followed by the umbrella; the M-16 appears to have been the most difficult stimulus to identify. A significant fidelity main effect was also found (F(1.89,41.56) = 16.00, p < .001), such that response time in the RW IR condition (M = 1.702s) was slower than all other fidelity conditions (RW Ambient: M = 1.467s, VE LoHi: M = 1.313s, VE HiLo: M = 1.318s, VE HiHi: M = 1.278s; p < .001 in all cases), and RTs were faster in the VE HiHi fidelity than the RW Ambient condition (p < .001). No significant differences were found in the post-hoc comparison of any other fidelity conditions. Finally, the interaction between stimulus and fidelity was found to be significant (F(4.44,97.69) = 2.86, p = .023). The effect is driven by the fact that participants responded slower to the umbrella in the RW Ambient condition (M = 1.420s) than in the VE HiHi condition (M = 1.178s, p < .001). Reaction time for the M16 was also slower in the RW Ambient fidelity (M = 1.690s) than the VE LoHi (M = 1.471s, p = .016), VE HiLo (M = 1.492s, p = .011) and VE HiHi (M = 1.453s, p < .001). Reaction time for the AK47 exhibited no significant difference between the RW IR fidelity and any of the VE fidelities (p > .05 in each case).
Fig. 3. Mean RTs for correct trials for each stimulus by fidelity condition
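One way to reproduce this style of analysis outside SPSS is sketched below, using the pingouin package's repeated-measures ANOVA, which can report Greenhouse-Geisser-corrected p-values for the within-subject effects. The long-format column names and the generated data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one mean correct-trial RT (s) per participant,
# stimulus (3 levels) and fidelity condition (5 levels); 23 participants.
rng = np.random.default_rng(0)
rows = [{"subject": s, "stimulus": st, "fidelity": f, "rt": rng.normal(1.4, 0.2)}
        for s in range(1, 24)
        for st in ["M16", "AK47", "umbrella"]
        for f in ["RW_IR", "RW_Ambient", "VE_LoHi", "VE_HiLo", "VE_HiHi"]]
df = pd.DataFrame(rows)

# Two-way (stimulus x fidelity) repeated-measures ANOVA.
aov = pg.rm_anova(data=df, dv="rt", within=["stimulus", "fidelity"],
                  subject="subject", detailed=True)
print(aov)
```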
3.2 Neurophysiological Data Single trial ERP waveforms that included artifacts such as eyeblinks or excessive muscle activity were removed on a trial-by-trial basis using the B-Alert automated software. Additionally, trials with data points exceeding plus or minus 70 µV were filtered and removed before averages were combined for the grand mean analysis across all 23 participants. The ERP waveforms were time locked to the presentation of the testbed stimuli and ERPs were plotted for the two seconds post-stimulus presentation leading up to the response. Figure 4 highlights the ERP components of interest for a set of sample ERP waveforms.
Fig. 4. ERP waveform after stimulus presentation over a 2 second window
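A sketch of the epoch-screening and averaging step described above. The study used B-Alert's automated artifact decontamination; the logic below applies only the ±70 µV amplitude criterion to a hypothetical epochs array and is not the vendor's implementation.

```python
import numpy as np

def grand_average_erp(epochs, threshold_uv=70.0):
    """epochs: array of shape (n_trials, n_channels, n_samples) in microvolts,
    time-locked to stimulus onset. Trials containing any sample beyond
    +/- threshold_uv on any channel are discarded; the rest are averaged."""
    keep = np.all(np.abs(epochs) <= threshold_uv, axis=(1, 2))
    clean = epochs[keep]
    return clean.mean(axis=0), int(keep.sum())

# Hypothetical example: 25 trials, 9 channels, 2 s at 256 Hz.
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 15.0, size=(25, 9, 512))
erp, n_kept = grand_average_erp(epochs)
print(erp.shape, n_kept)
```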
Based on previous research findings indicating relevance to the current task, the following ERP waveform components were examined: N1, P2, and the late positivity (500-1200ms). Analysis of these components examined the effects of fidelity condition and stimulus type at various electrode sites. Initial analyses have focused on the three midline electrode sites (Fz, Cz, PO), providing indications of the impact of fidelity and stimulus variations at the frontal, central, and parietal/occipital regions of the brain. Figure 5 displays the grand mean ERP waveforms for each of the fidelity conditions by stimulus type at the three midline sites. The various conditions (VE and RW) are clearly differentiated across the waveforms by stimulus type and electrode site. The RW conditions display noticeably lower amplitude positive waveform components (P2 and late positivity) than the VE conditions across all sites and stimuli, as well as less pronounced negative (N1) components for all stimuli at the Fz and Cz sites. These waveforms were further examined for statistically significant effects of fidelity condition and stimulus type within the N1, P2, and Late Positivity components for the following comparisons: VE HiHi compared to both RW conditions (Ambient and IR), as well as VE LoHi and VE HiLo compared to the RW Ambient condition. The comparison of VE HiHi to the RW conditions was conducted to examine the relationship of the maximal fidelity condition to the RW transfer task conditions. The comparison of VE LoHi and VE HiLo to RW Ambient sought to identify which fidelity trade-off resulted in physiological responses that mapped more closely to the RW task under standard (ambient lighting) conditions.
Fig. 5. The three VE and two RW conditions for each stimulus at the Fz, Cz, and PO sites
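The component measures analyzed in the following subsections (N1, P2, late positivity) amount to extracting amplitudes within fixed post-stimulus windows at the chosen electrodes. A sketch of that extraction is given below; the window boundaries are those reported later in this section, while the sampling rate and the peak-versus-mean choice per component are assumptions for illustration.

```python
FS = 256  # sampling rate in Hz (assumed for illustration)
WINDOWS_MS = {"N1": (40, 175), "P2": (100, 312), "LatePos": (500, 1200)}

def component_amplitude(erp, component):
    """erp: 1-D averaged waveform (microvolts) for one channel, time-locked to
    stimulus onset. Returns the peak amplitude in the component's window
    (minimum for N1, maximum for P2) or the mean amplitude for the late positivity."""
    lo_ms, hi_ms = WINDOWS_MS[component]
    segment = erp[int(lo_ms * FS / 1000): int(hi_ms * FS / 1000)]
    if component == "N1":
        return float(segment.min())
    if component == "P2":
        return float(segment.max())
    return float(segment.mean())  # late positivity: mean amplitude over the window
```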
N1 Amplitude. The window used for the analysis of the N1 peak amplitude ranged between 40 ms and 175 ms from the initial onset of the stimulus. The N1 was assessed for maximized fidelity (VE HiHi) compared to both RW conditions (Ambient and IR), revealing a main effect for EEG channel (Figure 6), as well as a significant interaction effect for fidelity by channel (Figure 7), in which the HiHi condition elicited a significantly larger N1 component at the Cz electrode site.

Table 1. VE HiHi, RW Ambient, and RW IR N1 Statistical Analysis
Source            DF   F      p
Channel           2    23.67  <.0001
Fidelity*Channel  4    6.43   0.0001

Fig. 6. N1 Main effect for Channel
Fig. 7. N1 Fidelity x Channel interaction
The N1 amplitude was also assessed for the comparison of fidelity trade-offs (VE LoHi and VE HiLo) to the RW Ambient condition. As shown in Table 2, significant main effects were found for fidelity and channel, and significant interactions were found for fidelity by stimulus and for fidelity, channel, and stimulus. The significant interactions are shown in Figures 8, 9, and 10.

Table 2. VE LoHi, VE HiLo, and RW Ambient N1 Statistical Analysis
Source                     DF   F      p
Fidelity                   2    5.24   0.0091
Channel                    2    19.95  <.0001
Fidelity*Channel           4    3.47   0.0111
Fidelity*Stimulus          4    4.08   0.0044
Fidelity*Channel*Stimulus  8    2.94   0.0041

Fig. 8. N1 Fidelity x Channel
Fig. 9. N1 Fidelity x Stimulus
Fig. 10. VE LoHi, VE HiLo, and RW Ambient N1 Fidelity x Channel x Stimulus
P2 Amplitude. The window used for the analysis of the P2 amplitude was between 100 ms and 312 ms. The P2 was assessed for maximized fidelity (VE HiHi) compared to both RW conditions (Ambient and IR). As shown in Table 3, this analysis revealed highly significant main effects for fidelity (Figure 11) and for channel (Figure 12), with the VE condition eliciting a significantly higher P2 than both RW conditions, and the Fz electrode site demonstrating lower P2 effects than Cz and PO. This increased P2 within the VE compared to the RW conditions likely reflects the additional processing required for features within the VE.
Table 3. VE HiHi, RW Ambient, and RW IR P2 Statistical Analysis
Source    DF   F      p
Fidelity  2    32.06  <.0001
Channel   2    11.91  <.0001

Fig. 11. P2 Main effect for Fidelity
Fig. 12. P2 Main effect for Channel
The P2 amplitude was also assessed for the comparison of fidelity trade-offs (VE LoHi and VE HiLo) to the RW Ambient condition. As shown in Table 4, significant main effects were found for fidelity and channel (Figure 13), with the Ambient condition and the Fz electrode site generating the smallest P2 components, and a significant interaction was found for fidelity by stimulus (Figure 14). The HiLo condition elicited the highest P2 peak for the AK-47, but a noticeably lower P2 peak for the umbrella. This may indicate that a salient or critical feature of the AK-47 is degraded when the color depth is reduced, but that lower color depth may actually require less feature processing for the umbrella.

Table 4. VE LoHi, VE HiLo, and RW Ambient P2 Statistical Analysis
Source             DF   F      p
Fidelity           2    16.82  <.0001
Channel            2    12.12  <.0001
Fidelity*Stimulus  4    3.15   0.0181

Fig. 13. P2 Main effect for Channel
Fig. 14. P2 Fidelity x Stimulus Interaction
Late Positivity. The Late Positive Component, ranging from 500 ms to 1200 ms after the presentation of the stimuli, was assessed for maximized fidelity (VE HiHi) compared
to both RW conditions (Ambient and IR). This analysis revealed main effects for fidelity condition and for EEG channel (Figure 15), as well as a significant interaction for fidelity by stimulus. The significant interaction is shown in Figure 16, in which the RW Ambient condition displays a positive late component for the M-16 and AK-47, but a negative late component for the umbrella.

Table 5. VE HiHi, RW Ambient, and RW IR Late Positivity Statistical Analysis
Source             DF   F     p
Fidelity           2    3.81  0.0297
Channel            2    38.4  <.0001
Fidelity*Stimulus  4    2.86  0.0281

Fig. 15. Main effect for Channel
Fig. 16. Late Positivity Fidelity x Stimulus
The Late Positive Component was also assessed for the comparison of fidelity trade-offs (VE LoHi and VE HiLo) to the RW Ambient condition. As shown in Table 6, a significant main effect was found for EEG channel (Figure 17), and significant interaction effects were found for fidelity by stimulus, as well as for fidelity, channel, and stimulus. The significant interactions are shown in Figures 18 and 19.

Table 6. VE LoHi, VE HiLo, and RW Ambient Late Positivity Statistical Analysis
Source                     DF   F      p
Channel                    2    31.84  <.0001
Fidelity*Stimulus          4    3.22   0.02
Fidelity*Channel*Stimulus  8    3.06   0.003

Fig. 17. Main effect for Channel
Fig. 18. Late Positivity Fidelity x Stimulus
Fig. 19. Late Positivity Fidelity x Channel x Stimulus
4 Discussion The goal of this study was to identify the VE fidelity configurations that provided a perceptual experience that most closely mimicked the RW task, and to relate the neurophysiological results to the performance results in an effort to better understand the relationship between task performance and neurophysiological response within a perceptual skills task. We expected to observe degraded performance and distinctive differentiation between physiological signatures in association with degraded fidelity. Within the performance data, both accuracy and response times indicated a main effect for stimulus type in which the AK-47 was the easiest stimulus to identify, followed by the umbrella, with the M16 being the most challenging stimulus to identify. An effect for fidelity condition was also found, indicating that RTs for the RW IR condition were significantly slower than all other fidelity conditions, followed by RW Ambient, VE LoHi, VE HiLo, and with the VE HiHi condition demonstrating the fastest RTs. The faster reaction times within the VE conditions could be attributed to the fact that simulated stimuli contain fewer visual details and features to be processed. Faster reaction times within the highest fidelity condition are likely due to increased distinguishability between salient features. Within the neurophysiological data, significant effects were found for stimulus type and fidelity condition, as well as for EEG electrode channel/site along the midline of the brain, for three components of the ERP waveform: N1, P2, and the Late Positivity. The RW ERPs were distinct from the VE ERPs, with the VE conditions eliciting higher amplitude ERP waveform components consistent with increased processing of pop-out visual features and object recognition. Thus, while the performance data suggested that the VE conditions were easier, higher levels of processing were occurring in the brain within those conditions. Comparisons of the maximal VE condition (HiHi) to both the RW Ambient and RW IR conditions were conducted in order to further explore differentiation between VE and RW neurophysiological response. Of particular interest was the finding that for the Late Positivity, the ERP waveforms were closely matched for the two weapons, and were distinct from the umbrella waveforms, despite the fact that the performance data demonstrate more similarity in accuracy and reaction times for the
AK-47 and the umbrella than for the AK-47 and the M-16. Thus, the neurophysiological processing of the weapons may be more closely matched, despite larger differences in response times. Additionally, the VE trade-off conditions (LoHi and HiLo) were compared to the RW Ambient condition in order to identify the optimal fidelity trade-off in the event that the maximized (HiHi) condition could not be implemented due to development limitations. Significant interactions revealed that the optimal fidelity trade-off condition varied based on the stimulus. For example, the HiLo condition elicited the highest P2 peak for the AK-47, but a noticeably lower P2 peak for the umbrella. This may indicate that a salient or critical feature of the AK-47 is degraded when the color depth is reduced, but that lower color depth may actually require less feature processing for the umbrella. The distinctive ERP signatures offer a method to characterize objects within military training scenarios that require higher resolution for effective training, as well as those that can be easily recognized at lower resolutions, thus saving developers time and money by highlighting the most efficient requirements to achieve training efficacy. ERPs can be measured unobtrusively during training, allowing developers to access a metric that could be used to guide scenario development without requiring repeated transfer-of-training assessments and without relying solely on performance or subjective responses. This novel approach could potentially be used to determine which aspects of VE fidelity will have the highest impact on transfer of training with the lowest development costs for a variety of simulated task environments. These findings will be leveraged in an ongoing research effort to assess the impact of fidelity variations on performance and neurophysiological response within a VE-based perceptual skills training task, to further examine the technical feasibility of utilizing neurophysiological measures to assess fidelity design requirements in order to maximize cost-benefit tradeoffs and transfer of training.
References 1. Berka, C., Levendowski, D.J., Cvetinovic, M.M., Petrovic, M.M., Davis, G., Lumicao, M.N., Zivkovic, V.T., Popovic, M.V., Olmstead, R.: Real-Time Analysis of EEG Indexes of Alertness, Cognition, and Memory Acquired With a Wireless EEG Headset. International Journal of Human-Computer Interaction 17(2), 151–170 (2004) 2. Carroll, M., Fuchs, S., Hale, K., Dargue, B., Buck, B.: Advanced Training Evaluation System (ATES): Leveraging Neuro-physiological Measurement to Individualize Training. In: Proceedings of I/ITSEC 2010 (2010) 3. Crosby, M.E., Ikehara, C.S.: Using physiological measures to identify individual differences in response to task attributes. In: Schmorrow, D.D., Stanney, K.M., Reeves, L.M. (eds.) Foundations of Augmented Cognition, 2nd edn., pp. 162–168. Strategic Analysis, Inc., San Ramon (2006) 4. Marine Corps Interim Publication 3-11.01, Combat Hunter. Publication Control Number 146 000009 00 (2011) 5. Skinner, A., Sebrechts, M., Fidopiastis, C.M., Berka, C., Vice, J., Lathan, C.: Psychophysiological Measures of Virtual Environment Training. In: Book chapter in Human Performance Enhancement in High Risk Environments: Insights, Developments & Future Directions from Military Research (2010)
6. Skinner, A., Vice, J., Lathan, C., Fidopiastis, C., Berka, C., Sebrechts, M.: Perceptually-Informed Virtual Environment (PerceiVE) Design Tool. In: Schmorrow, D.D., Estabrooke, I.V., Grootjen, M. (eds.) FAC 2009. LNCS, vol. 5638, pp. 650–657. Springer, Heidelberg (2009) 7. Skinner, A., Berka, C., O'Hara-Long, L., Sebrechts, M.: Impact of Virtual Environment Fidelity on Behavioral and Neurophysiological Response. In: Proceedings of I/ITSEC 2010 (2010) 8. Wickens, C.D., Hollands, J.G.: Engineering psychology and human performance, 3rd edn. Prentice Hall, Upper Saddle River (2000)
Author Index
Abate, Andrea F. I-3 Aghabeigi, Bardia II-279 Ahn, Sang Chul I-61 Akahane, Katsuhito II-197 Albert, Dietrich I-315 Aliverti, Marcello II-299 Almeida, Ana I-154 Amemiya, Tomohiro I-225, II-151, II-407 Amend, Bernd I-270 Ando, Makoto II-206 Andrews, Anya II-3 Aoyama, Shuhei I-45 Ariza-Zambrano, Camilo II-30 Baier, Andreas I-135 Barber, Daniel I-387 Barbuceanu, Florin Grigorie I-164 Barrera, Salvador I-40 Beane, John II-100 Behr, Johannes II-343 Berka, Chris I-387 Bishko, Leslie II-279 Bockholt, Ulrich I-123 Bohnsack, James II-37 Bolas, Mark II-243 Bonner, Matthew II-333 Bordegoni, Monica II-299, II-318 Bouchard, Durell I-345 Bowers, Clint II-37, II-237 Brogni, Andrea I-194, I-214, I-234 Brooks, Nathan II-415 Caldwell, Darwin G. I-194, I-214, I-234, II-327 Campos, Pedro I-12 Cantu, Juan Antonio I-40 Caponio, Andrea I-20, I-87 Caruso, Giandomenico II-299 Cervantes-Gloria, Yocelin II-80 Chang, Chien-Yen II-119, II-243 Charissis, Vassilis II-54 Charoenseang, Siam I-30, II-309 Chen, Shu-ya II-119
Cheng, Huangchong II-20 Choi, Ji Hye I-97 Choi, Jongmyung I-69 Christomanos, Chistodoulos II-54 Conomikes, John I-40
da Costa, Rosa Maria E. Moreira II-217 Dal Maso, Giovanni II-397 Dang, Nguyen-Thong I-144 de Abreu, Priscilla F. II-217 de Carvalho, Luis Alfredo V. II-217 de los Reyes, Christian I-40 Derby, Paul II-100 Di Loreto, Ines II-11 Dohi, Hiroshi II-227 Domik, Gitta II-44 Doyama, Yusuke II-158 Duarte, Emília I-154, I-380 Duguleana, Mihai I-164 Duval, Sébastien II-377 Ebisawa, Seichiro II-151 Ebuchi, Eikan II-158 Ende, Martin I-135 Enomoto, Seigo I-174, I-204 Erbiceanu, Elena II-289 Erfani, Mona II-279 Fan, Xiumin II-20 Ferrise, Francesco II-318 Flynn, Sheryl II-119 Foslien, Wendy II-100 Frees, Scott I-185 Garcia-Hernandez, Nadia II-327 Gardo, Krzysztof II-141 Gaudina, Marco I-194 Giera, Ronny I-270 Gomez, Lucy Beatriz I-40 González Mendívil, Eduardo I-20, I-87, II-80 Gouaich, Abdelkader II-11 Graf, Holger II-343 Grant, Stephen II-54
Ha, Taejin II-377 Hammer, Philip I-270 Han, Jonghyun II-352 Han, Tack-don I-97, I-105 Hasegawa, Akira I-297, I-306, I-354, I-363 Hasegawa, Satoshi I-297, I-306, I-354, I-363 Hash, Chelsea II-279 Hayashi, Oribe I-76 He, Qichang II-20 Hergenröther, Elke I-270 Hernández, Juan Camilo II-30 Hill, Alex II-333 Hillemann, Eva I-315 Hincapié, Mauricio I-20, I-87 Hirose, Michitaka I-76, I-250, I-260, I-280, II-158, II-206 Hirota, Koichi I-225, II-151, II-407 Hiyama, Atsushi II-158 Holtmann, Martin II-44 Hori, Hiroki I-306, I-354, I-363 Huck, Wilfried II-44 Hughes, Charles E. II-270, II-289 Hwang, Jae-In I-61 Ikeda, Yusuke I-174, I-204 Ikei, Yasushi I-225, II-151, II-407 Ingalls, Todd II-129 Ingraham, Kenneth E. II-110 Inose, Kenji II-206 Isbister, Katherine II-279 Ise, Shiro I-174, I-204 Ishii, Hirotake I-45 Ishio, Hiromu I-297, I-306, I-354, I-363 Ishizuka, Mitsuru II-227 Isshiki, Masaharu II-197 Izumi, Masanori I-45 Jang, Bong-gyu I-243 Jang, Say I-69 Jang, Youngkyoon II-167 Jung, Younbo II-119 Jung, Yvonne II-343 Kajinami, Takashi I-250, I-260, II-206 Kamiya, Yuki I-40 Kanda, Tetsuya I-306, I-354, I-363 Kang, Changgu II-352 Kang, Kyung-Kyu II-425
Kasada, Kazuhiro I-76 Kawai, Hedeki I-40 Kawamoto, Shin-ichi II-177 Kayahara, Takuro II-253 Keil, Jens I-123 Kelly, Dianne II-54 Kennedy, Bonnie II-119 Kickmeier-Rust, Michael D. I-315 Kim, Dongho II-425 Kim, Gerard J. I-243 Kim, Hyoung-Gon I-61 Kim, Jae-Beom I-55 Kim, Jin Guk II-370 Kim, Kiyoung II-352 Kim, Sehwan I-69, II-377 Kiyokawa, Kiyoshi I-113 Klomann, Marcel II-362 Kolling, Andreas II-415 Kondo, Kazuaki I-204 Kubo, Hiroyuki II-260 Kuijper, Arjan II-343 Kunieda, Kazuo I-40 Laffont, Isabelle II-11 Lakhmani, Shan II-237 Lancellotti, David I-185 Lange, Belinda II-119, II-243 Lathan, Corinna I-387 Lee, Hasup II-253 Lee, Jong Weon II-370 Lee, Seong-Oh I-61 Lee, Seunghun I-69 Lee, Youngho I-69 Lewis, Michael II-415 Li, Lei II-119 Lukasik, Ewa II-141 Ma, Yanjun II-20 MacIntyre, Blair II-333 Maejima, Akinobu II-260 Makihara, Yasushi I-325 Mapes, Dan II-110 Mapes, Daniel P. II-270 Mashita, Tomohiro I-113, I-335 Matsunuma, Shohei I-363 Matsuura, Yasuyuki I-306, I-371 McLaughlin, Margaret II-119 Mendez-Villarreal, Juan Manuel I-40 Mercado, Emilio I-87 Mestre, Daniel I-144
Milde, Jan-Torsten II-362 Milella, Ferdinando II-397 Miyao, Masaru I-297, I-306, I-354, I-363, I-371 Miyashita, Mariko II-158 Mogan, Gheorghe I-164 Morie, Jacquelyn II-279 Morishima, Shigeo I-325, II-177, II-187, II-260 Moshell, J. Michael II-110, II-289 Mukaigawa, Yasuhiro I-204, I-335 Nakamura, Satoshi I-174, I-204, II-177, II-187 Nam, Yujung II-119 Nambu, Aiko I-280 Nappi, Michele I-3 Narumi, Takuji I-76, I-250, I-260, I-280, II-206 Newman, Brad II-243 Nguyen, Van Vinh II-370 Nishimura, Kunihiro I-280 Nishioka, Teiichi II-253 Nishizaka, Shinya I-260 Norris, Anne E. II-110 Nunnally, Steven I-345 Ogi, Tetsuro II-253 Oh, Yoosoo II-377 Okumura, Mayu I-325 Omori, Masako I-297, I-306, I-354, I-363 Ono, Yoshihito I-45 Ontañón, Santiago II-289 Pacheco, Zachary I-40 Panjan, Sarut I-30 Park, Changhoon I-55 Park, James I-97 Pedrazzoli, Paolo II-397 Pérez-Gutiérrez, Byron II-30 Pessanha, Sofia I-12 Phan, Thai II-243 Pojman, Nicholas I-387 Polistina, Samuele II-299 Procci, Katelyn II-37 Radkowski, Rafael II-44, II-387 Rebelo, Francisco I-154, I-380 Reinerman-Jones, Lauren I-387 Ricciardi, Stefano I-3 Rios, Horacio I-87
Rizzo, Albert II-119, II-243 Rovere, Diego II-397
Sacco, Marco II-397 Sakellariou, Sophia II-54 Sanchez, Alicia II-73 Sanders, Scott II-119 San Martin, Jose II-64 Santos, Pedro I-270 Sarakoglou, Ioannis II-327 Schmedt, Hendrik I-270 Sebrechts, Marc I-387 Seif El-Nasr, Magy II-279 Seki, Masazumi II-158 Seo, Jonghoon I-97, I-105 Shim, Jinwook I-97, I-105 Shime, Takeo I-40 Shimoda, Hiroshi I-45 Shinoda, Kenichi II-253 Shiomi, Tomoki I-306, I-354, I-363 Skinner, Anna I-387 Slater, Mel I-234 Smith, Peter II-73 Sobrero, Davide I-214 Stork, André I-270 Suárez-Warden, Fernando II-80 Sugiyama, Asei I-354 Suksen, Nemin II-309 Suma, Evan A. II-243 Sung, Dylan II-90 Sycara, Katia II-415 Takada, Hiroki I-297, I-306, I-354, I-363, I-371 Takada, Masumi I-371 Takemura, Haruo I-113 Tan, Veasna I-387 Tanaka, Hiromi T. II-197 Tanikawa, Tomohiro I-76, I-250, I-260, I-280, II-206 Tateyama, Yoshisuke II-253 Teixeira, Luís I-380 Teles, Júlia I-154, I-380 Terkaj, Walter II-397 Tharanathan, Anand II-100 Thiruvengada, Hari II-100 Tonner, Peter II-270 Touyama, Hideaki I-290 Tsagarakis, Nikos II-327 Turner, Janice II-54
Umakatsu, Atsushi I-113 Urano, Masahiro II-407 Valente, Massimiliano I-214 Van Dokkum, Liesjet II-11 Ventrella, Jeffery II-279 Vice, Jack I-387 Vilar, Elisângela I-154 Vuong, Catherine II-129 Wakita, Wataru II-197 Wang, Huadong II-415 Watanabe, Takafumi II-206 Webel, Sabine I-123 Weidemann, Florian II-387 Werneck, Vera Maria B. II-217 Whitford, Maureen II-119 Winstein, Carolee II-119
Wirth, Jeff II-110 Wittmann, David I-135 Woo, Woontack II-167, II-352, II-377 Yagi, Yasushi I-204, I-325, I-335, II-187 Yamada, Keiji I-40 Yamazaki, Mitsuhiko I-76 Yan, Weida I-45 Yang, Hyun-Rok II-425 Yang, Hyunseok I-243 Yasuhara, Hiroyuki I-113 Yeh, Shih-Ching II-119 Yoon, Hyoseok II-377 Yotsukura, Tatsuo II-177 Yu, Wenhui II-129 Zhang, Xi II-20 Zhu, Jichen II-289