Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y. Vardi Rice University, Houston, TX, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6530
Zhigeng Pan Adrian David Cheok Wolfgang Müller Xubo Yang (Eds.)
Transactions on Edutainment V
Editors-in-Chief Zhigeng Pan Zhejiang University, Hangzhou, China E-mail:
[email protected] Adrian David Cheok National University of Singapore, Singapore E-mail:
[email protected] Wolfgang Müller University of Education, Weingarten, Germany E-mail:
[email protected] Guest Editor Xubo Yang Shanghai Jiao Tong University Department of Computer Science and Engineering Shanghai, 200240, China E-mail:
[email protected]
ISSN 0302-9743 (LNCS) e-ISSN 1611-3349 (LNCS) ISSN 1867-7207 (TEDUTAIN) e-ISSN 1867-7754 (TEDUTAIN) ISBN 978-3-642-18451-2 e-ISBN 978-3-642-18452-9 DOI 10.1007/978-3-642-18452-9 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2010942976 CR Subject Classification (1998): K.3.1-2, I.2.1, H.5, H.3, I.3-4
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
It is our pleasure to edit the fifth volume of the new journal Transactions on Edutainment. This journal, part of the Springer series Lecture Notes in Computer Science, is devoted to research and development in the field of edutainment. Edutainment, also known as educational entertainment or entertainment-education, denotes all forms of entertainment designed to educate as well as to provide fun. This approach is motivated by the increasing demands on individuals for life-long learning and the need to integrate effective learning opportunities throughout life. As such, edutainment has attracted increasing interest in the last few years. The first nine articles of this issue are regular papers. The paper "Innovative Integrated Architecture for Educational Games: Challenges and Merits" is an overview paper. Rania Hodhod et al. define a novel architecture that allows the dual-narrative generation technique to be employed effectively in an adaptive educational game environment. The architecture comprises components that individually have shown effectiveness in educational game environments. These components are a graph-structured narrative, a dynamically generated narrative, evolving agents, and a student model. An adaptive educational game has been developed to investigate the synergy of the architecture components. In "Beyond Standards: Unleashing Accessibility on a Learning Content Management System," Silvia Mirri et al. explore various questions and perspectives about the implementation of two accessibility standards in an E-learning platform, achieving inclusion both of the standards and of their goal of providing accessibility. In "Design and Implementation of an OpenGL-Based 3D First Person Shooting Game," Qiaomin Lin et al. present the design and implementation of a 3D first person shooting game. In "Direct Interaction Between Operator and 3D Virtual Environment with a Large-Scale Haptic," Jie Huang et al. describe a direct interaction system which includes a large-scale 3D virtual environment and a string-driven haptic device. Key techniques of hand position measurement and string tension control are studied. In "Modeling and Optimizing Joint Inventory in Supply Chain Management," Min Lu et al. study a new inventory management method called Joint Inventory Management. A model of Joint Inventory for manufacturing and marketing is presented. In "Vision-Based Robotic Graphic Programming System," Jianfei Mao et al. study a vision-based robot programming system and describe the system architecture, including the whole framework, the hardware and software components, as well as the visual feedback control loop structure. In "Integrating Activity Theory for Context Analysis on Large Display," Fang You et al. apply activity theory to understand large display usage and show design ideas for large displays: centralized mapping and gesture tracing. They take the speaker-audience setting as an example and present two prototypes based on activity-centered analysis. In "Line Drawings Abstraction from 3D
Models," Shujie Zhao et al. propose a method to extract feature lines directly from 3D models. With this method, linear feature lines are extracted by finding intersections of two implicit functions, which works without lighting, and are rendered with visibility in a comprehensive way. In "Interactive Creation of Chinese Calligraphy with the Application in Calligraphy Education," Xianjun Zhang et al. propose a semiautomatic creation scheme for Chinese calligraphy and apply the scheme in calligraphy education. The following 12 articles of this issue represent a selection of contributions from DMDCM 2010, held in Chongqing, China, in December 2010. Some topics of the conference are related to those of this new journal, including digital media and processing; digital content management; digital media transmissions; digital rights management; digital museum; geometry modeling; image-based rendering; real-time rendering; computer animation; 3D reconstruction; geographic information system; virtual reality/augmented reality; image/model/video watermarking; image segmentation; multimedia technology; image/model retrieval; cultural relics protection; ancient literature digitization; cultural relic restoration; modeling and rendering for heritage; interactive technology and equipment; media art and digital art; game design and development. These 12 papers cover topics on human–computer interaction, virtual exhibits, face recognition, character animation, etc. The paper titles are: "Outline Font Generating from Images of Ancient Chinese Calligraphy"; "Tangible Interfaces to Digital Connections, Centralized versus Decentralized"; "Research and Implementation of the Virtual Exhibit System of Places of Interest Based on Multi-Touch Interactive Technology"; "A Highly Automated Method for Facial Expression Synthesis"; "Sketch-Based 3D Character Deformation"; "Mean Laplace–Beltrami Operator for Quadrilateral Meshes"; "Multi-User 3D-Based Framework for E-Commerce"; "Coordinate Model for Text Categorization"; "An Interface to Retrieve Personal Memories Using an Iconic Visual Language"; "VR-Based Basketball Movement Simulation"; "Mixed 2D–3D Information for Face Recognition"; and "Research on Augmented Reality Display Method of Scientific Exhibits." The papers in this issue present a large number of application examples in the areas of E-learning, games, animation, multimedia, and virtual reality, which gives a broad view of the application of edutainment-related techniques. We would like to express our thanks to all those who contributed to this issue: the authors, the reviewers, and the IPC of the DMDCM 2010 conference, who recommended papers to this new journal. Special thanks go to Yi Li and Qiaoyun Chen from the journal's Editorial Office at Nanjing Normal University for all their effort.
November 2010
Xubo Yang Ruwei Yun
Transactions on Edutainment
This journal subline serves as a forum for stimulating and disseminating innovative research ideas, theories, emerging technologies, empirical investigations, state-of-the-art methods, and tools in all the different genres of edutainment, such as game-based learning and serious games, interactive storytelling, virtual learning environments, virtual-reality-based education, and related fields. It covers aspects of educational and game theories, human–computer interaction, computer graphics, artificial intelligence, and systems design.
Editors-in-Chief
Zhigeng Pan, Zhejiang University, China
Adrian David Cheok, NUS, Singapore
Wolfgang Müller, University of Education Weingarten, Germany
Managing Editor
Yi Li, Nanjing Normal University, China
Editorial Board
Ruth Aylett, Heriot-Watt University, UK
Judith Brown, Brown Cunningham Associates, USA
Yiyu Cai, NTU, Singapore
Maiga Chang, Athabasca University, Canada
Holger Diener, Fhg-IGD Rostock, Germany
Jayfus Tucker Doswell, Juxtopia Group, USA
Sara de Freitas, The Serious Games Institute, UK
Lynne Hall, University of Sunderland, UK
Masa Inakage, Keio University, Japan
Ido A. Iurgel, Universidade do Minho, Portugal
Kárpáti Andrea, Eötvös Loránd University, Hungary
Lars Kjelldahl, KTH, Sweden
James Lester, North Carolina State University, USA
Nicolas Mollet, IIT, Italy
Ryohei Nakatsu, NUS, Singapore
Ana Paiva, INESC-ID, Portugal
Abdennour El Rhalibi, JMU, UK
Daniel Thalmann, EPFL, Switzerland
Kok-Wai Wong, Murdoch University, Australia
Gangshan Wu, Nanjing University, China
Xiaopeng Zhang, IA-CAS, China
Stefan Goebel, ZGDV, Germany
Michitaka Hirose, University of Tokyo, Japan
Hyun Seung Yang, KAIST, Korea
Editorial Assistant
Ruwei Yun, Nanjing Normal University, China
Qiaoyun Chen, Nanjing Normal University, China
Editorial Office Address: Ninghai Road 122, Edu-Game Research Center, School of Education Science, Nanjing Normal University, Nanjing, 210097, China E-mail:
[email protected];
[email protected] Tel/Fax: 86-25-83598921
Table of Contents
Regular Papers Innovative Integrated Architecture for Educational Games: Challenges and Merits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rania Hodhod, Paul Cairns, and Daniel Kudenko
1
Beyond Standards: Unleashing Accessibility on a Learning Content Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Silvia Mirri, Paola Salomoni, Marco Roccetti, and Gregory R. Gay
35
Design and Implementation of an OpenGL Based 3D First Person Shooting Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiaomin Lin, Zhen Zhao, Dihua Xu, and Ruchuan Wang
50
Direct Interaction between Operator and 3D Virtual Environment with a Large Scale Haptic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jie Huang, Jian Li, and Rui Xiao
62
Modeling and Optimizing of Joint Inventory in Supply Chain Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Min Lu and Haibo Zhao
71
Vision-Based Robotic Graphic Programming System . . . . . . . . . . . . . . . . . Jianfei Mao, Ronghua Liang, Keji Mao, and Qing Tian
80
Integrating Activity Theory for Context Analysis on Large Display . . . . . Fang You, HuiMin Luo, and JianMin Wang
90
Line Drawings Abstraction from 3D Models . . . . . . . . . . . . . . . . . . . . . . . . . Shujie Zhao and Enhua Wu
104
Interactive Creation of Chinese Calligraphy with the Application in Calligraphy Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xianjun Zhang, Qi Zhao, Huanzhen Xue, and Jun Dong
112
Papers from DMDCM 2010
Outline Font Generating from Images of Ancient Chinese Calligraphy . . . . . . . . . . . Junsong Zhang, Guohong Mao, Hongwei Lin, Jinhui Yu, and Changle Zhou
122
Tangible Interfaces to Digital Connections, Centralized versus Decentralized . . . . . . . . . . . Matthijs Kwak, Gerrit Niezen, Bram van der Vlist, Jun Hu, and Loe Feijs
132
Research and Implementation of the Virtual Exhibit System of Historical Sites Based on Multi-touch Interactive Technology . . . . . . . . . . . Yi Lin and Yue Liu
147
A Highly Automated Method for Facial Expression Synthesis . . . . . . . . . . Nikolaos Ersotelos and Feng Dong
158
Sketch Based 3D Character Deformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mo Li and Golam Ashraf
177
Mean Laplace–Beltrami Operator for Quadrilateral Meshes . . . . . . . . . . . . Yunhui Xiong, Guiqing Li, and Guoqiang Han
189
Multi-user 3D Based Framework for E-Commerce . . . . . . . . . . . . . . . . . . . . Yuyong He and Mingmin Zhang
202
Coordinate Model for Text Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . Wei Jiang and Lei Chen
214
An Interface to Retrieve Personal Memories Using an Iconic Visual Language . . . . . . . . . . . Rui Jesus, Teresa Romão, and Nuno Correia
224
VR-Based Basketball Movement Simulation . . . . . . . . . . . . . . . . . . . . . . . . . Lin Zhang and Ling Wang
240
Mixed 2D-3D Information for Face Recognition . . . . . . . . . . . . . . . . . . . . . . Hengliang Tang, Yanfeng Sun, Baocai Yin, and Yun Ge
251
Research on Augmented Reality Display Method of Scientific Exhibits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hui Yan, Ruwei Yun, Chun Liang, Daning Yu, and Baoyun Zhang
259
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
271
Innovative Integrated Architecture for Educational Games: Challenges and Merits
Rania Hodhod 1,2, Paul Cairns 1, and Daniel Kudenko 1
1 Computer Science Department, University of York, Heslington, York, YO10 5DW, UK
[email protected], {pcairns,kudenko}@cs.york.ac.uk
2 Faculty of Computer and Information Sciences, Ain Shams University, Abbassia, Cairo, Egypt
Abstract. Interactive narrative in game environments acts as the main catalyst to provide a motivating learning experience. In previous work, we have described how the use of a dual narrative generation technique could help to resolve the conflict between allowing high student player agency and tracking the learning process. In this paper, we define a novel architecture that allows the dual narrative generation technique to be employed effectively in an adaptive educational game environment. The architecture comprises components that individually have shown effectiveness in educational game environments. These components are a graph-structured narrative, a dynamically generated narrative, evolving agents, and a student model. An adaptive educational game, AEINS, has been developed to investigate the synergy of the architecture components. AEINS aims to foster character education in 8-12 year old children through the use of various interactive moral dilemmas that address the student's different cognitive levels. AEINS was evaluated through a study involving 20 participants who interacted with AEINS on an individual basis. Keywords: Educational games, interactive narrative, intelligent tutor, character education, ill-defined domains.
1 Introduction
Educational games have the potential to provide an intrinsically motivating learning experience to the learner. Interactive narrative in educational games is recognized as a valuable support for learning, as it allows humans and computers to collaborate in the creation of innovative experiences where both sides are engaged in a meaningful construction process. It also helps make sense of experience, organize knowledge, spark problem-solving skills, and increase motivation. Within these environments, the rich generated stories allow a kind of unintentional learning process to occur through an engaging and appealing experience, and the student is seen as an active participant in the construction of his own knowledge. Such tempting characteristics of interactive narrative suggest it as a suitable teaching medium for ill-defined domains such as design, history, inter-cultural competence and ethics. An ill-defined domain is one that exhibits one or more of the following
characteristics: (a) it lacks consistent, unambiguous, and generalizable solutions [1]; (b) the defense of different decisions is based on different criteria and often depends on how the solver conceptualizes the situation [2]; (c) it cannot be described by a finite set of production rules [3]; and (d) it lacks defined rules for progressing along the solution path from the initial step to the final step [4]. The ethics domain is an important ill-defined domain, as it affects oneself and others on a daily basis. The importance of the ethics domain is well recognized by schools, where moral teaching is invited to be included in every possible curriculum [5]. Teaching ethics aims to develop child and adolescent awareness of social and moral responsibilities, so-called character education. Character education implies the widely shared, pivotally important, core ethical values, such as caring, honesty, fairness, responsibility, and respect for self and others, along with supportive performance values that form the basis of good character, such as diligence, a strong work ethic, and perseverance [5]. Efforts have been made to find non-traditional ways of teaching, such as role playing, which helps students to transfer their knowledge and beliefs into actions [6], brainstorming moral dilemmas [7], and using interactive learning models [8]. Such efforts aim to support students' cognitive development by allowing them to pursue moral actions and see how their decisions affect other people and themselves in relation to others. However, these tools are not able to provide any empirical evidence on the development of the students' moral virtues in a general way. Nonetheless, learning in an ill-defined domain such as inter-cultural competence, as well as ethics, requires (at least) a heightened sense of self-awareness, an ability to self-assess, enhanced perceptive abilities and a proclivity to reflect on experience [9]. Generally, educational games suffer from trying to achieve two contradicting aims: providing a free narrative experience to the student player while tracking the learning process at the same time. Combining these two aims can be seen as a very hard task. In the past we investigated the possibility of combining more than one narrative technique in order to tackle this challenge. Dynamic narrative was chosen for its ability to provide high student player agency and to magnify the feeling of control through the player's ability to affect how the story unfolds. Scripted narrative was used to construct the teaching moments (moral dilemmas), as it constrains user agency and allows tracking and assessing the learning process. The dual narrative generation technique managed to produce free narrative that can seamlessly be interleaved with the teaching moments, forming one continuous story. This paper defines a novel architecture that allows the dual narrative generation technique to be effectively employed in an educational game environment. The educational game AEINS has been developed as a proof of concept. It aims to promote and leverage character education through free user-system interaction and a guided learning process. AEINS integrates interactive narrative and intelligent tutoring in order to have effective storytelling underpinned by strong learning objectives. The next section motivates our work by presenting issues, controversies, and problems encountered in existing systems. Section 3 presents AEINS and its constituent modules.
Finally, evaluation, summary and conclusion are presented in sections 4 and 5.
2 Related Work
Narrative-based educational games have been the subject of increasing attention. Work in this area mainly focuses on embedding pedagogical activities in highly engaging, game-like interactions. There are various features used in edugames that are meant to increase the effectiveness of these platforms, such as scripted narrative, dynamically generated narrative, evolving characters, and student modeling. The following subsections aim to show how existing edugames employ these elements and highlight the synergy of these four features together in a single architecture. Scripted Narrative. Different ideas have been proposed on the kind of interaction between the learning objectives and the game narrative content. Some edugames use scripted narrative in order to have control over the learner's experience. Scripted narrative is not as adaptive as planned narrative because it limits the learner's freedom in the environment [10] and requires an extensive amount of authoring work [11]. However, the scripted approach to interactive narrative can be seen as a 'must' to allow for learner assessment and automated guidance [1]. Scripted narrative was used in the TIME interactive drama prototype developed in the field of medicine [12]. Another edugame is StoryTeller, developed in the literacy education domain [13]. The narrative in StoryTeller is pre-defined by the children at the beginning of play, causing them to follow well-defined scenarios. A further edugame is BAT ILE, developed to help in learning binary arithmetic and logic gates [14]. In BAT ILE, the impact of the student's actions on the narrative is not obvious and a dramatic story is missing. The scripted approach has also been used by the game-like environment ELECT BiLAT, a culturally sensitive negotiations simulator that trains high-ranking military officers to achieve their military objectives in the field [1]. Further edugames are the Tactical Language Trainer System (TLTS), which teaches trainees proper verbal, body language and cultural skills for different languages [15], and the TLCTS edugame, developed to help people acquire communicative skills in foreign languages and cultures [16]. The TLCTS edugame uses pre-authored descriptions of scenes, which identify characters, stage settings, and possible dialog exchanges between characters. Dynamically Generated Story. Many edugames exhibit the presence of a dynamically generated story that allows the production of various stories each time the game is played. This kind of narrative is more adaptable than scripted narrative, since the student's actions are recognized as affecting how the story unfolds. However, some of these edugames do not take full advantage of the storytelling potential seen in interactive drama applications. For example, the Mimesis edugame, developed to help middle school students learn specific physical concepts, has the learning tasks represented in interactive narrative plans [17]. Each group of learning tasks is designed in a way that leads to one or more educational goals. Each learning task is a contained story in itself. Crystal Island is another edugame, developed in the microbiology domain [18, 19]. The authors take the same approach as ours in the way the tutoring and narrative components interact together. The presence of a continuous story allows narrative that 'glues' the generated learning objects together to form one continuous coherent story and preserves the dramatic tension.
Another advantage can be seen in its ability to engage the
learner and capture his attention through the presence of evolving agents whose personalities evolve over the course of interaction with the game. Some edugames manage to have a continuous story, such as the TEATRIX edugame [20] and the IN-TALE edugame [21]. TEATRIX is a learning environment designed to help children and their teachers in the whole process of collaborative story creation. In TEATRIX the narrative emerges from the children's and the autonomous agents' interactions to form one continuous coherent story. IN-TALE is an edugame developed for training military skills. It integrates an automated story director and reactive autonomous agents in the context of a story-based military leader training scenario. However, this kind of generated story is not enough to ensure that the trainee is exposed to dramatic and pedagogically relevant situations in an appropriate and contextual order [21]. Other edugames contain scenes or plots within which the learning objectives exist. A group of scenes or plots forms one story each time. Examples are FearNot!, a learning environment developed to promote awareness about bullying behavior in schools [22, 23], and the ISAT edugame developed for interactive training [24]. Conundrum is another edugame that allows learners to experience ethical decision making in realistic scenarios [25]. The ethical situations are encoded as acts, and a sequence of scenes leads to a certain conclusion. Scenes used in FearNot!, plots used in ISAT, and ethical situations in Conundrum are all separate disconnected stories that can be referred to as teaching moments. These teaching moments form the narrative elements (events) that generate a story as a sequence of events. The presence of a continuous story allows highlighting the relationships among the narrative elements, which is considered a key point for provoking active thinking and supporting meaning construction [26]. Evolving Agents. Narrative environments are characterized by rich worlds which non-playing characters (agents) inhabit. Life-like agents with evolving personalities increase the engagement and realism of the game. Their direct reactions to the user's actions, in addition to the change in their attitudes and personalities as the game unfolds, provide a highly engaging hook. FearNot! [11] has shown particular interest in achieving this kind of evolving character throughout the whole game experience. Student Modeling. Adaptation to individual students' needs can be seen as an advantage in any learning environment. Students could benefit from having tutorial guidance when playing educational games, as they tend to perform better in more structured pedagogical activities [27]. Some edugames have recognized the importance of employing a student model, such as ELECT BiLAT [1], BAT ILE [14], TLTS [15], TLCTS [16], Mimesis [17], IN-TALE [21], ISAT [28] and ELEKTRA [29]. Riedl and Stern (2006) believe that narrative and interactivity are diametrically opposed [21]. In other words, coherent narrative and user agency are two seemingly conflicting requirements. In their work they use branching stories and believable agents, yet without assuring the pedagogical effect on the user. The problem is thus not only concerned with narrative generation and student player agency but also with the ability to track the learning process and provide personalized feedback. A summary of the literature can be found in Table 1.
As seen from the above review, edugames used narrative to generate the game story and engage the student. They managed to offer personalized learning through
the presence of a tutoring model and/or a student model. However, there was always a need to compromise between entertainment aspects and educational aspects; eventually, one aspect would override the other. Our previous work on dual narrative generation addressed this issue, but it was not enough to offer the required personalized educational experience. This led to the integration of more modules to assist this aim. The following section presents those modules and their roles on both the entertainment and the educational sides.
Table 1. A visual summary of the different features of the above-mentioned edugames
3 AEINS
The aim of this paper is to focus on the various modules needed to help the dual narrative generation technique achieve its aims in an adaptive educational game environment; in other words, an environment that not only engages the user but also acts as an adaptive learning medium. Adaptation is an important requirement in learning environments, as it provides a personalized learning process and personalized feedback, which are important for effective education. In order to provide this kind of intelligent tutoring, the following modules are needed: a domain model, a student model and a pedagogical model. Moreover, to provide an interesting, engaging environment, a story module is also required. All these modules have been integrated in a single architecture that attempts to address the shortcomings of the existing systems. At first glance, the integration of these modules is challenging. Difficulties appear in interleaving the learning objects into the STRIPS-planner-generated plans without harming the dramatic effect and the flow of events. When and how frequently feedback should be provided is another struggle, in addition to the agents' autonomy and their pedagogical role in delivering learning. More challenges were found in scripting the learning objects and representing the various modules involved. The next subsections illustrate the techniques used to face these challenges. In order to evaluate the developed architecture, the educational game AEINS has been developed as a proof of concept. Qualitative and quantitative evaluations have been carried out to assess AEINS.
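To make the planning side of this integration concrete, the following minimal Python sketch shows how a STRIPS-style forward planner can treat a teaching moment as just another operator whose preconditions are story-world facts. All predicate, action and character names here (walk_to_library, tm_cheating_dilemma, Gina, etc.) are assumptions made for the example; this is an illustration of the general technique, not the actual AEINS planner or its operator set.

    # Minimal forward-chaining STRIPS-style sketch (hypothetical names, not AEINS code).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        preconds: frozenset          # facts that must hold before the action
        add: frozenset               # facts added by the action
        delete: frozenset = frozenset()

    def plan(state, goal, actions, depth=8):
        """Depth-limited forward search for a sequence of actions reaching the goal."""
        if goal <= state:
            return []
        if depth == 0:
            return None
        for a in actions:
            if a.preconds <= state:
                nxt = (state - a.delete) | a.add
                rest = plan(nxt, goal, actions, depth - 1)
                if rest is not None:
                    return [a.name] + rest
        return None

    # Ordinary story actions and a teaching moment expressed as one operator each.
    actions = [
        Action("walk_to_library", frozenset({"at(student,house)"}),
               frozenset({"at(student,library)"}), frozenset({"at(student,house)"})),
        Action("meet_gina", frozenset({"at(student,library)"}),
               frozenset({"with(student,gina)"})),
        Action("tm_cheating_dilemma",            # teaching moment as a planner operator
               frozenset({"with(student,gina)", "needs(student,do_not_cheat)"}),
               frozenset({"presented(tm_cheating)"})),
    ]
    state = frozenset({"at(student,house)", "needs(student,do_not_cheat)"})
    print(plan(state, frozenset({"presented(tm_cheating)"}), actions))
    # -> ['walk_to_library', 'meet_gina', 'tm_cheating_dilemma']

Under this reading, interleaving amounts to the planner selecting, at the right point in the freely generated story, an operator of the teaching-moment kind whose prerequisites the current story state satisfies.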
AEINS is an adaptive educational game that aims to help in character education. It is a problem-solving environment that effectively engages 8-12 year old children in interactive moral dilemmas in order to practice moral virtues. AEINS has been designed as an endogenous game, where the content material is intimately tied in with the game play. AEINS's main aim is to allow students to move from the state of making moral judgments to the state of taking moral actions, from the knowing state to the doing state, which we consider a very important step in moral education.
Fig. 1. AEINS architecture showing the various modules and their interactions
3.1 The Domain Model
The domain model describes the various concepts (i.e., values) in the ethics domain and their relationships. One part of the model defines the principles of character education [30] and represents their relationships and dependencies. Prof. Helen Haste, Emeritus Professor of Psychology, was consulted on the validity of the designed model. Prof. Haste agreed that the represented model is quite a clear representation of what one might call common sense and popular views, and added that the model can by all means be used as a basis for developing our game. Frame knowledge representation has been used to represent the model for the following reasons: firstly, it allows arranging domain knowledge at different granularities; secondly, it is well suited for the representation of schematic knowledge; and thirdly, it requires no costly search processes. Each frame has its own name and a set of attributes or slots, which contain values; for instance, the frame for the trustworthy moral virtue (a root concept) has slots that correspond to sub-values (subskills) that should be mastered in order to consider the main value (skill) mastered, such as the be-honest and not-lie values. Lastly, the frame representation provides a flexible model, as it allows partial ordering of the dependencies and relationships between the domain concepts. In this way, it leaves 'room' for the pedagogical model to choose the next concept to present to the student based on the current student model. The second part of the domain model comprises a repertoire of teaching moments. The teaching moments can be thought of as a variety of ethical problems that require tough decisions. The idea behind the current design of the teaching moments is based on analyzing various moral dilemmas, transforming them into story graph structures, and then specifying decision points that reflect specific skills. The teaching moments
allow the use of an intelligent tutor to track the student's actions and assess them in the form of a step-by-step follow-up. An example of a graph-structured teaching moment is shown in Fig. 2; the variables have been instantiated to specific names and places to simplify the visualization. Ideally, each teaching moment path/branch describes a story in which the user is the protagonist making moral decisions. The teaching moments allow students to pursue different procedures for solving the problem based on their perception and interpretation of the nature of the problem. The student's understanding gained through this process is situated in their experience and can best be evaluated in terms relevant to this experience.
Fig. 2. An example of a graph structured dilemma (a teaching moment)
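To illustrate the kind of structure sketched in Fig. 2, a teaching moment can be held as a small directed graph whose edges are the choices offered at each decision point, each choice tagged with the virtue it gives evidence about. The following is a minimal Python sketch with invented node names, prompts and virtue labels; it is not the actual AEINS dilemma content.

    # Hypothetical graph-structured teaching moment (all names are illustrative only).
    dilemma = {
        "start":      {"text": "Your friend asks you to let her copy your homework.",
                       "choices": {"refuse_politely": ("end_good", "be_honest", True),
                                   "let_her_copy":    ("socratic_1", "do_not_cheat", False)}},
        "socratic_1": {"text": "Is it fair to the classmates who did the work themselves?",
                       "choices": {"admit_unfair":    ("end_good", "do_not_cheat", True),
                                   "insist_its_fine": ("end_bad",  "do_not_cheat", False)}},
        "end_good":   {"text": "The story closes with the positive consequences.", "choices": {}},
        "end_bad":    {"text": "The story closes by raising the stakes of the choice.", "choices": {}},
    }

    def play(choices_made, node="start"):
        """Walk one branch and collect (virtue, was_right) evidence for the student model."""
        evidence = []
        for pick in choices_made:
            nxt, virtue, right = dilemma[node]["choices"][pick]
            evidence.append((virtue, right))
            node = nxt
        return node, evidence

    print(play(["let_her_copy", "admit_unfair"]))
    # -> ('end_good', [('do_not_cheat', False), ('do_not_cheat', True)])

Because every traversed edge carries a (virtue, right-or-wrong) tag, the tutor can assess each step of the branch the student actually walked, which is what makes the step-by-step follow-up possible.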
Although the different branches of every teaching moment are pre-defined, each teaching moment exhibits variability by allowing different characters and places to present it, depending on the story-world state. Each teaching moment represents a part of the whole story and focuses on teaching a specific moral virtue in a way that establishes mastery of that concept. Each teaching moment has certain prerequisites that must be fulfilled before its execution takes place. Manipulating a teaching moment's priority is done via rules represented as follows:
Trigger: teaching moment TM1 has not been presented, and teaching moment TM2 has not been presented, and value do_not_cheat is not held by the user, and value do_not_lie is held by the user
Action: set priority to teaching moment TM2
The representation denotes that if (a) a specific pattern of teaching moments (TM1 and TM2) has not yet been presented to the student and (b) the student does not hold certain values (do_not_cheat) while holding others (do_not_lie), then the action part of the rule executes (teaching moment TM2 is given priority over teaching moment TM1).
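Read programmatically, such a trigger-action rule is simply a predicate over the set of already-presented teaching moments and the set of values the student currently holds. The sketch below is an illustrative Python rendering of the rule above (the teaching-moment and virtue names mirror the textual example; the function names are invented), not the AEINS rule engine.

    # Hypothetical sketch of evaluating teaching-moment priority rules.
    def rule_tm2_priority(presented, held_values):
        """Mirrors the textual rule: prioritise TM2 if TM1 and TM2 are unseen,
        do_not_cheat is not yet held and do_not_lie is already held."""
        if ("TM1" not in presented and "TM2" not in presented
                and "do_not_cheat" not in held_values
                and "do_not_lie" in held_values):
            return "TM2"
        return None

    def next_teaching_moment(rules, presented, held_values, default="TM1"):
        # The first rule whose premise is satisfied decides the priority;
        # if several fire, any of the returned moments would be acceptable.
        for rule in rules:
            choice = rule(presented, held_values)
            if choice is not None:
                return choice
        return default

    print(next_teaching_moment([rule_tm2_priority], presented=set(),
                               held_values={"do_not_lie"}))   # -> TM2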
If several rules satisfy their premises, more than one teaching moment is available to present, and any of them would then be appropriate to present to the student next.
3.2 The Pedagogical Model
The pedagogical model aims to adapt instruction and to monitor and evaluate the student's actions. The model is developed in the form of production rules. These rules give the system specific cognitive operations to reason about the student and the teaching process. In order to design the pedagogical model, the problem structure and what exactly needs to be modeled have to be specified. With ill-defined problems, development is a change in the way a person thinks and not merely a case of acquiring more knowledge. Therefore, what really matters is how the student perceives the problem, acts on it, observes consequences, and applies what he/she learned later (in similar situations). The model specifies how the student would ideally use the system and how the system should assess the student's skills and update the student model accordingly. The pedagogical model adapts instruction following a model of human tutoring expertise that balances motivational and cognitive goals. The Socratic Method is used as the teaching pedagogy, woven within the narrative, and provides an adequate medium to foster learning of the required skills and reinforce positive actions. It triggers lively discussion and helps students make choices based on what is right instead of what they can get away with. According to this model, the teacher asks a series of questions that leads the students to examine the validity of an opinion or belief. This is a powerful teaching method because it actively engages the student and encourages critical thinking, which is just what is needed in examining ethics, values, and other character issues. It allows an appropriate amount of choice during ill-structured and authentic investigations that lead to the development of inquiry skills [31]. The Socratic Method displays its strength when students make a bad choice. Through discussion, students should then be forced to face the contradictions present in any course of action not based on principles of justice or fairness. This method requires a delicate balance between letting the students make decisions and demonstrating the limits in their reasoning. Finally, raising the stakes and introducing consequences is a tactic followed if the student sticks with the unethical choice. For example, if we would like students to investigate the effects of stealing, we could pose the problem of shoplifting and ask what they would do if they were the owners. In Lynch et al. [32], it has been shown that even in domains where it is impossible to make sharp distinctions between good and bad solutions, due to the lack of ideal solutions or a domain theory, solution differences are meaningful. In our opinion, the students' answers in a Socratic dialogue are also meaningful and reflect their own beliefs and thoughts. The pedagogical model runs the educational process effectively without interfering as a tutor; the whole experience, along with the provided feedback, is tailored within the story context. Since the educational object is in narrative form, the end of the story depends on the student's actions and choices during the learning course. The final event of the story corresponds to summative feedback that relates the student's actions to the end result.
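The escalation tactic described here (question the choice, confront the contradiction, then raise the stakes if the learner keeps the unethical choice) can be pictured as a small escalation loop. The snippet below is purely illustrative: the prompts and the simple counter-based policy are invented for this sketch and are not the AEINS dialogue content.

    # Illustrative Socratic escalation loop (all prompts and thresholds are invented).
    SOCRATIC_STEPS = [
        "What would happen if everyone in the class acted this way?",
        "How would you feel if you were the shop owner or the other student?",
        "Raising the stakes: your choice is now visible to the whole class. Do you keep it?",
    ]

    def socratic_followup(unethical_choice_count):
        """Pick the next tutor move based on how often the learner has kept the bad choice."""
        if unethical_choice_count == 0:
            return "reinforce"                 # positive feedback inside the story
        step = min(unethical_choice_count, len(SOCRATIC_STEPS)) - 1
        return SOCRATIC_STEPS[step]

    for n in range(4):
        print(n, "->", socratic_followup(n))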
3.3 The Student Model
Student modeling aims to provide a personalized learning process based on the student's current skills. The student model in AEINS is a complex form of the overlay model, represented in the form of rules associated with certainty confidences, to allow access to sufficient data to permit reliable inferences about the student's beliefs. The model is a mix of implicit, structural and background student models. However, this is not an automated model; updating the student model under those themes is the pedagogical model's responsibility. Nonetheless, the model also contains rules that allow inferring more knowledge about the student's cognitive state in order to enrich the model. The model assumes that the student's knowledge is a subset of the expert's knowledge and aims to expand the student's knowledge until it matches the expert's. AEINS initializes the student model through some preliminary actions that are designed specifically to help infer an initial model of the student, and it builds a model of the student's learning process. The general inference rules take this form:
AEINS-Believes (Aware (S, Y) & is_prereq (Z, Y) -> AEINS-Believes (Aware (S, Z)))
3.4 The Presentation Module
The presentation module handles the flow of information and monitors the interactions between the user and the system. Keller's ARCS model [33] provides four classes (Attention, Relevance, Confidence, and Satisfaction) that have been considered while designing the edugame interface. This model mainly aims to gain and retain the student's attention and to help students implicitly understand how the activities relate to their current situations. In addition, the surprise and uncertainty attached to presenting a problem or new situation help in capturing the student's attention. At the awaken stage, the interface itself is designed in a way that captures the student's attention. The playing characters' personalities evolve over time, which makes their reactions vary each time according to their current personality. The variance of the narrative experience is engaging in itself; it helps to capture the attention of the student and creates new experiences. At the explain stage, feedback and explanations are given to the student within the story context. This fosters some meta-cognitive skills, such as self-reflection, data analysis, and linking causes and effects. At the reinforce and transfer stages, the student has the freedom to trace back his previous actions and the virtual characters' actions. Again, the student is indirectly forced to self-reflect and to link causes and actions in order to see what led him to this particular end. As a result, the student is forced to make a conscious choice in terms of ethics.
3.5 The Story World Module
The game nature of AEINS allows the existence of non-playing characters acting as pedagogical agents, and of various objects in the AEINS story world. The purpose of pedagogical agents is not to perform tasks for users or to simplify tasks, but rather to help users learn how to accomplish tasks [34]. They aim to increase problem-solving effectiveness by providing students with customized advice [35]. The pedagogical
agents in AEINS are semi-autonomous: on one hand they are able to act and react according to their own state and the current world state; on the other hand, the story generator can dictate to them, when required, what to do in order to preserve the dramatic line and the educational targets. The presence of a continuous story with characters whose personalities evolve as the story unfolds helps the mental and emotional engagement of the student. The AI of the non-playing characters is represented in the form of rules; these rules can be modified as the story unfolds as a result of certain actions. For example, a character who is a friend of the student can become an enemy as a result of a student action, or a character holding an unethical moral value can change into a holder of a good moral value as a result of some interactions with the surrounding world. The student and the agents are responsible for how the story unfolds, as it is generated based on their actions. When it is time to present a teaching moment, the agents currently involved in the main story take the corresponding roles (those that fit their current personalities and relationship to the student). If there is a role that still needs to be filled, or an agent is not capable of taking that role, the story world, with the assistance of the story generator, allows the inclusion of another agent smoothly through the narrative. Once the scene is set, the teaching moment starts. As mentioned previously, the predominant teaching pedagogy is the Socratic Method. The holder of the good moral virtue uses the Socratic Voice to provide discussion, hints and feedback to the student. The text dialogue produced encourages the student to think critically in order to resolve the discrepancies encountered in the moral situation(s) he is facing. In addition, students have opportunities to choose among different options and to reason about the criteria that lead to the chosen option [36]. When the teaching moment ends, the student and the agents are free to act again, influencing the main story line. The story world receives the actions to be executed by the different agents and passes this information to the presentation module to be shown to the student. An example of the story world representation is as follows:
place("house")
place("library")
place("school")
char_at("student", "house")
current_actor("Gina")
current_actor("Peter")
character_personality("Gina", "sincere_to_her_friends")
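The same facts can be kept as a simple relational world state that the story generator queries when casting agents into teaching-moment roles. The sketch below is a minimal Python illustration of that idea; the helper name cast_role, the trait strings and the fallback character "Judy" are assumptions made for this example, not part of AEINS.

    # Hypothetical world-state store mirroring the facts listed above.
    world = {
        "place": {"house", "library", "school"},
        "char_at": {("student", "house")},
        "current_actor": {"Gina", "Peter"},
        "character_personality": {("Gina", "sincere_to_her_friends")},
    }

    def cast_role(world, required_trait, fallback="Judy"):
        """Pick an on-stage agent whose personality fits the role; otherwise introduce a new one."""
        for actor in world["current_actor"]:
            if (actor, required_trait) in world["character_personality"]:
                return actor
        # No current actor fits: the story generator would introduce a new agent here.
        world["current_actor"].add(fallback)
        world["character_personality"].add((fallback, required_trait))
        return fallback

    print(cast_role(world, "sincere_to_her_friends"))        # -> Gina
    print(cast_role(world, "holder_of_good_moral_virtue"))   # -> Judy (newly introduced)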
4 Evaluation
AEINS is intended to be evaluated for the following aspects: design goals, adaptation, game features, technical features, social aspects, and educational outcomes. Evaluation in the context of learning technology can be described as a process through which information about the usability of a system is gathered in order to improve the system or to assess a completed interface, and evaluation methods are procedures for collecting relevant data about the operation and the usability of the system [37].
When a novel learning technique is proposed and implemented, it is necessary to compare it with other similar techniques, if possible, to gauge how it improves on previous results [38]. To the best of our knowledge, AEINS is the only edugame developed to teach children in the ethics domain, and therefore a comparison study against similar platforms is impossible. However, the utilization of individual aspects in AEINS can be compared to others who made use of the same aspects, or at least judged against the literature definitions of these aspects.
4.1 Evaluation of Design Goals
Evaluation of any system is the way to prove its effectiveness. The need for evaluation especially arises when a new design paradigm is presented. In order to evaluate the design goals, formative and summative evaluation could be followed. Formative evaluation seeks to identify aspects of a design that can be improved, and summative evaluation seeks to gauge a design outcome. A good evaluation method or approach is one able to play both of these evaluation roles [39]. Another evaluation method is pay-off evaluation, which has more to do with the design aspect; the main problem with this evaluation technique is that if a fault is discovered in the design, there is little systematic basis for attributing shortcomings or strengths to specific aspects of the design [40]. To overcome this difficulty, it is useful to address each goal separately and see how the design manages to satisfy the required goals. Intrinsic evaluation can be seen as the suitable method to use: it is concerned with design goals, is interested in the implicit goals embodied by aspects of a design, and makes value judgments about these goals.
4.2 Intrinsic Evaluation
Intrinsic evaluation aims to verify that the design goal has been achieved, through achieving the following:
- The development of a generic architecture based on learning theories. The architecture should exhibit the following:
• The creation of a continuous generated narrative that allows the presence of evolving characters.
• The integration of an intelligent tutor that makes use of a student model to attempt to solve the bandwidth problem and allows adaptation.
• Addressing the student agency versus tracking the learning process problem.
- The use of the Socratic Method as the teaching pedagogy that helps in developing moral reasoning.
- Solving classroom problems such as adaptation to individual students and helping shy students to express their beliefs.
The first part considers the architecture design, which is based on the idea of using interactive narrative and problem-based learning, and which suits many ill-defined domains like those mentioned previously. It is generic in the sense that it can be utilized in any system that aims to teach in ill-defined domains such as ethics and citizenship, history, English literature or social behaviors. In addition, educational theories, such as Gagné's nine events, have been considered during the design phase, whereas
Keller's ARCS model has been considered in designing and implementing the presentation model. Moreover, the architecture manages to achieve the goals set for successful educational games. One aspect of the designed architecture lies in the ability of the story generator to produce a dynamic continuous story at run time that allows the interleaving of graph-structured narrative(s). This kind of narrative acts as the teaching moments, which allow problem solving, learning by practicing, and mastering the skills through repetition, yet applied in various situations. The continuous story allows the presence of non-playing characters whose personalities evolve and change as the story unfolds. They help in providing realism and believability to the story world and help in supplying education to the student, especially through use of the Socratic Voice. The evolving characters also help to provide engagement and commitment to the edugame virtual world. The third aspect will be verified later in subsection 4.5. The fourth contribution deals with the agency problem found in existing edugames, where the generated narrative is either produced by continuous planning, losing some aspects of the educational process such as keeping track of the learning process and being able to assess the learner, or produced by graph planning, which constrains the learner's freedom in order to maintain the educational goals. AEINS succeeds in overcoming this by integrating both graph planning and continuous planning approaches in generating the edugame story, which is a unique feature of AEINS. The former has been used in structuring the teaching moments, and the latter is used to generate the story that links the teaching moments together and forms a long continuous story that accommodates the Socratic Dialogue. The third aspect deals with the teaching in AEINS. AEINS follows the constructivist teaching approach, where it is not merely teaching the participant about a process or concept undertaken by an ethics teacher, but rather allowing him to experience the process directly. AEINS has strong learning objectives underpinned by effective storytelling, where it uses stories and interactive narrative as a source of inspiration and direction for moral conduct. Learners are involved in moral dilemmas that help them to express their own characters through the problem solving, decision making, and conflict resolution present in these dilemmas. This kind of problem solving and decision making allows the learner to learn about basic human values, including honesty and kindness. The next contribution of AEINS lies in the use of the Socratic Method as its teaching pedagogy, in order to help the learners discover for themselves what knowledge gaps they may have, along with skills they may need to develop. The ability of AEINS to provide learning and/or develop the students' moral reasoning will be discussed in the empirical evaluation section. The final contribution is concerned with solving real-life classroom teaching problems. These problems are tackled through using computers in general and AEINS in particular. AEINS succeeds in overcoming the classroom problems, where it offers learning at the participant's pace, the required privacy, and a safe environment within which children can explore. It allows the inclusion of many different dilemmas that the child can interact with and learn from. Most importantly, it offers adaptation that provides personalized teaching and feedback.
Moreover, the learner is able to interact with the virtual environment, receiving reactions during the interaction course and afterwards about what has happened; form his own hypothesis and re-interact
with the environment, seeing what effect he or she gets, and finally treating this effect as feedback and accepting or rethinking his or her original hypothesis. By doing this, AEINS helps the learners to move from the state of making moral judgments to the state of taking moral actions, from the knowing state to the doing state, which we consider a very important step in moral education. AEINS has been tested for its longest learning path, wherein experts played the role of users in order to test for code coverage. In this case the student model was initialized with all the moral virtues assigned as 'not mastered', and during the interaction course with AEINS there were no indications that the learner was involved in concept formation; in other words, the learner showed misconceptions and persistent activity while interacting with the teaching moments. Based on this attitude, the pedagogical model was meant to present all the dilemmas related to the misconceptions. In conclusion, AEINS successfully provided the longest learning path when required. Although results from the analytical analysis partially confirm the hypothesis of this work, empirical evaluation is still needed to fully judge its contributions. The next section evaluates AEINS against various game aspects.
4.3 Adaptation in AEINS
Evaluation of adaptive systems can provide feedback that can be used to modify the adaptation strategies of the system itself. The adaptation decision-making phase can help in assessing the system's ability to build student models and to supply a personalized learning process based on these models. This section discusses the importance of the student model and provides evidence for its positive role from the study of the participants' log files. However, we start with the assumptions upon which we judged the efficacy of the model, as follows:
•
The student modeling has a positive result if the process is able to determine correctly the participant's misconceptions or missing conceptions underlying unethical action or choice and provides the appropriate feedback. The student modeling has a negative result if the process fails or is unable to determine the participant's misconceptions and consequently does not provide the right feedback corresponding to the participant's actions.
The level of success of the student model component depends on how comprehensive the implemented rules are and the complexity of the rules for determining the participant's misconceptions. From the study of the log files, it has been found that the presence of the student model allowed the presentation of the appropriate teaching moments' according to the participant's needs; the rest of the teaching moments were not presented because the participant's learning level did not require them. On the other hand, obviously, with the absence of the student model the teaching moments would be presented in a specific order to all the learners without any considerations to their differences and regardless of their needs. A well designed student model offers good help for a class instructor to use in order to know the participants in his/her class in a better way. It also gives the instructor a guide to the most suitable dilemma(s) to prepare for the next class; a dilemma that addresses the misconceptions of most of the class participants. AEINS was able to
14
R. Hodhod, P. Cairns, and D. Kudenko
produce final summarized report for every single participant that gave information about the student's level and provided a summary about the whole experience. The report also contained information about the teaching moments experienced by the participant, the participant's actions and the system's evaluation for each action. Moreover, the report reflected on the acquired skills of the participants associated with a confidence factor representing the system's confidence that the participant had acquired certain skill(s). With this evidence about how the student model worked in AEINS, we argue that the student modeling has a positive result of the process. In the next section, we will present a deeper study that has been done to the participants’ log files. 4.4 The Analysis of the Log Files We have studied the participants’ log files to investigate the reasoning paths taken during their interaction with AEINS. The main risk when performing experiments\slash evaluation for an educational game that seems to judge the participant personality lies in the fact that the participants may always try to pick the right choices as a result of being observed, for example always not to lie. In other words this means that the majority of the participants would be in the Right-Right cells. Log files were studied carefully in order to examine this theme. Fortunately, for our purposes, this is not the case as can be seen from the tables \footnote{R-R denotes both an initial and final Right action (student adheres to the right choices). R-W-R denotes an initial Right action, followed by a Wrong action(s) and a final Right action. W-W denotes an initial Wrong action and remains devoted to it to the end resulting in a final Wrong action. W-R denotes an initial Wrong action and final Right action} below. The above table provides interesting results. The variance in the start states between right and wrong shows that the participants' felt free in making their initial choices. On the whole the tables show that pedagogical model manages to present the student with the appropriate teaching moments that challenge the participants as more than 50% of the time the participant would go with the wrong choice. Table 2. The reasoning baths for teaching moment 1
Most of the participants take the wrong action at the start of the teaching moment (36 out of 71 interactions). The majority of the participants express care towards their friends to the extent that they will do something which obviously seems to be wrong; however, when they realize that what they did was not right, they tend to change their behavior and adhere to the right choice (31 out of 71 interactions). The rest adhere to the wrong choice even after being involved in the Socratic dialogue (5 out of 71 interactions). Other participants pick the right choices and adhere to them to the end. It is fairly hard to identify the exact reasons for this: it can be a reflection of their own personalities, their awareness of what is expected from them in this experiment, the strength with which they value their friendship, or simply an exploration of the consequences of their actions (20 out of 71 interactions). Others started by taking the right action but seem to have re-evaluated their decision based on the consequences that occurred; for example, their friend could be upset, so the participants altered their behavior to please their friend and stuck to a wrong choice (4 out of 71 interactions). What is also interesting is the multiple change of behavior within the same teaching moment, where the participants start by picking the right choice, then alter their behavior for their friend's sake (take a wrong action), and then, after being involved in the Socratic dialogue, manage to discover the incorrectness/contradictions that existed in their actions. Eventually, those participants manage to end with the right choice (11 out of 71 interactions). The above results can be visualized in the charts below.
Fig. 3. The occurrences of the right and wrong choices
Fig. 3 shows that the participants' initial choices were split roughly evenly between right and wrong. It also shows that the majority of the occurrences end with the right choice. Fig. 4 shows that the majority of participants were presented with appropriate teaching moments that were able to challenge them based on the pedagogical model's decisions. This also indicates the validity of the student model representation that supplies knowledge to the pedagogical model.
Fig. 4. The occurrences of different cognitive paths
As seen in Fig. 4, there were 36 initial wrong actions and 31 final right actions for those interactions. This shows that the system successfully aided around 86% of those who initially made the wrong choice to discover the contradictions in their course of action and make the right choice at the end.

4.5 Empirical Evaluation

Gena (2005) mentioned that qualitative methods of evaluation are seldom applied in the assessment of user-adaptive systems [41]; however, for the purpose of this study, qualitative evaluation seems very appropriate because of the ill-defined nature of the domain of study. Moreover, it is a powerful method that allows the reasoning behind the explicit facts provided by the participants to be explored. The following study provides an evaluation of AEINS' technical features, the social aspects in AEINS and the educational outcomes.

The Study
A study was conducted with a total of 20 participating children. The study was based on allowing the participants to interact with AEINS in subjective experiences, as it is these experiences that need to be captured. The following is a detailed description of the format of the study in addition to a detailed description of the participants.

Study Design
A full study was completed to test AEINS against different criteria such as the technical infrastructure, its functioning, its ability to support or enable specific activities, and its ability to generate the predicted educational outcomes. The study was conducted on a group of children aged 8 to 12 years to test the hypothesis that an educational game can be built that provides individualized and personalized learning in the ethics domain and develops new thoughts in the participants. Comprehensive log files are automatically generated by AEINS that detail every action taken within the game. A CRB clearance was obtained for this purpose.
In designing this study, it was determined that the best approach was to rely on a qualitative research method that produces a description, usually in non-numeric terms, ideal for eliciting users' thoughts. Since the participants were children, the use of in-depth, open-ended interviewing seemed the appropriate method to capture the interviewees' experiences and get at their thoughts on the program being evaluated. It helped the participants to express their experiences of the program and their judgments in their own terms. The resulting data consist of verbatim quotations with sufficient context to be interpretable. In each assignment, the participant was left to explore and interact with the system at their own pace. The children were monitored during their interaction with AEINS to see whether any of the following appeared: engagement, losing the feeling of the outside world, boredom, or entertainment. The participants were then interviewed; the post-interviews were semi-structured and composed of open-ended questions. All discussions were recorded in order to be analyzed later.

Participants
Twenty participants were assigned to play with AEINS over a number of games. Their ages were between 8 and 12 years (15 male, 6 female), with the exception of one participant who was 7 years old. They were all children from schools in York who were recruited through personal contacts and voluntarily agreed to use AEINS after obtaining their families' permission. Table 3 shows that the participants were of different origins and had different cultural backgrounds.
Table 3. Demographic data for the participants
The children spoke English as their second language; however, they were all at the average level of the language skills required for their age in their classes. AEINS is built on a universal view of right and wrong; therefore, there was no problem in recruiting children from different backgrounds and different cultures, as this does not affect how AEINS is used.

Materials and Procedures
Prior to each experiment, demographic data was collected for each participant and an informed consent form was signed by their parents. The participants were interviewed individually. The AEINS environment was briefly introduced to each participant. The participants were encouraged to explore the environment themselves and were provided with the required privacy. Participants were explicitly told "Try to be you"; our intention was to encourage them to respond on the basis of their own moral convictions, without regard for which action appears to be the good one. The participants' reactions during their interaction with AEINS were watched and recorded. The participants worked at their own pace and all their actions were recorded by AEINS to be analyzed later by us. AEINS did not allow the participants to change their minds regarding actions already taken, because this is what can happen in real life: once an action is done, there may not be a chance to redo it or revise it. In this way, the participant experiences the effects of his choices on himself and on others in a way similar to a real-life context. To evaluate AEINS, post-interviews were conducted that focused on five different categories. The first category includes questions related to the technical infrastructure and its functioning. The second category includes questions related to the functions and features inherent in the system and its ability to support or enable a specific activity. The third category includes questions related to the participant's tasks. The fourth category includes questions related to the capability of specific technology-based activities to generate the predicted outcomes. Finally, the fifth category includes questions related to re-playability and self-reflection. The questions in each of these categories are mapped to other coding questions that are directly related to the research questions that need to be investigated. We used this style in designing our evaluation because it was difficult to face the participants with such rich questions which, given their age range, would be difficult for them to understand. So we substituted the research questions with other questions that can easily be interpreted by the children and allow them to express themselves. The answers to these questions help in answering the main research question in a certain theme; an example of this representation is shown in Table 4. This type of assessment allows us to cover different aspects of AEINS and the problem space by ensuring that the participants are assessed on their knowledge of the key moral issues relating to the moral situations they faced.

Results
Given what AEINS aims to achieve and the data provided, it was found that it would not be informative to tackle every single question on its own, as some questions did not produce enough rich data. Instead, the results are organized around the main themes reflected by the data.
Table 4. Example of post interview analytical questions
These three themes are: AEINS architecture and implementation, social aspects in AEINS, and learning deployed in AEINS and educational achievements. All the sample comments are representative and no negative comments have been suppressed.

AEINS Architecture and Implementation. The AEINS interface is a simple point-and-click interface. Some participants were slow at the beginning in getting acquainted with the rules of the game, but after a short time they became quick and very immersed. The interface uses check boxes to handle the student's actions or choices. It allows mouse clicks to interact with the game world and uses multiple-line text boxes to present the story, and it stores every single action in the environment. This allows the learner, at any time, to go back and see past actions in order to solve a conflict or judge a certain action based on previous ones. Most of the participants referred to the interface as easy to use; one participant commented on the interface, saying P18: "Everything is clear. The reading is quite easy, where lines are under each other, quite separate which make things clear." It was noticed that only a few participants, such as P4 and P18, struggled with using the laptop mouse, which was easily solved by attaching a normal mouse to the laptop. Interacting with AEINS was shown to be an enjoyable experience for most of the participants; AEINS was described by P11 as an environment where you can try wrong things and see what could happen. P5 said the following about AEINS: ".... very million times good." and added "It tries to make you behave well in real life, this is your training to be good." Another participant said P6: "I enjoyed finding new situations, meeting the characters and solving problems out for them." and added "I like the idea of facing situations in different places". Moreover, the story in AEINS was described as connected by P5, as fun by P13, and as defined and interesting by P6. Another participant added P18: "The whole story is quite organized. It is good and simple.... it gives a variety of options and characters." The participants asked to have a longer time to play with the game, adding more situations to interact with and more places (enlarging the environment space) that have realistic pictures with internal views and people acting.
This suggests the need for a 3D interface and a bigger world; however, it also suggests that they enjoyed playing with the game and were satisfied with the design of the current moral situations, and therefore they are asking for more. In relation to this, the participants were keen to see how the current story (moral situation) would end. This is interesting, because this end represents summative feedback, which is based on all the participant's previous actions in this particular moral dilemma. There are two kinds of feedback: positive feedback in the form of praising the participant for his good attitudes, and negative feedback in the form of losing something or losing a friend. Although AEINS combines two different techniques to generate the narrative as one of its main contributions, no participant noticed the transfer points from one technique to the other, as the story was generated smoothly and successfully to accommodate the teaching moments in a seamless way without affecting the learner's experience. Accordingly, we can say that AEINS manages a compromise between giving the learner the appropriate freedom and being able to track and assess the learning process. Use of the save facility in AEINS was admired by the participants; they all agreed on the idea of revisiting the experience. For example, P6 said "I like saving the experience to remember what happened in case this comes to me again so I remember what I have done"; another participant said P1: "I would refer back to the saved stories to check what I have chosen where I can't remember". We argue that this is a critical feature of AEINS: revisiting previous experiences allows self-reflection, lets learners judge the validity of certain actions, and supports developing or articulating new thoughts and ideas based on existing ones. We think learners here are attempting the highest levels of the adapted version of Bloom's taxonomy, where they can evaluate actions and develop/create new ideas. This is a notable result that needs more empirical studies to confirm.

Social Aspects in AEINS. The evaluation shows that children appreciate the social characteristics of the system, as they were able to recognize the genuine social aspects and the realism represented in the game. The analytical questions confirm this recognition. For example, participants clearly cared about the outcome, as shown in the following quotes. P15: "The best moment was when my parents and my teacher were proud of me because of what I had done." Another participant, P16, felt good when the teacher told the parents that he told the truth and he was rewarded by going on a nice summer holiday. This quote and others like P6: "I was upset when my friend said that she will not be my friend anymore." show the emotional effect of the game on the participants, who could feel good, bad, scared or surprised. Emotional engagement is another positive point AEINS provides. It seems that AEINS was able to make them feel that they were really involved in realistic situations and consequently they were acting accordingly, which provides more evidence that the participants were recognizing the social situation and recognizing the non-playing characters as real friends.
One of the interviewees said P5: “I felt as if I am in a real world and these characters are really talking to me, they were very believable.” Another participant said P6: “I did not mean to upset my friend, I felt as if it really happened and I had lost my friend who will not talk to me ever again. I think I will be careful next time.”
What was really interesting is the way the participants personalized the non-playing characters in the game. They not only interacted with them as their friends in the game but also gave them lives and pictured how these characters would behave beyond these moments. For example, one interviewee said P2: "I do not like Gina when she lies, I want to tell her that this is wrong and she has to stop lying." The interviewee added "If she keeps doing this now, no one will believe her in the future." The participants also believed in the non-playing characters' personalities: they liked some and disliked others. One participant said P9: "I like Peter the most, he is funny." Another participant said P4: "I do not like Gina, she is not a real friend. She always asks me to do wrong things." and P11: "Gina is a liar." Another participant said P1: "I want to tell Judy to stop acting like a baby". The realism present in AEINS allows the participants to think about the non-playing characters as real friends who can feel and who expect certain actions from them. For example, one participant, P7, said: "If I choose to be on the side of one friend, the other one could become angry." Another participant, P6, when asked about the non-playing characters, said the following: "They rely on me. They ask me to solve their problems. They need my help." However, when asked if any of them had behaved in a strange way, he replied "They are trying to make me cheat, real friends do not do this." Moreover, the participants were treating the situations as real ones and responding to them in a realistic way. For example, one participant said "I found the homework situation very confusing." and, when asked why, replied P2: "When my mum called me to see the TV, I was scared as I still have homework to do and the teacher will figure out the next morning." One participant felt proud of herself as she supported her friend and left the football game with him when another player was unfair to this friend. Another participant was very confused in the same situation as she was torn between leaving the game to support her friend or missing the fun. These results reflect an important point: the learners were able to react to aspects of the domain and apply their current and potential capacities in this game. From the participants' answers, it became clear that most of them were not treating the game as just a game; they respected and appreciated the difficult situations they were facing and they tried to prove themselves and use their skills in order to resolve the discrepancies they faced. This is very promising because it means that the actions taken in the game reflect their real beliefs, and this will help us to recognize the real effect of AEINS on them. Some quotes reflect this result; for example, one participant mentioned that she did not like the homework teaching moment and, when asked why, answered P9: "I do not like doing homeworks". Also, another participant had not tried to go home at all and, when asked why, replied P2: "I do not like going home in general." This shows that the children were interacting with AEINS in a realistic manner. Although some of the children did not go as far and achieve what their colleagues achieved, we think we are heading in the right direction to tap into this educational field.
Some of the children talked this way when expressing their ideas, P5: "It was really nice solving my friends' problems." P6: "It is good to feel that your friends rely on you, and ask you for help when they need to." This can actually be seen as recognition of the abilities and skills of the participants: they felt proud when they succeeded in solving problems and supporting their friends.
What has been observed here is that the game does not impart specific skills but empowers the participants to use the skills they have. It also reinforces problem-solving skills, where learners are pushed to solve their friends' problems, and helps them to think wisely about the best way to do this. For example, one participant did not choose to be on anyone's side as the teaching moment required; he wanted to solve the situation another way, as shown in his comment, P1: "I want to tell them not to be upset, just play, whether to lose or win there is no problem". These results go well beyond the educational theories that state the importance of stories in transferring tacit knowledge and speak rather about relationships and human connections. In addition, being involved in stories and moral dilemmas helps in emphasizing moral behavior, gives the chance to experience various situations and allows participants to take different roles.

Learning Deployed in AEINS and Educational Achievements. This theme is very important, as it tends to show that AEINS is an effective learning environment and is able to deliver effective learning; in other words, to develop the participant's reasoning process. The use of the Socratic Method as the teaching pedagogy shows success. In every teaching moment, the voice of Socrates comes from one of the characters involved in the moral situation, who exhibits certain personality characteristics and is mostly one of the learner's friends; raising the moral conflict pushes the learner to think harder to resolve the discrepancy existing in these situations. For example, from P11's log file, it was found that the learner followed the following path in the shoplifting dilemma: agree to help his friend take a chocolate bar without paying for it, then undertake a discussion with the good moral character, who uses the Socratic voice; the discussion leads to a change in the learner's behavior, where he admitted he made a mistake and asked his friend to return the chocolate. Such an attitude reflects the power of the Socratic Method in forcing the learner to face the contradictions present in any course of action not based on good moral principles. In the post-interview with P11, he mentioned that he made a mistake by helping Gina (the immoral character in the shoplifting dilemma) to take the chocolate. This agrees well with the results obtained from the log file. One participant liked the fact that she could interact with the teaching moments and see the effect of her decisions on herself and others. This interviewee asked to restart the game when she was faced with negative consequences as a result of one of her choices. This shows that although the feedback was provided implicitly in the story, it managed to deliver the message (you did something wrong) that was not appropriate to state in an explicit way. From the post-interview, it seems that the interviewee has an explicit representation about taking things. This appears in her final comment, P13: "Taking other people's stuff is stealing and we should not take something without asking first." We claim that the interactive teaching moments were able to provide the appropriate hints about various moral actions and situate the learners in different mental and emotional states.
Moreover, it allows the learner to attempt the higher levels of the adapted version of Bloom's taxonomy, such as Analysis; for example, the participants were analyzing the situations where conflict exists and trying to find a solution to the current dilemma, as in these quotes:
P4: "It was difficult to take a decision as this can make my friend upset". The participants were also relating to the real world and applying their beliefs. For example, Participant 17 chose nearly all the bad actions to do, and accordingly he was faced with negative consequences as feedback. He said the following in the post-interview, P17: "I hope if there was no law". This shows that although he chose to do the bad actions, the feedback provided made him think of the law and the consequences of such actions in real life. Another interesting point arose while talking to Participant 5: the participants were able to show high intellectual reasoning to provide support for their acts. For example, Participant 5 does not like to disagree with his friends as they become angry with him: "I do not want them to stop being my friend." and, when asked if this holds even when they do wrong things, he replied "Yes, because everyone does wrong stuff." However, Participant 5 does not seem to be worried about anything other than losing a friend. We claim that some transfer of ideas occurred through interacting with AEINS; the following quote supports this claim: "I used to lie on my little sister to come out of trouble, now I think with lying I can be in a bigger trouble." And when asked about what he is going to do now, he answered "Tell the truth." The presence of the student model provides adaptation in the sense of presenting the teaching moments according to the student's recognized misconceptions. An important point to mention is that this kind of adaptation reinforces re-playability, since the student is not presented with all the teaching moments existing in AEINS every time they play. Re-playability can also occur as a result of the variety of the presented teaching moments and of the different branching stories within the individual teaching moments. A point was raised by P16 that he would like to try different possibilities for different actions, even if only faced with the same dilemmas he faced before, so he would play daily for about 20 minutes with this game. We think that with the presence of a richer repertoire of teaching moments, students could spend quite a long time interacting with AEINS. Such practice through problem solving and the ability to experience new things could lead to developing new or deeper insights. Transferring the knowledge to the real world is the main aim of AEINS, although this is very difficult to assess as it needs a very long-term evaluation. However, the interviews provided some insight into what AEINS has achieved in this area. They have shown that some of the learners are thinking of taking the experiences from the game into real situations. For example, when one participant was asked what she thinks she will take away from this experience, she answered P7: "I will think about the situations I have been involved in and what can happen if I really get involved into one." Another participant commented, P6: "I think this can help me solving school problems." These quotes show the possibility of learning transfer and the sparking of new and/or deeper thoughts. This also fits well with [42] in that when people are faced with a new situation in the world, aspects or elements of this situation remind them of aspects or elements of experiences they have had in the past. They use these elements of past experience to think about the new situation.
Sometimes they can apply past experience pretty much as is to the new situation; other times they have to adapt past experience to be able to apply it.
Discovery is another good point AEINS offers, as it provides a safe environment for participants to explore. For example, one participant mentioned that he chose to agree with the bad friend in order to see what would happen. On the other hand, another participant thought that doing a wrong action in the game is just a mistake, while being aware of not taking the same action in real life. Even with this level of awareness, choosing the wrong action in the game will lead to certain consequences that can reinforce his opinion about not performing the same action in real situations.
5 Conclusion

The research presented in this paper continues the literature in the area of educational games and adds more insight into the area of intelligent tutoring in ill-defined domains. The work describes the positive role of integrating individual components that individually seem to have a positive effect when employed in educational games. The integrated architecture comprises a domain model, a student model, a pedagogical model, a presentation module, a story world and story generation modules. The integration allows the presence of a continuous story that acts as the 'glue' between the learning objects (teaching moments). The main contribution of the presented architecture is its ability to provide an engaging narrative experience along with an effective individualized learning process. The architecture has been designed based on learning theories that point out important features that should exist to well establish the learning environment as well as the learning objects. We argue that although each individual attribute is not innovative in itself, their integration in one environment is. In ill-defined domains, the knowledge to be acquired is more conceptual than perceptual. Accordingly, it is important to provide interaction with just the type of conceptual materials that we want students to learn. An educational game, AEINS, has been developed as a proof of concept. AEINS offers the connection between perception and action that is a highly prototypical form of knowledge, represented in the following form of production rules: if this is the current situation, do these actions. It provides a highly engaging environment inhabited by evolving virtual characters with whom the student builds relationships and commitments. AEINS is planned to be incorporated within the ethics curriculum at schools. It can assist teachers with its ability to provide summary reports, based on the student model, for individual students. Such reports help teachers to categorize the students and to identify the students' weak points in a quick and easy way, easing time constraints in the classroom. AEINS has been intrinsically evaluated to ensure that all the design goals were addressed. AEINS has been evaluated against Gee's games criteria [42] and it was found that the platform satisfies the majority of these specifications. Qualitative measures of motivation and log files of each participant's interactions with the learning environment were recorded. They confirmed common intuitions about the motivational benefits of educational games. This benefit did not appear to come at the expense of efficiency or quality of learning. We suggest that this motivation to interact with game environments, characterized by high levels of engagement, enjoyment, and perceived challenge, may encourage students to continue game-play and ultimately experience higher learning gains.
Finally, AEINS has been empirically evaluated considering the following themes: architecture and implementation, social aspects, and educational achievements. The AEINS evaluation showed promising results, where children were able to build a powerful bridge between their real identity and their virtual identity in the game. They did have emotional responses that transferred their real-world responses to the game. This agrees well with Gee's discussion about learning and identity and his illustration of the importance of children being able to build these bridges in order not to imperil the learning [43] (p. 61). We believe that the interactive dilemmas in AEINS succeeded in inducing moral interpretations. What is happening here fits really well with Gee and his theory about "what video games have to teach us" and how learners can be unwilling to put in the effort and practice demanded for mastering a domain if this compelling component is missing [43] (p. 63). Moreover, the evaluation resulted in useful feedback that helps in modifying the system to be more user friendly and more enjoyable. Overall, we believe this research provides students with a practical means of exploring abstract issues in concrete settings, allows students to practice making ethical decisions in a realistic context and enables them to see various consequences in a safe environment. Future work includes improving the prototype by focusing on three aspects: the addition of more life-like attributes to the agents and emotional modeling, the development of an authoring tool to help teachers, and the incorporation of a natural language processing (NLP) engine that might facilitate the student-system interaction.
References

1. Lane, H.C., Core, M.G., Gomboc, D., Karnavat, A., Rosenberg, M.: Intelligent tutoring for interpersonal and intercultural skills. In: Proceedings of the Interservice/Industry Training, Simulation and Education Conference (I/ITSEC), pp. 1–11 (2007)
2. Lynch, C., Ashley, K., Pinkwart, N., Aleven, V.: Argument diagramming as focusing device: does it scaffold reading? In: Proceedings of a Workshop on AIED Applications in Ill-Defined Domains, held at the 13th International Conference on Artificial Intelligence in Education (2007)
3. Scandura, J.: Domain Specific Structural Analysis for Intelligent Tutoring Systems: Automatable Representation of Declarative, Procedural and Model-Based Knowledge with Relationships to Software Engineering. Tech., Inst., Cognition and Learning 1, 7–57 (2003)
4. Ormerod, T.C.: Planning and ill-defined problems. In: Morris, R., Ward, G. (eds.) The Cognitive Psychology of Planning. Psychology Press, London (2006)
5. Lickona, T., Schaps, E., Lewis, C.: Eleven Principles of Effective Character Education. In: Character Education Partnership (2007)
6. McBrien, J.L., Brandt, R.S.: The Language of Learning: A Guide to Education Terms. In: Association for Supervision and Curriculum Development, Alexandria, pp. 17–18 (1997)
7. Bolton, G.: Acting in classroom drama: A critical analysis. Heinemann, London (1999)
8. Shapiro, D.A.: Teaching Ethics from the Inside-Out: Some Strategies for Developing Moral Reasoning Skills in Middle School Students. In: Proceedings of the Seattle Pacific University Conference on the Social and Moral Fabric of School Life (1999)
9. Lane, H.C.: Promoting Metacognition in Immersive Cultural Learning Environments. In: Jacko, J.A. (ed.) HCI International 2009, Part IV. LNCS, vol. 5613, pp. 129–139. Springer, Heidelberg (2009)
10. Riedl, M.O., Young, R.M.: From Linear Story Generation to Branching Story Graphs. IEEE Journal on Comput. Graph. Appl. 26(3), 23–31 (2006)
11. Figueiredo, R., Brisson, A., Aylett, R., Paiva, A.: Emergent Stories Facilitated: An architecture to generate stories using intelligent synthetic characters. In: Spierling, U., Szilas, N. (eds.) ICIDS 2008. LNCS, vol. 5334, pp. 218–229. Springer, Heidelberg (2008)
12. Harless, W.G.: An Interactive Videodisc Drama: The Case of Frank Hall. Journal of Computer-Based Instruction 13(4) (1986)
13. Bradford, W.M., Charles, B.C., Luke, S.Z., Seung, Y.L., James, C.L., Muriel, R.: Towards narrative-centered learning environments. In: Mateas, M., Sengers, P. (eds.) Narrative Intelligence: Papers from the 1999 Fall Symposium, pp. 78–82. American Association for Artificial Intelligence, Menlo Park (1999)
14. Waraich, A.: Using narrative as a motivating device to teach binary arithmetic and logic gates. SIGCSE Bull Journal 36(3), 97–101 (2004)
15. Johnson, W., Marsella, L.S., Vilhjálmsson, H.: The DARWARS Tactical Language Training System. In: Proceedings of the 26th Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, FL (2004)
16. Vilhjalmsson, H., Merchant, C., Samtani, P.: Social Puppets: Towards Modular Social Animation for Agents and Avatars. In: Schuler, D. (ed.) HCII 2007 and OCSC 2007. LNCS, vol. 4564, pp. 192–201. Springer, Heidelberg (2007)
17. Thomas, J.M., Young, M.: Becoming Scientists: Employing Adaptive Interactive Narrative to Guide Discovery Learning. In: Proceedings of the AIED 2007 Workshop on Narrative Learning Environments, Marina Del Rey, California, USA (2007)
18. Mott, B.W., Lester, J.C.: U-director: a decision-theoretic narrative planning architecture for storytelling environments. In: Proceedings of the 5th International Joint Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2006), Hakodate, Japan, pp. 977–984 (2006)
19. McQuiggan, S., Rowe, J., Lee, S., Lester, J.: Story-Based Learning: The Impact of Narrative on Learning Experiences and Outcomes. In: Woolf, B.P., Aïmeur, E., Nkambou, R., Lajoie, S. (eds.) ITS 2008. LNCS, vol. 5091, pp. 530–539. Springer, Heidelberg (2008)
20. Prada, R., Machado, I., Paiva, A.: TEATRIX: Virtual Environment for Story Creation. In: Gauthier, G., VanLehn, K., Frasson, C. (eds.) ITS 2000. LNCS, vol. 1839, p. 464. Springer, Heidelberg (2000)
21. Riedl, M., Stern, A.: Believable Agents and Intelligent Story Adaptation for Interactive Storytelling. In: Göbel, S., Malkewitz, R., Iurgel, I. (eds.) TIDSE 2006. LNCS, vol. 4326, pp. 1–12. Springer, Heidelberg (2006)
22. Bayon, V., Wilson, J., Stanton, D., Boltman, A.: Mixed reality storytelling environments. Virtual Reality Journal 7(1) (2003)
23. Aylett, R.S., Vala, M., Sequeira, P., Paiva, A.: FearNot! – An Emergent Narrative Approach to Virtual Dramas for Anti-bullying Education. In: Cavazza, M., Donikian, S. (eds.) ICVS-VirtStory 2007. LNCS, vol. 4871, pp. 202–205. Springer, Heidelberg (2007)
24. Magerko, B.S., Stensrud, B.S.: Bringing the schoolhouse inside the box - a tool for engaging, individualized training. In: Proceedings of the 25th Army Science Conference, Orlando, FL (2006)
25. Mckenzie, A., Mccalla, G.: Serious Games for Professional Ethics: An Architecture to Support Personalization. In: Workshop on Intelligent Educational Games - AIED 2009, Brighton, UK (2009)
26. Dettori, G., Paiva, A.: Narrative Learning in Technology-Enhanced Environments. In: Ludvigsen, S., Balacheff, N., de Jong, T., Lazonder, A., Barnes, S. (eds.) Technology-Enhanced Learning. Springer, Heidelberg (2009)
27. Conati, C., Manske, M.: Evaluating Adaptive Feedback in an Educational Computer Game. In: Ruttkay, Z., Kipp, M., Nijholt, A., Vilhjálmsson, H.H. (eds.) IVA 2009. LNCS, vol. 5773, pp. 146–158. Springer, Heidelberg (2009)
28. Magerko, B.S., Stensrud, B.S.: Bringing the schoolhouse inside the box - a tool for engaging, individualized training. In: 25th Army Science Conference (2006)
29. Pierce, N., Conlon, O., Wade, V.: Adaptive Educational Games: Providing Non-invasive Personalised Learning Experiences. In: Proceedings of the 2nd IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning (2008)
30. Elkind, D.H., Sweet, F.: How to do character education (1997), http://www.goodcharacter.com/Article_4.html
31. Avner, A., Moore, C., Smith, S.: Active external control: A basis for superiority of CBI. Journal of Computer-Based Instruction 6(4), 115–118 (1980)
32. Lynch, C., Pinkwart, N., Ashley, K., Aleven, V.: What Do Argument Diagrams Tell Us About Students' Aptitude Or Experience? A Statistical Analysis In An Ill-Defined Domain. In: Proceedings of the Workshop on Intelligent Tutoring Systems for Ill-Defined Domains, held at the 9th International Conference on Intelligent Tutoring Systems, Montreal, Canada (2008)
33. Mergel, B.: Occasional papers in educational technology. University of Saskatchewan (1998), http://www.usask.ca/education/coursework/802papers/mergel/brenda.htm
34. Sklar, E.: Agents for education: when too much intelligence is a bad thing. In: Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, pp. 1118–1119 (2003)
35. Conati, C., Zhao, X.: Building and evaluating an intelligent pedagogical agent to improve the effectiveness of an educational game. In: Proceedings of the 9th International Conference on Intelligent User Interfaces, Funchal, Madeira, Portugal, pp. 6–13 (2004)
36. Kuhn, D.: Science as argument: implications for teaching and learning scientific thinking. Journal of Science Education 77(3), 319–337 (1993)
37. Oliver, M.: An introduction to the evaluation of learning technology. Journal of Educational Technology and Society 3(4) (2000)
38. Karpov, I.V., D'silva, T., Varrichio, C., Stanley, K.O., Miikkulainen, R.: Integration and evaluation of exploration-based learning in games. In: IEEE Symposium on Computational Intelligence and Games, pp. 21–24 (2006)
39. Malone, T.W., Lepper, M.R.: Making learning fun: A taxonomy of intrinsic motivations for learning. In: Snow, R.E., Farr, M.J. (eds.) Aptitude, Learning, and Instruction. Cognitive and Affective Process Analyses, vol. III, pp. 229–253. Erlbaum, Hillsdale (1988)
40. Carroll, J.M., Singley, M.K., Rosson, M.B.: Integrating theory development with Design Evaluation. Journal of Behavior and Information Technology 11, 247–255 (1992)
41. Gena, C.: Methods and techniques for the evaluation of user adaptive systems. Journal of Knowledge Engineering Review 20(1), 1–37 (2005)
42. Gee, J.P.: Learning by design: Games as learning machines. IEM: Interactive Educational Multimedia 8, 15–23 (2004)
43. Gee, J.P.: Are video games good for learning? Keynote address at the Curriculum Corporation 13th National Conference, Adelaide (2003)
Appendix: A Typical Student-System Interaction Scenario

This appendix presents a typical scenario that the children might have encountered playing the game, showing the role of the characters, the roles of the different modules and some typical teaching moments. The system's interaction is in normal font. The student's actions are in bold. Comments and illustrations are italicized. At the very beginning, the system allows the learner to enter his name and pick a character to represent him/her in the game world. Then the system greets him/her and presents a brief introduction to the game world.

Hi Rania! This is your world, please have a look around! You have four places to go to: the house, the shop, the library and the school. There are four characters with whom you can make friends. You have a list of actions to choose your actions from. You are free to play whenever you are ready. Now, it is time to choose your friends...

The system presents the characters by name and personality. The user has to choose a categorized yes/no answer, as no free text is allowed in the current version of AEINS.

Gina is a nice girl, she is sincere. Gina does not accept taking things without permission but she can lie. Do you like Gina and want to be her friend? Yes. You like Gina. You and Gina are friends now.

Peter is a beloved boy, he is good and sincere but sometimes he cheats. Do you like Peter and want to be his friend? No. You do not like Peter. You and Peter are not friends.

Judy is a beautiful girl, she does not accept to lie or to take things without permission. Judy is sometimes not sincere to her friends. Do you like Judy and want to be her friend? No. You do not like Judy. You and Judy are not friends.

John is a funny boy, he is popular. John does not lie but sometimes he can take stuff that is not his. Do you like John and want to be his friend? Yes. You like John. You and John are friends now.
After the student chooses his friends, the student model is initialized according to these choices. The following facts are now asserted in the student model (~ denotes 'not'):

current_student(Rania)
playing_char(Carl)
AEINS_believes (stud_aware (cheat))
AEINS_believes (stud_aware (sincere))
AEINS_believes (~stud_aware (do_not_lie))
AEINS_believes (~stud_aware (do_not_steal))
Based on the above information and that of the domain model, the student model infers new facts as follows:

AEINS_believes (~stud_aware (do_not_lie) & is_prereq (do_not_lie, honest)) -> AEINS_believes (~stud_aware (honest))
AEINS_believes (~stud_aware (honest) & is_prereq (honest, trustworthiness)) -> AEINS_believes (~stud_aware (trustworthiness))

The student model is updated by adding the newly drawn facts to the current model. Based on the current student model, the pedagogical model chooses a teaching moment:

If not (acquired_value ("Trustworthiness")) and if not (acquired_value ("do_not_steal")) and expertise_level_is (beginner) Then suggested_TM ("dilemma", "TM1")
If not (acquired_value ("Trustworthiness")) and if not (acquired_value ("do_not_lie")) and expertise_level_is (beginner) Then suggested_TM ("dilemma", "TM2")
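To make the mechanics concrete, the following minimal Python sketch mimics the prerequisite-propagation inference and the selection of candidate teaching moments shown above. It is our own simplification (a single belief set and hard-coded rules), not the actual AEINS implementation; the trigger rule below then sets the priority among the suggested moments.

# Simplified belief base: facts the student model currently holds.
beliefs = {
    "stud_aware(cheat)", "stud_aware(sincere)",
    "~stud_aware(do_not_lie)", "~stud_aware(do_not_steal)",
}

# Prerequisite relation taken from the domain model: value -> higher-level value.
prereq = {"do_not_lie": "honest", "honest": "trustworthiness"}

# Forward chaining: if a prerequisite value is not acquired, the dependent
# value cannot be considered acquired either (mirrors the inference rules above).
changed = True
while changed:
    changed = False
    for low, high in prereq.items():
        if f"~stud_aware({low})" in beliefs and f"~stud_aware({high})" not in beliefs:
            beliefs.add(f"~stud_aware({high})")
            changed = True

def suggested_tms(beliefs, expertise="beginner"):
    # Suggest teaching moments whose educational prerequisites are met.
    tms = []
    if "~stud_aware(trustworthiness)" in beliefs and expertise == "beginner":
        if "~stud_aware(do_not_steal)" in beliefs:
            tms.append("TM1")
        if "~stud_aware(do_not_lie)" in beliefs:
            tms.append("TM2")
    return tms

print(suggested_tms(beliefs))  # -> ['TM1', 'TM2']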
Trigger: teaching moment TM1 has not been presented and teaching moment TM2 has not been presented and the be_sincere value has not been held yet
Action: set priority to teaching moment TM1

It is worth reminding the reader at this point that every teaching moment has two kinds of prerequisites: educational and narrative. The educational prerequisites have been satisfied by the above rules, leading the pedagogical model to choose teaching moment TM1 to present to the student. It is now time to satisfy the narrative prerequisites that allow the teaching moment to be presented as part of the continuous story. The pedagogical model sends the teaching moment id to the story generator, which fetches the narrative preconditions for the required dilemma. These prerequisites are as follows:

at_the_shop ("student") and at_the_shop (char(X)) and at_the_shop (char(Y)) and friend ("student", char(X)) and char_personality (char(X), not(value_hold("steal"))) and friend ("student", char(Y)) and char_personality (char(Y), value_hold("steal"))

The story generator considers these prerequisites as the current goals and generates a plan that allows the story to unfold from the current world state to the goal state. Now AEINS asks the learner to either act or allow AEINS to act.

Please choose an action or press done for the system's turn.

The learner chooses to act: he chooses to invite someone to his house (this is done by choosing one of the actions; the student chooses "invite home"). The system now asks the user to choose whom he wants to invite and then press the "carry out my action" button.

Now CLICK on whom you want to invite to your home. Then press the CARRY OUT MY ACTION button.

The learner chooses to invite Gina (this is done by clicking on Gina's picture).

You chose to invite Gina.
Since the agents inhabiting the AEINS world are semi-autonomous, they are able to reply directly to the student's latest action through the reactive planner. The reactive planner chooses the highest-preference action for the agent. Based on the current status of the student and the agent, the following action executes.

Gina accepts your invitation. Gina is at your house.

AEINS asks the learner to either act or allow AEINS to act.

Now choose another action or press done for the system's turn.

The learner chooses to allow AEINS to act (this is done by pressing the done button). As it is AEINS' turn to act, the STRIPS-like planner executes the first action in the previously generated plan. Since Gina is already involved in the story and exhibits the properties required by the narrative prerequisites, the story world does not need to introduce a new character at this stage.

Gina: I am going to the store now. Gina is at the store.

AEINS asks the learner to either act or allow AEINS to act.

Now choose another action or press done for the system's turn.

The learner chooses to allow AEINS to act. To satisfy one of the goals, the story world introduces Judy, who satisfies the required conditions.

Judy: I like you and want to be your friend, do you agree? Yes. You agreed to be friends with Judy.

AEINS asks the learner to either act or allow AEINS to act.

Now choose another action or press done for the system's turn.

The learner chooses to allow AEINS to act. As the narrative preconditions still miss the presence of another agent with the student at the required place, the following action will be executed.

Judy: I am at the store, do you want to come and join me? Yes. You agreed to join Judy.

AEINS asks the learner to either act or allow AEINS to act.

Now choose another action or press done for the system's turn.
The narrative preconditions of the teaching moment have now been satisfied: the learner and two of his friends are all at the store. Since the preconditions and actions are represented with variables, different characters can join the learner in the future. The teaching moment starts as follows:

Gina is looking around in the store and they see the kind of chocolate they like. Gina does not have enough money to buy the chocolate. Gina has an idea!! They can take the chocolate if you help them to distract the shopkeeper.

Gina: Could you please help me and distract the shopkeeper's attention? I really want that chocolate bar. Yes. You agreed.

AEINS reflects on the user's action by stressing his chosen action.

You helped Gina to take the chocolate.

AEINS recognizes that the learner's attitude is not a desirable one. Therefore, Judy, the character who holds the do_not_steal principle, will start the Socratic dialogue in this dilemma, aiming to guide the student to figure out why his action was not right.

Judy: I do not like what you did, this is not good. Am I right? No. You disagreed.

The teaching moment unfolds according to the student's answers to Judy. Judy follows the Socratic dialogue and raises the ante when necessary, as follows:

Judy: It is really bad to take stuff that is not ours without permission. You will be a law breaker, do you agree? No. You disagreed.

Judy: I am surprised as you do not think that people who take things which are not theirs deserve to be punished. I wonder, if the shop was yours, would you still think the same? No. You disagreed.

Judy: It seems that you only care about your own things and not about others, am I right? No. You disagreed.

Judy: I do not think you really care. If you did, you would not harm people by taking their stuff without their permission. Am I right?
No. You disagreed.

Judy: I will not be your friend any more if you insist on what you did. Now, do you want to tell Gina to return the chocolate back? Yes. You agreed.

Judy: I admire what you did. As you see, the shopkeeper caught Gina and punished her for what she did.

The teaching moment ends at this point and the pedagogical model assesses the student's actions as follows:

If action ("TM1", "agree_to_steal") and if action ("TM1", "insist_on_stealing") and if ~action ("TM1", "agree_steal_is_bad") and if ~action ("TM1", "take_stuff_to_owner") Then skill ("do_not_steal", "acquired", 0.2)

The value 0.2 is the system's confidence in the gained skill, which is lower than a prespecified threshold in this example. Based on this information, the pedagogical model updates the student model by asserting the following rule(s):

skill ("do_not_steal", "not_acquired", 0.8)
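A minimal Python sketch of how this assessment and update step could look follows; the action names mirror the rule above, while the threshold value and the update policy are illustrative assumptions rather than the actual AEINS code.

CONFIDENCE_THRESHOLD = 0.7  # illustrative; the text only states that a threshold exists

def assess_do_not_steal(tm_actions):
    # Map the actions logged for TM1 onto a confidence that the
    # do_not_steal skill has been acquired (the simplified rule above).
    if ("agree_to_steal" in tm_actions and "insist_on_stealing" in tm_actions
            and "agree_steal_is_bad" not in tm_actions
            and "take_stuff_to_owner" not in tm_actions):
        return 0.2
    return 0.9  # placeholder for the other assessment rules not shown here

def update_student_model(model, confidence):
    # Assert either an 'acquired' or a 'not_acquired' fact with the complementary confidence.
    if confidence < CONFIDENCE_THRESHOLD:
        model["do_not_steal"] = ("not_acquired", round(1.0 - confidence, 2))
    else:
        model["do_not_steal"] = ("acquired", confidence)
    return model

student_model = {}
conf = assess_do_not_steal({"agree_to_steal", "insist_on_stealing"})
print(update_student_model(student_model, conf))  # {'do_not_steal': ('not_acquired', 0.8)}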
According to the updated student model, the student has misconceptions with the do_not_lie value (an old piece of information from the student model instantiated at the beginning of the game) and the do_not_steal value (new information from the updated model). Based on this information, the pedagogical model can choose either a teaching moment that deals with the same value (do_not_steal) or a teaching moment that considers the do_not_lie value, as follows:

If not (acquired_value ("Trustworthiness")) and if not (acquired_value ("do_not_steal")) and presented (TM1) Then assess_student_skill_level ("do_not_steal")

assess_student_skill_level ("do_not_steal") -> skill ("do_not_steal", "not_acquired", X) and X > 0.7 Then suggested_TM ("dilemma", "TM5")

If not (acquired_value ("Trustworthiness")) and if not (acquired_value ("do_not_lie")) Then suggested_TM ("dilemma", "TM3")
It is worth mentioning that these are non-deterministic rules, where more than one solution can be obtained. According to the fired rules, two teaching moments are suggested: TM3 and TM5. The current pedagogical model randomly chooses one of the teaching moments to present. The chosen teaching moment id will be sent to the story generator to construct a new plan. Now AEINS asks the learner to either act or allow AEINS to act.

Please choose an action to perform or press done for the system's turn.

AEINS continues interacting with the learner based on the student model.
Beyond Standards: Unleashing Accessibility on a Learning Content Management System

Silvia Mirri¹, Paola Salomoni¹, Marco Roccetti¹, and Gregory R. Gay²

¹ Department of Computer Science, University of Bologna, Bologna, Italy {silvia.mirri,paola.salomoni,marco.roccetti}@unibo.it
² Inclusive Design Research Centre (IDRC/IDI), OCAD University, Toronto, Ontario, Canada
[email protected]
Abstract. Standards are typically conceived as a means of inclusion, where the term inclusion can refer either to an economic scenario or a social one. They represent a pattern, a paradigm, or an archetype to be wrapped around some kind of reality. Standards related to the Internet and its applications are explicit sets of requirements to be satisfied. Applying and implementing such standards reveals their capabilities to definitively satisfy their goals, beyond the authoritative principles they implicitly carry on. This paper explores questions and perspectives about the implementation of two accessibility standards in an e-learning platform, achieving inclusion both of the standards and their goals to provide accessibility. Their actual implementation in the LCMS ATutor reinforces considerations about inconsistencies and points out some aspects which may otherwise not be glaring. In order to offer enhanced accessibility, some adjustments have been applied in the implementation phase, as the paper describes. Keywords: Design, Human Factors, Standardization, Learning Content Management Systems, E-learning Standards, Instructor Interfaces, Personalized E-Learning.
1 Introduction

Standards refer to processes and products, as well as to their description (which could be used as metadata about their properties and content). Nowadays they are crowding the world of e-learning technologies and applications [1]. Goals of inclusion surely rely upon compliance with one or more standards, but the contemporary presence or overlapping of such standards can reveal inconsistencies, discrepancies or shortfalls, contradicting their implicit aim. The process of constructing metadata, data structures, and communication protocols according to any imposed pattern or archetype (as we can conceive a standard) opens a range of issues and perspectives about standardization. Standardization is also involved in making learning content work across different platforms and in making it accessible to the widest number of learners.
E-learning content and e-learning platforms could obviously be used with older technologies or configurations, which makes them less available to users who have limited access capabilities or who are using non-standard computer equipment. For example, learners with disabilities who need assistive technologies can greatly benefit from e-learning because it allows distance and flexible learning activities and helps them to access resources which would otherwise present significant barriers [2]. Starting from 2002, the IMS Global Learning Consortium began defining a set of specifications that attempt to address the personalization or transformation of learning content [3]. Recently – at the time the authors were writing this paper – a new standard emerged on the accessibility scene: the ISO FDIS 24751 Accessibility standards [4]. Both the IMS specifications and the newer ISO standards are based on learners' profiles and the description of the learning content through metadata. This paper describes the implementation of the two aforementioned standards in ATutor [5], an open source Learning Content Management System (LCMS) used worldwide. The exhaustiveness of each standard, and their coexistence, in relation to the real and effective improvement of accessibility, has been taken into account during their design, implementation and assessment on ATutor. Implementing the IMS specifications and the ISO standards in such an LCMS represented the ideal case study for uncovering open issues. In particular, inconsistencies between the learner profile and content metadata standards, once applied to specific situations, have implied the adoption of suitable strategies warranting compliance with both kinds of standard at the same time. As a result, ATutor now provides authors with the means to add alternate forms of content to their learning materials, such as a caption for a movie, a transcript for an audio presentation, or a textual description as an alternative to a Flash presentation. Moreover, learners can define in their preference settings how the ATutor environment is displayed, and declare which forms of content they prefer, so that content is adapted to each individual user's learning preferences. This implementation has been included as a standard feature in ATutor beginning with version 1.6.2. The remainder of this paper is organized as follows. In Section 2 we outline the standards involved by comparing the IMS AccessForAll specifications and the ISO FDIS 24751 Accessibility standards. Section 3 presents the main related work implementing the IMS AccessForAll specifications in e-learning environments and other applications, while Section 4 details design and implementation issues. In Section 5 a use case is presented. Section 6 discusses new considerations about learner profiles and learning content metadata specifications and standards. Section 7 concludes the paper.
2 Background Among the many e-learning standards, some specifications are aimed at accessibility. One of them, the IMS AccessForAll specification, is based on:
• learner profile metadata, to describe user preferences and needs (IMS ACCessibility for Learning Information Package (ACCLIP) [3]);
• content metadata, to label content resources (IMS ACCessibility for MetaData (ACCMD) [6]).
By matching these specifications, it is possible for e-learning content authors to provide alternative formats of the same content, to permit users' profile declarations, and to automatically configure content for each individual learner. At the time of writing, a new set of standards had emerged on the e-learning accessibility scene: the ISO FDIS 24751 Accessibility standards [4]. Moreover, the IMS AccessForAll specifications were in transition from version 1.0 to 2.0, and the latter will be based largely on the newer ISO standard. The implementation of accessibility metadata and profiling in ATutor has been designed with both ISO and IMS in mind. The approach of the ISO FDIS 24751 Accessibility standard is similar to that of the IMS AccessForAll specifications. The ISO Digital Resources Description (DRD) [7] claims the same objectives as the IMS ACCMD, and the ISO Personal Needs and Preferences (PNP) [8] is similar to the IMS ACCLIP. Both the ISO DRD and the ISO PNP can be used independently (for instance, the PNP could be used to deliver the required or desired user interface to the learner/user) or in combination, to deliver digital content that meets a user's needs and preferences. Both the ACCLIP and the PNP specifications define the elements required to describe the accessibility preferences of the learner or user, which can be grouped into four sections:
1. Display information, which includes data about how the user interface and content should be presented and structured. These elements describe how the user prefers information to be displayed or presented; possible requirements include preferences related to cursor, font and colour characteristics. Figure 1 shows the Display Settings screenshot of the ATutor user preferences system.
2. Control information, which defines how learners prefer to control resources and how they are operated; e.g., it is possible to define preferences related to navigation elements and to standard keyboard usage, or to declare the need for a non-typical control mechanism, such as an onscreen keyboard, alternative keyboard, mouse emulation, alternative pointing mechanism or voice recognition.
3. Content information, which describes which alternative resources the learner requires; e.g., it is possible to define how to present visual, textual and auditory content in alternate modalities.
4. Accommodations information, which allows recording of requests for and authorization of accessibility accommodations for testing or assessment; for instance, it is possible to declare the request for a particular accommodation and its description.
An ACCLIP profile could be presented to an e-learning application by a learner using a smart card or a memory stick, retrieved automatically from a database, or declared through a Web interface. The system in turn would serve up appropriately customized content adapted specifically for that person, according to the accessibility metadata the content author has defined through the IMS ACCMD.
The IMS ACCMD and the ISO DRD specifications group resources into two categories: original resources (the initial or default resources) and adapted resources (which address the same learning objective as the original resources, but offer the same meaning in alternative forms). Metadata can be used to describe the actual sensory requirements necessary to access a resource and to describe the relationships between originals and their related alternatives.
3 Related Work There are several works related to the adaptation of learning resources and/or to the profiling of learners' needs and preferences, in both research and applied science. Due to their very recent release, none of the previous works take the ISO Accessibility standards into account. Instead, they are mainly based on the IMS AccessForAll 1.0 specifications. First we will consider works about the matching between the IMS ACCLIP [3] and the IMS ACCMD [6], and then those that deal with each specification separately. First let us consider a project that takes into account both components of the IMS AccessForAll specifications. The Inclusive Learning Exchange (TILE) [13] is a Learning Objects (LOs) repository that stores objects as atomic files, along with their general and AccessForAll metadata. Whenever content authors use the TILE authoring tool to compose and publish learning objects, they are supported in creating and appropriately labeling transformable aggregated lessons (codified by the TILE system using ACCMD). Learners are able to define their preferences, which are stored as IMS ACCLIP records. It is worth noting that TILE is a content repository, thus it is usually linked into an LMS or LCMS through which learning objects can be imported and/or exported. Different outcomes emerge from the other projects matching IMS ACCLIP and ACCMD. Works claiming different goals highlight other types of shortcomings and limitations in these specifications. For instance, in a recent study [14], the authors point out some limitations related to the following aspects of the specifications:
• information about learners' preferences and needs is not static, but evolves with time; moreover, some ACCLIP information may not be known by the user in advance, which creates a gap in the profile and subsequently in the content adaptation;
• there is a strong relationship between the capabilities offered by the device used by learners and effective e-learning accessibility, but there is no direct match between the information managed by the two specifications.
Characteristics of device capabilities are one of the focuses of [15] as well, in which the gap in learners’ profiling specifications emerges as a critical point, above all in mobile-learning environments (in which learning experience can be enjoyed by exploiting mobile device capabilities). A solution to this issue has been proposed in [12], by considering W3C Composite Capabilities/Preference Profile (CC/PP) standard [16] in order to describe device characteristics in addition to user needs and the software environment, so as to provide a more complete learner profiling mechanism.
Different considerations come out of other works (including [14] and [17]) that match ACCLIP and ACCMD. They adopt concepts and standards related to the Semantic Web to describe textual and contextual information in a standardized manner, promoting accessibility and also content reusability. Among the works related to learner profiling, the Web-4-All project [18] stands out. It allows users to automatically configure a public access computer by using a profile described with ACCLIP and stored on a smartcard. The main idea is that each user can freely switch from one public workstation to another, thanks to the data stored on the smartcard. Whenever the smartcard is read by a public workstation, the Web-4-All software automatically configures the operating system, the browser and the assistive technologies. This project applies the whole IMS ACCLIP specification; indeed, the IMS ACCLIP can be used not only in e-learning but also in many other contexts. It is worth mentioning that the Web-4-All software is tailored for a very specific environment, in which workstations are equipped with a smartcard reader and with suitable assistive technologies. Moreover, it does not consider the need to adapt content into different formats; hence it would have limited usefulness in a more common Web-based learning context. Obviously, in such a case, gaps in the specifications and inconsistencies with other standards do not come into play, just as in the other works devoted to investigating or applying only the learners' profiling specifications. A mirror-like situation arises in the works related only to the addition of content metadata.
4 Design Issues and Implementation The main aim of this project is the implementation of the IMS AccessForAll specifications and the ISO FDIS 24751 Accessibility standard in ATutor [9], used to display content based on user preferences. This has involved: implementing a utility to define user preferences; adapting the ATutor Content Manager to implement the ISO DRD and the IMS ACCMD, which retrieves user-specific content based on preferences, in order to allow authoring adapted content; and modifying the import/export tool, in order to allow the adoption of new metadata devoted to describing primary and equivalent resources. In the following subsections the implementation of these functionalities in ATutor is described.
4.1 Implementing Learners' Profiles In order to declare users' preferences, the already existing ATutor user profile [10] had to be extended with personalized settings. In particular, the user preference area allows five types of personalized settings to be declared:
• ATutor. The preferences that existed prior to the IMS AccessForAll specifications and ISO FDIS 24751 Accessibility standards implementation, for controlling various ATutor-related functionality.
• Display. Display preferences are applied by generating an embedded stylesheet in the head area of ATutor, which overrides the system display settings with those of the user. Users can control various font and colour settings, which replace those same styles defined in the theme being viewed. Figure 1 shows a screenshot of the ATutor Display Settings User Preferences.
• Content. Control over the display of adapted content: visual, audio and textual adaptations; replacing or supplementing original content; and choosing a preferred language for adapted resources. A screenshot of the ATutor Content Settings user preferences is depicted in Figure 2. Content preferences are handled by ATutor's primary output parser, which is responsible for the content adaptation: it checks whether a content preference setting is enabled in the array of preference settings described above and retrieves the appropriate adaptations if they exist; if no adaptations exist, the original content is displayed (see the sketch after this list). The screenshot in Figure 2 shows preference settings for alternatives to textual, audio and visual content, selecting whether to apply the adaptation, choosing which modal adaptation to replace the file with, and selecting the preferred language the resource should appear in.
• Tools. Learning scaffolds displayed, such as a dictionary, calculator, or encyclopedia. The first implementation of Tool preferences is a simple static list of common learning scaffolds. An administrator defines the URLs of Web-based tools, such as an online dictionary or calculator. When a user selects any of the available tools under the Tools tab of the Preferences screen, they appear as links to the external tools in a side menu block.
• Controls. Display of navigation elements such as breadcrumbs, sequence links, and tables of contents. These settings allow a user to control which navigation elements are displayed: breadcrumbs, sequence links, and a table of contents at the top of each content page. Implementing these control settings involves modifying the ATutor themes, adding several conditional statements (if/then) that check whether a preference is set before displaying these navigation tools; these statements are added to the theme header template files. Figure 3 shows a screenshot of the ATutor Control Settings User Preferences.
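The selection rule applied by the output parser can be summarised in a few lines. ATutor's real implementation is part of its PHP output parser; the following is only an illustrative, language-neutral sketch written in C++, with hypothetical names and data structures:

#include <map>
#include <string>

// Illustrative sketch of the adaptation rule described above: if the learner's
// preference for this kind of adaptation is enabled and an adapted form of the
// resource exists, serve the adaptation; otherwise serve the original resource.
std::string SelectResource(const std::string& original,
                           bool preferenceEnabled,
                           const std::map<std::string, std::string>& adaptations)
{
    if (preferenceEnabled)
    {
        std::map<std::string, std::string>::const_iterator it = adaptations.find(original);
        if (it != adaptations.end())
            return it->second;   // adapted resource is served in place of the original
    }
    return original;             // no adaptation available: the original content is displayed
}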
4.2 Implementing Authoring Adapted Content The ATutor Content Editor has had a new Tab added, Adapted Content, through which content authors can assemble alternatives or supplements for original pieces of content. ATutor content pages are parsed to identify the files linked into each content page as a whole. First the Original Resource Types are defined: auditory, textual, and/or visual. A radio button for one of the resource files from the original content is selected on the left, and an adapted resource file is selected from the file manager to the right, then the Add button can create an association between the files. The Adapted Resource will then appear as a subsection to the original resource that appears above. Select the available Adapted Resource Types to define the access modality of the adapted resource, and select the language of the resource. A variety of different adaptations can be added for each file in the original resource. Figure 4
shows a screenshot of the ATutor Adapted Content Editor: original resources are listed on the left side of the page, while the course File Manager is available on the right side; through this tool it is possible to define resource forms or types, and relationships between originals and alternatives.
Fig. 1. The ATutor Display Settings User Preferences
Fig. 2. The ATutor Content Settings User Preferences
Fig. 3. The ATutor Control Settings User Preferences
Fig. 4. The ATutor Adapted Content Interface
4.3 Modifying Import/Export Tool IMS AccessForAll and ISO FDIS 24751 Accessibility metadata have been integrated into ATutor content packaging. Content authors can choose to export adapted content with the content packages they distribute, and instructors and course designers can choose to import adapted content when restoring content packages into their course learning materials. This extension of the IMS Content Packaging [11] in ATutor makes it possible to share adapted content once it has been created and to maintain the relationships between original and adapted resources in other IMS-compliant LCMSs as well. Figure 5 shows a screenshot of the enhanced ATutor import/export tool. Authors and users can choose to export and/or import learning content with or without its accessibility metadata.
Fig. 5. The ATutor import/export tool
5 Use Cases This section describes some use cases, in order to show how the ATutor LCMS can be used, exploiting the new accessibility features. In particular, we show a content author who adds some adapted resources to an original one in a piece of didactical material, and two students who access the same content in different formats, according to their preferences and needs. The content author is the lecturer of a course in an Italian Master degree in E-learning. The course is named "Multimedia Systems" (in Italian "Sistemi Multimediali") and it is delivered in blended learning. The LCMS ATutor is used to host and deliver the e-learning content. Through the ATutor content editor interface, the lecturer:
• authors the didactical materials;
• by using the Adapted Content tab, declares the natural language and the types (auditory, visual, textual and sign language) of the original and adapted resources;
• adds new adapted resources, creating relationships between original and corresponding adapted resources.
In particular, she adds an auditory adapted resource, which corresponds to a spoken description of a JPG image in the original content (see Figure 4). A blind student accesses the course and declares his profile through the Preferences interface, setting his need for auditory adapted resources to replace visual original ones in the Content Settings tab (Figure 6). When this user accesses the digital lecture, the ATutor LCMS automatically transforms the content and provides the auditory alternative (when available), replacing each image (Figure 7). Users with low vision would benefit greatly from asking in their profile to append an alternative to the original visual resources, so that they can enjoy the images together with their proper auditory or textual description. An example is shown in Figure 8. As
we have discussed in Section 3, even if such a feature is not included in the user profiling standards, we have added the possibility to specify whether users would like the adapted resource to replace the default one or to be appended to it (see Figure 6 and Figure 8).
Fig. 6. Blind user’s content settings preferences
Fig. 7. A screenshot of an adapted content for blind users (in Italian language), where the auditory alternative has replaced the visual original resource
Original resources are maintained when users have declared no settings in their preference profiles or when there is no match between users' needs and the alternative types the authors have added to the original content.
Fig. 8. A screenshot of an adapted content for users with low vision (in Italian language), where the auditory alternative has been appended to the visual original resource
6 Discussion As already detailed in Section 2, the ISO Accessibility standards aim to integrate the IMS ACCLIP and the IMS ACCMD respectively into the ISO PNP and the ISO DRD sets of specifications. Differences between IMS and ISO are primarily in the language used: the IMS uses terms such as "Primary", "Secondary" and "Alternate" resources, while the ISO defines such resources as "Original" and "Adapted". Moreover, the IMS refers to "visual", "auditory", "tactile" and "textual" as "Content Types", while the ISO refers to the same terms (and also to olfactory) as "Access Modes", the senses through which content is experienced. In summary, in order to obtain compliance with both standards, learners' needs and preferences on one side, and content metadata about accessibility on the other, must coexist and be mutually consistent. Indeed, limitations and mismatches have been found and analyzed during the implementation of the IMS specifications and the ISO standards in ATutor, and potential solutions have been proposed to fill some of these gaps. The following subsections detail and critically analyze, by examples, such limitations and inconsistencies, followed by the proposed solutions.
6.1 Learners' Needs and Preferences With both IMS ACCLIP and ISO PNP, learners can declare which kind of adapted resources they prefer or need in place of a particular type of original content. Text may be preferred or needed instead of visual resources, audio might be preferred over text or images, and so on. According to both standards, learners can explicitly declare in their preference profiles only one alternative access mode for each form of resource. For instance, a blind learner could state that he/she needs to access original visual resources only as auditory or textual alternative content. Such a one-to-one relationship does not allow further choices: the blind user, for instance, might request audio files describing images, but if such alternatives are absent, he/she cannot choose a text description instead (to be read by a screen reader). Any implicit extension of the choices, so as to describe an alternative having, in turn, another alternative, and so on, might produce a loop among available (and unavailable) forms of resources. Furthermore, neither standard deals with the size or quality of video and audio resources: it is not possible to request a degraded version of a clip or an audio file adapted to the device being used. In implementing IMS ACCLIP and ISO PNP in ATutor, the above limitations remained open issues for future work. In summary, a suitable mechanism to identify loops in one-to-many relationships among requests will have to be designed, while profiling of resources related to their quality will be taken into account, as other standards, such as CC/PP, do [12]. 6.2 Content Metadata From the content metadata point of view, providing information about alternatives to resources is the aim of the IMS ACCMD specification and the ISO DRD standard. Any content being presented can be identified as having a primary form and adapted ones, depending on its media type. In ATutor, such information has been used to present adapted content based on a user's profile. Typically, learning content may be made up of many resources. Aggregated and complex learning content has to be disassembled into separate parts to be designated as primary or alternative forms of content and matched with users' preferences and needs. Each separate part is thought of as an atom, which can be matched as a whole with an alternative form of it. Limitations in the capability of exhaustively providing metadata arise whenever authors want to provide alternatives both to the whole original aggregated content and to each single part that makes up the whole resource. Let us consider, as an example, an HTML document containing pieces of formatted text and JPEG images. At the atomic level, accessibility metadata for each JPEG would contain the attribute which defines the visual nature of the resource and possibly its textual alternatives. Assuming each component of the HTML document has complete accessibility metadata, they can be matched with the learners' personal needs, making it possible to adapt the content and present alternatives to the JPEG images, if the learner has requested them. Unfortunately, due to definitions in the ISO standards and the IMS specifications documents, it is not possible to declare pieces of formatted text in an HTML
document as original resources, if they are not in separate single files. This means it is possible to provide alternatives only to external textual files, such as PDF, DOC, RTF or TXT files, whenever they are linked to the HTML document. This poses a problem when trying to meet the needs of some particular learners, for instance users with reading disabilities. Providing adaptations for this kind of resource is a challenging aspect of implementing the IMS and ISO accessibility standards. A feasible solution seems to be defining the whole aggregated object as atomic, by declaring all the suitable attributes in order to identify its form, and by adding alternatives to it as a whole, but some problems make it ineffective. First, this is possible only if we give up the idea of providing alternatives to each single file which the whole content refers to: only one alternative will be displayed when the content is viewed. Consider an author who defines an audio alternative for a visual resource in a page, and then defines another audio alternative for the primarily textual page as a whole. In this case, ACCMD and DRD metadata are not sufficient to determine which auditory resource has to be displayed to learners who declare that they need audio alternatives to both visual and textual original content. Second, the more the aggregated resource is composed of different kinds of media, the more difficult it becomes to provide appropriate alternatives to it as a whole. In our accessibility standards implementation, the initial idea for creating an alternative to text content in an aggregated resource was to produce a full alternative (an aggregated resource itself) for the original one. This idea has been set aside, because it created conflicts with the other types of content adaptation, due to the definitions in the standards specifications, as already mentioned. Finally, other limitations emerge in providing metadata on multimedia objects. According to the standards, they are dealt with as a sort of "black box". On one hand, some multimedia formats and standards are designed to refer to single files (e.g. SMIL and MPEG-7), allowing their disaggregation. On the other hand, Flash applications or MPEG-2 clips are examples of potentially indivisible objects, which have to be considered atomic elements. Even when multimedia objects are recognized as aggregated resources, the standards do not allow one to declare subsets of single resources as atomic, or to define alternatives to individual atoms. A sequence of audio files cannot be identified as a single resource, and a video with sign language content cannot be defined as an alternative to it. Vice versa, a subset of adapted resources cannot be declared as an alternative to a single resource. For instance, a sequence of images cannot be declared as an alternative to a Flash animation. Such limitations prevent providing different alternatives to multimedia objects at different levels of disaggregation so as to meet a wider range of users' needs, which would surely improve the accessibility and personalization level of the learning materials. 6.3 Learners' Profiling and Content Metadata Standards Asymmetry All the considerations outlined above represent gaps or limitations of each of the standards – IMS ACCMD and ISO DRD on one side, and IMS ACCLIP and ISO PNP on the other – considered separately. An inconsistency between the IMS and ISO standards has also been found and analyzed. It is related to the declaration of supplementary or replacement alternatives
to resources. Following the IMS ACCMD and ISO DRD standards, it should be possible to define an adapted resource as a supplement to the original one. This means that, in order to meet users' needs, the adapted resource should be offered together with, and not as a substitute for, the original file. On the other hand, the IMS ACCLIP and ISO PNP do not consider such information. Hence, even if authors declare supplementary resources, there is no way to define information addressable by users' preferences about them. In several cases, providing both the original resource and the additional one would better meet learners' needs. For instance, users with low vision could benefit from the provision of a visual resource together with its textual or audio alternative: those learners can see the image by using their screen magnification applications and take advantage of the related adapted resource, better understanding the meaning of the content. For the same reason, learners with reading disabilities often prefer to read textual resources with some assistive technologies (Text-To-Speech based, for instance) and can benefit from an alternative visual resource, allowing them to experience the content through multiple sensory modalities. In order to offer more custom-made content, in ATutor users have the opportunity of specifying whether they would like the adapted resource to replace the original one or to be appended to it, as shown in Figure 6 and Figure 8. This solution does not affect content interoperability, because it is only related to the users' profiling specifications, not to the accessibility metadata standards. In fact, accessibility metadata may be added to content packages (as well as the content itself), according to e-learning interoperability standards. This allows exporting content that, in turn, can be imported into another standards-compliant LCMS without loss of information. New (non-standard) metadata, however, could not be understood by other compliant LCMSs.
References 1. Devedzic, V., Jovanovic, J., Gasevic, D.: The Pragmatics of Current E-Learning Standards. IEEE Journal of Internet Computing 11(3), 19–27 (2007) 2. Barron, J.A., Fleetwood, L., Barron, A.E.: E-Learning for Everyone: Addressing Accessibility. Journal of Interactive Instruction Delivery 16(4), 3–10 (2004) 3. IMS Global Learning Consortium: IMS Learning Information Package Accessibility for LIP (2002), http://www.imsglobal.org/specificationdownload.cfm 4. International Organization for Standardization (ISO): ISO/IEC 24751 Information technology – Individualized adaptability and accessibility in e-learning, education and training (2008) 5. Inclusive Design Research Centre (IDRC/IDI): ATutor (2010), http://atutor.ca 6. IMS Global Learning Consortium: IMS AccessForAll Meta-Data (2004), http://www.imsglobal.org/specificationdownload.cfm 7. International Organization for Standardization (ISO): Information technology – Individualized adaptability and accessibility in e-learning, education and training – Part 3: “Access for all” digital resource description (2008a) 8. International Organization for Standardization (ISO): Information technology – Individualized adaptability and accessibility in e-learning, education and training – Part 2: “Access for all” personal needs and preferences for digital delivery (2008b)
9. Gay, G.R., Mirri, S., Salomoni, P., Roccetti, M.: Adapting Learning Environments with Access For All. In: 6th ACM International Cross-Disciplinary Conference on Web Accessibility (W4A 2009) - 18th ACM International World Wide Web Conference (WWW 2009), Madrid, Spain, April 2009, pp. 90–91. ACM Press, New York (2009) 10. Mirri, S., Gay, G.R., Salomoni, P., Roccetti, M.: Meeting learners’ preferences: implementing content adaptability in e-learning. In: New Learning Technologies Conference (2009) 11. IMS Global Learning Consortium: IMS Content Packaging Specification (2003), http://www.imsglobal.org/content/packaging/index.html 12. Salomoni, P., Mirri, S., Ferretti, S., Roccetti, M.: Profiling Learners with Special Needs for Custom E-Learning Experiences, a Closed Case? In: International Cross-Disciplinary Conference on Web Accessibility (W4A 2007) – 16th ACM International World Wide Web Conference (WWW 2007), Banff, Alberta, Canada, April 2007, pp. 84–92. ACM Press, New York (2007) 13. Nevile, L., Cooper, M., Health, A., Rothberg, M., Treviranus, J.: Learner-centred Accessibility for Interoperable Web-based Educational Systems. In: 14th International World Wide Web Conference. ACM Press, New York (2005) 14. Santos, O.C., Boticario, J.G.: Improving learners’ satisfaction in specification-based scenarios with dynamic inclusive support. In: 8th IEEE International Conference on Advanced Learning Technologies, pp. 491–495. IEEE Press, New York (2008) 15. Anido-Rifon, L.: Accessibility and Supporting Technologies in M-Learning Standardization. In: 3th IEEE International Conference on Systems, pp. 162–167. IEEE Press, New York (2008) 16. World Wide Web Consortium: Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies 1.0 (2004), http://www.w3.org/TR/2004/REC-CCPP-struct-vocab-20040115 17. Garcia-Robles, R., Diaz del Rio, F., Civit, A., Prieto, J.A.: Promoting accessibility by using metadata in the framework of a semantic-web driven CMS. In: International Conference on Dublin Core and Metadata Applications, pp. 71–77 (2005) 18. Web-4-all Project, http://web4all.atrc.utoronto.ca/
Design and Implementation of an OpenGL Based 3D First Person Shooting Game Qiaomin Lin1,2, Zhen Zhao1, Dihua Xu2, and Ruchuan Wang2,3 1
School of Communications, Nanjing Univ. of Posts and Telecommunications, China 2 School of Computer, Nanjing Univ. of Posts and Telecommunications, China 3 State Key-lab for Novel Software Technology, Nanjing Univ., China {qmlin,D0050920,dxu,wangrc}@njupt.edu.cn
Abstract. First person perspective games are an important part of the many genres that make up the multi-billion dollar gaming industry. In this article, we present the design and implementation of a 3D first person shooting (FPS) game. Our main contribution is to suggest proper practices rooted in computer graphics and geometry mathematics that we believe should be used when designing 3D FPS games. These practices are level of detail (LOD) based terrain generation and texture mapping based simulation of sky, water and tree. Besides, particle system and billboard technique, character model and animation, mouse pick and sound are also illustrated. Keywords: First Person Shooting, OpenGL, Terrain Generation, Texture Mapping, Particle System, Billboard, Animation.
1 Introduction OpenGL, a widely used graphics programming kit, is composed of a series of built-in APIs which act as the interface between the display driver and the application. The efficiency of 3D programming can be significantly improved by using OpenGL. With the performance enhancement of computer display devices, such as graphics chips, memory bandwidth and monitors, game users have become more and more demanding about the vision and display effects of PC games, so PC games with a 3D interface and scene have become the mainstream of the industry. With the advent and popularity of QUAKE [1] and HALF-LIFE [2], FPS games have stood on the front line of the 3D game industry. OpenGL uses the right-hand coordinate system [3]. Graph transformation in OpenGL is based on matrix operations. The function glMatrixMode() is called to specify which type of transformation is to be carried out. There are two transformation modes: GL_PROJECTION and GL_MODELVIEW. Model-view mode consists of translate, scaling and rotating transformations. Projection mode includes perspective projection and orthogonal projection [4], shown respectively in Fig. 1 and Fig. 2. Only objects inside the frustum defined by the projection matrix are rendered and sent to the monitor.
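As a concrete illustration of the two transformation modes just described, a typical OpenGL setup might look as follows. This is only a minimal sketch: the field of view, the clipping distances and the width/height variables are illustrative values, not taken from the game code.

glMatrixMode(GL_PROJECTION);                 // projection transformation (Fig. 1)
glLoadIdentity();
gluPerspective(45.0, (GLdouble)width / (GLdouble)height, 0.1, 1000.0);

glMatrixMode(GL_MODELVIEW);                  // model view transformation
glLoadIdentity();
glTranslatef(0.0f, 0.0f, -10.0f);            // translate transformation
glRotatef(30.0f, 0.0f, 1.0f, 0.0f);          // rotating transformation
glScalef(2.0f, 2.0f, 2.0f);                  // scaling transformation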
Fig. 1. Perspective projection
Fig. 2. Orthogonal projection
2 Game Structure A critical design step is the design of classes that can be operated by the game system. Fig. 3 shows the general structure of the classes in the game. The main classes are as follows:
• Class COpenglframework is the entry point of the whole game.
• Class CGameApp creates the game window and sets up the message mechanism.
• Besides setting the OpenGL environment variables, class COpengl calls the initialization and rendering functions of the objects in the scene graph, and realizes the mouse pick function.
• CTerrain defines the LOD and quadtree algorithms to create a 3D terrain.
• Class CSkyBox creates the sky dome.
• Class CTree renders a single tree by texture mapping.
• Class CMs3d loads the 3DS model files into the scene.
• Class CSnow extends CParticle to achieve the snow effect through a particle system. Every snow particle is realized with the billboard technique so that the amount of loaded data can be greatly reduced.
• Class CMd2Object consists of the functions of MD2 file loading, connectivity calculation and shadow creation. Class CGamerole (the player character) and CMonster (the non-player character), which both extend class CMd2Object, are responsible for keeping the state of the characters up to date. Class CInfo stores the states of the characters.
• CExplosion extends CBillboard, and CFireball extends CExplosion. They are responsible for the simulation of shooting, as shown in Fig. 4.
Fig. 3. Game structure
Fig. 4. The shooting between player character and none-player character
3 Detailed Design and Implementation 3.1 Terrain Generation Acquire the Height. In order to acquire the height of the 3D terrain, a data source is needed to bind a height to every vertex in the terrain map [5]. A 513x513 gray-scale image, shown in Fig. 5, is selected as the data source; the gray value of every pixel ranges from 0 to 255. Each pixel's gray value is mapped to the height of the corresponding vertex in the terrain.
Fig. 5. The data source of height
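A minimal sketch of how the gray-scale data source can be turned into vertex heights. The array names, the 513x513 size constant and the height scale factor are illustrative assumptions, not the game's actual identifiers.

const int MAP_SIZE = 513;                          // 513x513 gray-scale data source
unsigned char grayData[MAP_SIZE * MAP_SIZE];       // pixel gray values, 0..255
float heightMap[MAP_SIZE * MAP_SIZE];              // resulting vertex heights

void BuildHeightMap(float heightScale)
{
    for (int z = 0; z < MAP_SIZE; ++z)
        for (int x = 0; x < MAP_SIZE; ++x)
            // each pixel's gray value is mapped to the height of the
            // corresponding terrain vertex
            heightMap[z * MAP_SIZE + x] = grayData[z * MAP_SIZE + x] * heightScale;
}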
Texture Mapping of Terrain. Use texture mode of GL_COMBINE_ARB and GL_RGB_SCALE_ARB to simulate the 3D terrain's realistic surface.
Fig. 6. The texture mapping of terrain surface
Patch and LOD. CTerrain uses a two-dimensional array (patch[8][8]) to divide the whole terrain into an array of 8x8 patches, each of which covers a 65x65 vertex matrix. For example, patch[1][0] means the patch located in the second row and first column. A LOD value is assigned to every patch; the assignment depends on the distance from the camera to the center of the current patch. From the visual perspective, LOD determines the richness of terrain detail when rendering the patch. From the algorithmic angle, LOD decides the increment along both the X axis and the Z axis when the application establishes the rendering index. In other words, the number of vertices being rendered is closely related to the LOD value, as shown in Fig. 7. The white points stand for those added to the rendering index, and the more vertices are rendered, the more detailed the 3D terrain will be.
Fig. 7. The 8x8 patch array & patches with different LOD value
Fig. 8. LOD value depends on the distance from eye position to corresponding patch
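The following sketch illustrates the relationship between the eye-to-patch distance, the chosen LOD value and the vertex increment used when building a patch's rendering index (using std::vector for the index buffer). The distance thresholds and identifier names are illustrative assumptions; only the 65x65 patch size comes from the text above.

const int PATCH_VERTS = 65;                  // vertices per patch edge

// choose a LOD value from the distance between the eye and the patch center (Fig. 8)
int SelectLOD(float distance)
{
    if (distance < 100.0f) return 0;         // full detail
    if (distance < 200.0f) return 1;
    if (distance < 400.0f) return 2;
    return 3;                                // coarsest detail
}

// build the rendering index of one patch: the LOD value decides the increment
// along both the X and Z axes (Fig. 7)
void BuildPatchIndex(int lod, std::vector<int>& index)
{
    int step = 1 << lod;                     // 1, 2, 4 or 8 vertices per step
    for (int z = 0; z + step < PATCH_VERTS; z += step)
        for (int x = 0; x + step < PATCH_VERTS; x += step)
        {
            // two triangles covering the cell (x, z) .. (x+step, z+step)
            index.push_back( z         * PATCH_VERTS + x);
            index.push_back((z + step) * PATCH_VERTS + x);
            index.push_back( z         * PATCH_VERTS + x + step);
            index.push_back( z         * PATCH_VERTS + x + step);
            index.push_back((z + step) * PATCH_VERTS + x);
            index.push_back((z + step) * PATCH_VERTS + x + step);
        }
}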
However, different LOD values between adjacent patches will lead to a mapping vacancy, as shown in Fig. 9. For instance, the real height of vertex A is used when rendering the left patch, while the same point A is not added to the rendering index of the right patch. So, when rendering the right patch, the average height of vertices B and D is used instead of its real height. Due to the height difference at vertex A, there will be a mapping vacancy in triangle ABD. How can this problem be tackled? Generally, there are two approaches:
• Approach 1: Get the world coordinates and UV mapping coordinates of vertices A, B and D. Vertex C is the middle point of line BD. Fix the mapping vacancy of triangle ABD using GL_TRIANGLE_FAN mode, so as to arrange the triangles.
• Approach 2: Omit the rendering of vertex A, so that the mapping vacancy is avoided.
Fig. 9. Mapping vacancy caused by different LOD values
Approach 1 can achieve a better terrain effect by sacrificing system performance, while approach 2 can speed up rendering by sacrificing terrain accuracy.
Quadtree Structure. A quadtree structure is used to organize the 8x8 patch array when creating the terrain [6]. A quadtree is a tree-structured data organization in which every parent node has four child nodes, except for the leaf nodes. As shown in Fig. 10, a leaf node binds exactly one patch. The code for the creation of the quadtree is as follows:

if (size > 1) {
    isLeaf = false;
    child = new QUADNODE[4];
    child[0].Create(x,          z,          size/2);
    child[1].Create(x+(size/2), z,          size/2);
    child[2].Create(x,          z+(size/2), size/2);
    child[3].Create(x+(size/2), z+(size/2), size/2);
} else {
    // patch is the leaf node
    isLeaf = true;
    patch = &Terrain.patch[z][x];
}

How can we determine whether patch No. 1 is visible or not? We simply traverse the quadtree depth-first from the root A. By testing whether a node's bounding sphere or box is within the frustum, the visibility of the node can be determined. When a patch is visible, the vertices in the patch are rendered according to the rendering index. If the parent node C is invisible, then the child node M is also invisible, and there is no need to test the leaf nodes (like patch No. 1). Therefore, the quadtree structure improves the efficiency of visibility testing.
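A sketch of the depth-first visibility traversal described above. The bounding-sphere members and the Frustum test are placeholders for the game's own frustum test and patch rendering; only child, isLeaf and patch appear in the original class.

void QUADNODE::RenderVisible(const Frustum& frustum)
{
    // test this node's bounding sphere against the view frustum
    if (!frustum.SphereVisible(center, radius))
        return;                              // node invisible: skip all of its children

    if (isLeaf)
    {
        patch->Render();                     // render the bound patch with its rendering index
        return;
    }
    for (int i = 0; i < 4; ++i)              // otherwise descend depth-first
        child[i].RenderVisible(frustum);
}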
Fig. 10. 8x8 patch array organized as quadtree structure
3.2 Simulation of Sky, Water and Tree Sky Dome. Use dome texture mapping[7] to simulate the sky, as shown in Fig.11.
Fig. 11. Dome texture mapping of sky
The following code assigns the world coordinates and UV texture coordinates to the dome. The world coordinates and UV texture coordinates are set at the dome vertices (phi, theta), (phi+dphi, theta), (phi, theta+dtheta) and (phi+dphi, theta+dtheta), according to the horizontal and vertical increments dtheta and dphi. Take the vertex (phi, theta) for example:

for (phi = 0; phi <= 180 - dphi; phi += (int)dphi)
    for (theta = 0; theta <= 360 - dtheta; theta += (int)dtheta)
    {
        // Calculate the vertex at (phi, theta)
        Vertices[n].x = radius * sinf(phi*DTOR) * cosf(DTOR*theta);
        Vertices[n].z = radius * sinf(phi*DTOR) * sinf(DTOR*theta);
        Vertices[n].y = radius * cosf(phi*DTOR);
        vx = Vertices[n].x;
        vy = Vertices[n].y;
        vz = Vertices[n].z;
        // Normalize the vector
        mag = (float)sqrt(SQR(vx) + SQR(vy) + SQR(vz));
        vx /= mag; vy /= mag; vz /= mag;
        // Calculate the spherical texture coordinates
        Vertices[n].u = hTile * (float)(atan2(vx, vz) / (PI*2)) + 0.5f;
        Vertices[n].v = vTile * (float)(asinf(vy) / PI) + 0.5f;
        n++;
        …
    }

Water. For the simulation of water, dynamic texture mapping [8] is used to make the water appear to flow. A term 0.5*sinf(delta) is added to both UV texture coordinates, so that the UV texture mapping varies within the range (-0.5, 0.5). The code is as follows:

glTexCoord2f(0.0f + 0.5*sinf(delta), 0.0f + 0.5*sinf(delta));
glVertex3f(-100.0f, -1.0f, -100.0f);
glTexCoord2f(0.0f + 0.5*sinf(delta), 1.0f + 0.5*sinf(delta));
glVertex3f(-100.0f, -1.0f, 613.0f);
glTexCoord2f(1.0f + 0.5*sinf(delta), 1.0f + 0.5*sinf(delta));
glVertex3f(613.0f, -1.0f, 613.0f);
glTexCoord2f(1.0f + 0.5*sinf(delta), 0.0f + 0.5*sinf(delta));
glVertex3f(613.0f, -1.0f, -100.0f);
Fig. 12. UV coordinates of water
Tree. The textures shown in Fig. 13 are used to render a tree. The following code specifies that the texture mapping of the tree branches uses alpha mode, which means the black pixels in the tree branch image will not be rendered. By calling the functions glTranslatef(x,y,z) and glRotatef(rot,x,y,z) to set the height and angle of every branch, a tree with lush branches can then be created.

glEnable(GL_BLEND);
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.0f);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

Fig. 13. Tree branch texture and tree body texture

// set the height and angle for a branch
glTranslatef(0, m_height*0.04f, 0);
glRotatef(170, 0.0f, 1.0f, 0.0f);
Fig. 14. Tree creation based on the texture mapping
Fig. 14 shows the process of creating a tree: render the tree body first, then the branches and the top.
3.3 Particle System and Billboard Particle System. Class CParticle defines a data structure to store the properties of a particle:

struct PARTICLE {
    vector3d m_pos;        // position of a particle
    vector3d m_velocity;   // velocity of a particle
    float    life;
    float    fade;
    float    m_size;       // size of a particle
    vector3d m_gravity;    // the increment of velocity
    float    m_color[3];
};
The statement pList_particle = new PARTICLE[m_maxParticles] creates an array that stores the particle data. The variable m_maxParticles is the largest allowed number of particles.
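A minimal sketch of the per-frame update implied by this scheme; the Update method itself, the reset height and the time step dt are illustrative assumptions (vector3d is assumed to support the usual arithmetic operators), and the reset call refers to the class's own reset routine described in the next paragraph.

void CSnow::Update(float dt)
{
    for (int i = 0; i < m_maxParticles; ++i)
    {
        PARTICLE& p = pList_particle[i];
        p.m_velocity = p.m_velocity + p.m_gravity * dt;   // gravity changes the velocity
        p.m_pos      = p.m_pos + p.m_velocity * dt;       // velocity moves the particle
        p.life      -= p.fade * dt;                       // the particle ages

        // when a snow particle falls below a certain height, reset its state
        if (p.m_pos.y < 0.0f || p.life <= 0.0f)
            ResetPaticle(&p);                             // cf. the reset function below
    }
}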
During initialization, the function ResetPaticle(PARTICLE *particle) in class CSnow assigns the property values to all particles, including the position, size, velocity and gravity. Every single particle has its own life expectancy. When a snow particle falls below a certain height, the game application calls ResetPaticle(PARTICLE *particle) to reset its state.
Billboard. The essence of the billboard technique is to use a simple 2D image instead of a complicated 3D model [9]. The adoption of billboards in the simulation of fireball shooting and snow falling greatly saves memory resources and further enhances the rendering speed. Using a matrix-based model-view transformation, the 2D image always faces towards the viewer.
3.4 Character Model and Animation The 3D model files of the characters use the 3DS format. Class CMs3d defines four data structures to store the vertices, triangles, meshes and materials respectively: MS3D_VERTEX, MS3D_TRIANGLE, MS3D_MESH and MS3D_MATERIAL. Both the player character and the non-player character are designed as 3D models in the game. The actions of the game characters are realized by adopting MD2 technology [10], a key-frame animation format. MD2 builds up a series of key frames to simulate actions such as stand, walk, run, pain, and dead. Fig. 15 shows some key frames of the player character.
Fig. 15. Key frames of MD2 animation (STAND, RUN, PAIN, DEAD)
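Between two key frames, MD2 animation linearly interpolates each vertex position. The following is a minimal sketch of that blending step; the function name and the vector3d member access are illustrative, not the loader's actual structures.

// blend a vertex of the current key frame towards the next key frame;
// t in [0, 1] is the normalised time between the two frames
vector3d InterpolateKeyFrame(const vector3d& cur, const vector3d& next, float t)
{
    vector3d out;
    out.x = cur.x + t * (next.x - cur.x);
    out.y = cur.y + t * (next.y - cur.y);
    out.z = cur.z + t * (next.z - cur.z);
    return out;
}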
3.5 Mouse Pick and Sound Mouse Pick. Picking scene objects with the mouse is required for the interaction between the user and the application. (x, y) is the position of the mouse in the current window coordinate system. The point (x, viewport[3] - y) is set as the pick center, and an inner projection matrix is calculated over a 5x5 pixel zone to control the mouse pick:

gluPickMatrix((GLdouble)(x), (GLdouble)(viewport[3] - y),
              5.0, 5.0, viewport);
gluPerspective(45.0f,
               (GLfloat)(viewport[2]-viewport[0]) / (GLfloat)(viewport[3]-viewport[1]),
               0.001f, 1000.0f);
This inner matrix is the same as the projection matrix of the whole scene. If an object is clipped by the inner pick matrix, the application writes a hit record to the buffer. If more than one object is clipped at the same time because of overlapping, the value of hits will be greater than 1. In this case, the object with the lowest Z-depth value is picked out. The code is as follows:

…
for (int i = 0; i < hits; i++)
{
    if (selectBuff[i*4 + 1] < (GLuint)depth)
    {
        // The ID of the object being hit
        choose = selectBuff[i*4 + 3];
        // The Z-depth of the object
        depth = selectBuff[i*4 + 1];
    }
}

Sound. In the game there are two cases in which sounds are played, both implemented through Microsoft's multimedia system APIs [11]. One is when the fireball shot by a character explodes; the other is when a character acts. Although the code (see below) to play sounds is very simple, it greatly improves the realism of the shooting game.

// when the fireball arrives at the target
if (vpos.dist_sqr(end_pos) <= 1.0f)
    PlaySound("data/explosion.wav", NULL, SND_ASYNC);
…
// play different sounds according to the character's actions
switch (object_state)
{
    case WALK:
        PlaySound("data/walk.wav", NULL, SND_ASYNC);
        break;
    case RUN:
        …
}
4 Conclusion This article has presented in detail the design and implementation of a 3D FPS game. LOD- and quadtree-based terrain creation is the key module for building the 3D game. With the skybox, particle system, texture mapping, and tree and house models, the 3D game scene is well organized. Besides, the billboard technique speeds up the rendering process, and interaction between user and application is realized through the mouse pick module. Finally, sounds are played when a fireball explodes or a character acts, so as to improve realism. However, since game design is a complicated task drawing on software engineering, computer graphics, geometry mathematics, artificial intelligence and game engines [12], there are bound to be new algorithms and techniques to explore further.
Acknowledgments. Supported by the National Natural Science Foundation of China (60573141, 60773041), the National High Technology Research and Development Program of China ("863" Program) (2007AA, 2007AA), the Qinlan Foundation of Nanjing University of Posts and Telecommunications (NY206030, NY206034).
References 1. Quake, http://www.idsoftware.com/ 2. Half-life, http://orange.half-life2.com/ 3. Mao, W., Tang, M.: Three-dimensional Game Design Book-OpenGL, pp. 60–65. Electronic audiovisual publishing house in Sichuan, Chendu (2005) 4. Wang, R.: Computer Graphics, pp. 180–210. Posts & Telecom Press, Beijing (2009) 5. Trent, P.: Focus On 3D Terrain Programming, pp. 179–181. Premier Press (2003) 6. Kamat, V., Martinez, J.: Large-scale dynamic terrain in three-dimensional construction process visualizations. Journal of Computing in Civil Engineering, 160–171 (2005) 7. Fletcher, D., Ian, P.: 3D Math Primer for Graphics and Game Development, pp. 122–131. WordWare Publishing, Plano (2002) 8. Hu, Y., Luiz, V.: Realistic, Real-Time Rendering of Ocean Waves, http://www.visgraf.impa.br/Data/RefBib/PSDF/ lvelho-cavw04/rtwave.pdf 9. Geng, W., Chen, W.: Computer game design and programming, 2nd edn., pp. 133–135. Publishing house of electronics industry, Beijing (2009) 10. Tang, B., Pan, Z., Zheng, L., Zhang, M.: Simulating Reactive Motions for Motion Capture Animation. In: Nishita, T., Peng, Q., Seidel, H.-P. (eds.) CGI 2006. LNCS, vol. 4035, pp. 530–537. Springer, Heidelberg (2006) 11. MSDN Library, http://msdn.microsoft.com 12. Trenholme, D., Smith, S.: Computer game engines for developing first-person virtual environments. In: Virtual Reality, pp. 333–342 (2008)
Direct Interaction between Operator and 3D Virtual Environment with a Large Scale Haptic Jie Huang, Jian Li, and Rui Xiao Beijing Institute of Technology, Beijing, 100081, P.R. China
[email protected]
Abstract. Previous studies have produced research results on interaction between an operator and a 3D virtual environment through an avatar. Direct interaction provides a method by which the operator directly touches 3D virtual objects, and therefore offers more natural and intuitive interaction. A direct interaction system, including a large scale 3D virtual environment and a string-driven haptic device, is proposed. Key technologies of hand position measurement and string tension control are studied. Within this system, a virtual squash enjoy system has been developed. Experiments demonstrate the feasibility of this system. Keywords: Direct interaction, large scale, 3D virtual environment, haptic.
1 Introduction Virtual environments can be divided into two categories: small scale virtual environments and large scale virtual environments. A small scale virtual environment includes all situations in which the user sits still in front of a monitor or wears a binocular omni-orientation monitor and interacts with virtual objects in a small workspace [1]. A large scale virtual environment includes all situations in which the operator interacts with virtual objects in a large workspace, with real-time interaction through visual feedback, haptic force feedback, 3D sound, smell and even taste [2]. One of the major features of a virtual environment is haptics, by which humans manipulate and sense virtual objects. P. Richard's study demonstrates that human performance in terms of speed and precision can be improved by the support of haptics [3]. W. Wu notes that haptic feedback compensates for the disadvantages of visual representation, such as the misjudgment of the size of a virtual object [4]. Haptic interfaces can also be divided into two categories: desktop haptics and large scale haptics. Force Dimension, Haption and SensAble are well-known companies providing desktop haptic interface devices. Large scale string-driven haptic devices provide force and touch sensing in a large scale virtual environment. Strings connect the actuators to the operation apparatus. This approach has many good properties, such as a large workspace, transparency, safety, light weight and low cost. M. Sato proposed the string-driven framework SPIDAR, which stands for SPace Interface Device for Artificial Reality [5-8]. A virtual catch-ball entertainment system, in which the operator plays ball with a virtual man, has been proposed [9-10]. However, the virtual
environment consists of 2D images. Paul Richard studies a large scale virtual environment with string-driven haptic feedback for product design; work performance could be improved by their string-driven haptic device [11-14]. Those previous studies provide interaction methods between operator and virtual environment through an avatar. Direct interaction, in which the operator directly touches the 3D virtual environment, provides more natural and intuitive methods. Therefore, direct interaction methodologies between an operator and a 3D virtual environment with large scale haptic feedback are studied in this paper. The remainder of this paper is organized as follows. Section 2 proposes methods for the large scale virtual environment, including the framework, hand space position, and string tension calculation. The virtual squash enjoy system is described in Section 3 to prove its feasibility. The conclusion is given in Section 4.
2 Large Scale Virtual Environment 2.1 Description The large scale virtual environment with string-driven haptics is shown in Fig. 1. Stereoscopic images are displayed on a 3m x 3m screen and can be viewed through polarized glasses. 3D birds installed on the polarized glasses supply their position, which helps the simulation to adjust the stereoscopic images. Some human factors, such as eye distance and height, also change the stereoscopic images; therefore, these two data must be measured before playing. The string-driven haptic device is called the SPIDAR framework. To provide haptics on both hands, eight motors are placed at the corners of a cubic frame surrounding the operator. Each string is wrapped around a pulley driven by a motor, and each hand is connected to four motors through strings. The A, C, F and H motors are used for the left hand; the B, D, E and G motors are used for the right hand. These eight strings are used both to measure the hand positions and to provide force feedback. Since the strings are visually transparent, it is easy to see the virtual objects and to move around freely in a large workspace.
2.2 Hand Position Calculation An encoder is placed on each motor to measure the pulley rotation angle. The hand position can be calculated from the four rotation angles. The rotation angles corresponding to the left hand satisfy the following formulas:
(r_A·θ_A)² = (x − a)² + (y + b)² + (z + c)²
(r_C·θ_C)² = (x + a)² + (y − b)² + (z + c)²
(r_F·θ_F)² = (x − a)² + (y − b)² + (z − c)²        (1)
(r_H·θ_H)² = (x + a)² + (y + b)² + (z − c)²

where a, b and c are the constant distances between the pulleys, (x, y, z) is the left-hand position, θ_A, θ_C, θ_F and θ_H are the rotation angles, and r_A, r_C, r_F and r_H are the radii of the string wrapped around the pulleys. The x coordinate ranges from -1.4m to 1.4m, the y coordinate from -1.5m to 1.5m, and the z coordinate from -1.1m to 1.1m.
Fig. 1. Large scale virtual environment with string-driven haptic
The most difficult problem is the time-varying nature of the string radius wrapped around the pulley: the radius may increase or decrease as the pulley rotates, and this variation cannot be measured by any kind of sensor. However, its value over different working ranges can be calibrated with experimental data. We use a ruler to measure the string length at certain points. For example, the ruler measures the string lengths θ_k·r_k and θ_{k+1}·r_{k+1} at calibration points k and k+1; the values between point k and point k+1 can then be calculated. If the time-varying string radius r can be calculated, the string length θ·r can be obtained. The following formula describes the method to calculate the time-varying string radius:

r = θ_k·r_k / θ + (θ − θ_k)·(θ_{k+1}·r_{k+1} − θ_k·r_k) / ((θ_{k+1} − θ_k)·θ)        (2)
where θ is the value measured by the encoder between point k and point k+1. From equation (2), we obtain the time-varying string radius; therefore, the string lengths θ_A·r_A, θ_C·r_C, θ_F·r_F and θ_H·r_H can be calculated. From equation (1), an error in these string lengths induces an error in the hand position, and the hand position itself can be calculated from equation (1). We construct the following function:

F(x, y, z) = [(x − a)² + (y + b)² + (z + c)² − (r_A·θ_A)²]²
           + [(x + a)² + (y − b)² + (z + c)² − (r_C·θ_C)²]²
           + [(x − a)² + (y − b)² + (z − c)² − (r_F·θ_F)²]²
           + [(x + a)² + (y + b)² + (z − c)² − (r_H·θ_H)²]²        (3)

Given the string lengths θ_A·r_A, θ_C·r_C, θ_F·r_F and θ_H·r_H, the hand position (x, y, z) is calculated as the point that minimizes F(x, y, z). The function to be minimized satisfies

F(s + h) = Σ_{i=1}^{4} g²(s + h) ≈ Σ_{i=1}^{4} [g(s) + J(s)·h]²        (4)
where s = [x, y, z]^T, h = [Δx, Δy, Δz]^T,

g(s) = (x + a_i)² + (y + b_i)² + (z + c_i)² − (r_i·θ_i)²        (5)

and

J(s) = ∂g/∂s = [2(x + a_i), 2(y + b_i), 2(z + c_i)]^T        (6)
From equation (4), we obtain

∂F(s + h)/∂h = Σ_{i=1}^{4} [2·J^T(s)·g(s) + 2·J^T(s)·J(s)·h]        (7)

∂²F(s + h)/∂h² = 2·Σ_{i=1}^{4} [J^T(s)·J(s)]        (8)

Then, if J(s) has full rank, ∂²F(s + h)/∂h² is positive definite. Therefore, F(s + h) has a unique minimizer h, which satisfies

h = [J^T(s)·J(s)]^{-1} · [−J^T(s)·g(s)]        (9)
If the hand position at step k is denoted s(k), then g(s(k)) and J(s(k)) can be obtained from formulas (5) and (6), and the step h from formula (9). The hand position at the next step is then

s(k + 1) = s(k) + h    (10)
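For illustration, a minimal numerical sketch of this Gauss-Newton iteration is given below; it is our own sketch, not the authors' implementation. The pulley half-distances a, b, c are example values, the anchor coordinates follow the sign pattern of equation (1), and NumPy is used for the linear algebra.

```python
import numpy as np

# Pulley anchor points implied by the sign pattern of Eq. (1); a, b, c are
# example half-distances between pulleys, not measured values.
a, b, c = 1.4, 1.5, 1.1
ANCHORS = np.array([[ a, -b, -c],    # motor A
                    [-a,  b, -c],    # motor C
                    [ a,  b,  c],    # motor F
                    [-a, -b,  c]])   # motor H

def residuals(s, lengths):
    """g_i(s) = |s - p_i|^2 - (r_i * theta_i)^2, cf. Eq. (5)."""
    d = s - ANCHORS
    return np.sum(d * d, axis=1) - lengths ** 2

def jacobian(s):
    """J_i(s) = 2 (s - p_i), one row per string, cf. Eq. (6)."""
    return 2.0 * (s - ANCHORS)

def hand_position(lengths, s0=np.zeros(3), iters=20):
    """Iterate s(k+1) = s(k) + h with the Gauss-Newton step of Eqs. (9)-(10)."""
    s = s0.astype(float)
    for _ in range(iters):
        g, J = residuals(s, lengths), jacobian(s)
        h = np.linalg.solve(J.T @ J, -J.T @ g)   # Eq. (9)
        s = s + h                                # Eq. (10)
        if np.linalg.norm(h) < 1e-9:
            break
    return s

# Quick check: recover a known hand position from its exact string lengths.
true_pos = np.array([0.2, -0.3, 0.4])
print(hand_position(np.linalg.norm(true_pos - ANCHORS, axis=1)))
```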
The University of Angers–ISTIA proposed a simpler method that computes the hand position directly from formula (1); because three unknowns are estimated from four equations, that method produces a comparatively large hand-position error [11]. Compared with it, the method presented here achieves higher accuracy.

2.3 Tension Calculation
A string can only pull, not push, so force display is produced by the resultant of the string tensions; because the four strings attached to a hand pull from different directions, their resultant can also be perceived as a push force. The tensions satisfy

f = Σ_{j=1}^{4} α_j f_j    (11)
where f is the resultant force, α_j is the unit vector along string j, and f_j is the tension of string j. Formula (11) alone admits many tension combinations that produce the same resultant, and it is unclear which combination should drive the motors; choosing one arbitrarily may make the system unstable, so it is difficult to determine f_j from formula (11) directly. In practice the resultant force f is produced by three strings, so the tension calculation first determines which three of the four strings are active. Since any two strings that cross at a point define a plane, six such planes exist in the operating environment, giving twelve plane-normal vectors B_n with n ∈ [1, 12]. Exactly three of the twelve products f · B_n are positive, and these identify the three strings that produce the resultant force, as shown in Fig. 2. For example, when f · B_AOC, f · B_COH, and f · B_AOH are all positive, the resultant force f is produced by strings A, C, and H. Because the strings must remain taut, a slight pre-tension u is applied to each string, so the achievable resultant force f differs from the ideal force f′; they satisfy
f = f′ − Σ_{j=1}^{4} α_j u    (12)

where f′ is the ideal force. From formulas (11) and (12), we obtain

f′ = Σ_{j=1}^{4} α_j (u + f_j)    (13)
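For illustration only, the following sketch computes string tensions satisfying equation (13). Instead of the explicit three-string selection described above, it uses a non-negative least-squares fit (SciPy's nnls), which likewise assigns zero tension to strings that cannot contribute to the desired force; the string directions, the pre-tension value, and all names are our assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def string_tensions(f_ideal, alphas, u=0.5):
    """Solve Eq. (13), f' = sum_j alpha_j (u + f_j), for tensions f_j >= 0.

    alphas: 3 x 4 matrix whose columns are the unit vectors alpha_j of the four
    strings; u is the slight pre-tension that keeps every string taut.  A
    non-negative least-squares fit replaces the three-string selection: strings
    that cannot contribute to f' simply receive f_j = 0.
    """
    rhs = f_ideal - alphas @ (u * np.ones(4))   # move the pre-tension term over
    f_j, _ = nnls(alphas, rhs)
    return f_j

# Example: string directions along the frame diagonals (placeholder geometry)
# and an ideal force of 2 N along +x.
alphas = np.array([[ 1,  1, -1, -1],
                   [ 1, -1,  1, -1],
                   [ 1, -1, -1,  1]]) / np.sqrt(3.0)
print(string_tensions(np.array([2.0, 0.0, 0.0]), alphas))
```

The resulting tensions would then be sent to the motor drivers, as described in Section 3.2.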
Fig. 2. Tension control using three strings
3 Virtual Squash System

3.1 Virtual Environment
Six virtual walls surrounding the SPIDAR framework and a virtual ball moving inside it are modeled with the EON Professional suite, simulation software from EON Reality. The combination of the real space and the virtual space forms the operating space; because the operator stands inside the SPIDAR framework, the space inside the framework is reachable. The strings provide hand position measurement and haptic feedback, as shown in Fig. 3, so the operator can interact haptically with the virtual ball and the virtual walls. When the real hand touches the virtual ball or a virtual wall, the physics engine module of EON Professional simulates the collision; after EON Professional computes the ideal force f′, the string tensions f_j are calculated from formula (13).

3.2 Haptics
A PCI interface card in the EON workstation provides the haptic control function. It computes the hand position and transfers it to EON Professional over the PCI bus; EON Professional simulates the collision, computes the string tensions f_j, and transfers them back to the PCI card, which drives the eight motors to generate the string tensions. The motors are MAXON RE30 DC motors, and an optical encoder and a pulley are mounted on each motor shaft, as shown in Fig. 4.

3.3 Operation
Standing inside the SPIDAR framework, the operator sees six 3D virtual walls and a moving 3D virtual ball through the polarized glasses. When the operator touches the virtual ball, force feedback is felt; the ball collides with the virtual walls and with the real hand, so the operator can play squash inside the system. This virtual squash system serves as a practical test of large-scale haptic feedback and suggests applications in sport and education.
Fig. 3. Large scale virtual environment
Fig. 4. Motor, optical encoder and pulley
3.4 Comparison of Direct and Indirect Interaction
To assess user performance, users' impressions of interactivity, realism, and enjoyment were recorded for direct and indirect interaction, as shown in Fig. 5. In direct interaction, the user touches the virtual ball directly using the method proposed in this paper; in indirect interaction, the user touches the virtual ball through an avatar, which is another virtual ball of a different color. Ten undergraduate students played the virtual squash system and were asked to record their impressions of interactivity, realism, and enjoyment after finishing the task. For interactivity, five students rated direct interaction as very good and five as good, whereas for indirect interaction three rated it as bad and seven as fair. For realism, one student rated direct interaction as very good and nine as good, whereas five rated indirect interaction as bad and five as fair. For enjoyment, eight students rated direct interaction as very good and two as good, whereas eight rated indirect interaction as bad and two as fair. From these data we conclude that direct interaction is preferred to indirect interaction.
Fig. 5. User performance data
4 Conclusion

The design problems of a direct interaction system with a large-scale haptic device have been addressed, and the conditions required for designing such a system have been established. The effectiveness of this methodology is verified by the virtual squash system.
References

1. Inglese, F.-X., Lucidarme, P., Richard, P.: A Human-Scale Virtual Environment with Haptic Feedback. In: Proc. ICINCO 2005, vol. 4, pp. 14–17 (2005)
2. Papin, J.-P., Bouallagui, M., Ouali, A.: DIODE: Smell-diffusion in Real and Virtual Environments. In: Proc. VRIC 2003, pp. 113–117 (2003)
3. Richard, P., Burdea, G., Coiffet, P.: Human Perceptual Issues in Virtual Environment: Sensory Substitution and Information Redundancy. In: Proc. ROMAN 1995, pp. 301–306 (1995)
4. Wu, W., Basdogan, C., Srinivasan, M.: The Effect of Perspective on Visual-haptic Perception of Object Size and Compliance in Virtual Environments. In: Proc. of the ASME Dynamic Systems and Control Division 1999, vol. 67, pp. 19–26 (1999)
5. Hirata, Y., Sato, M.: 3-Dimensional Interface Device for Virtual Work Space. In: Proc. of the 1992 IEEE/RSJ International Conference on IROS, pp. 889–896 (1992)
6. Kim, S., Hasegawa, S., Koike, Y.: Tension based 7-DOF force feedback device: SPIDAR-G. In: Virtual Reality 2002 Conference, pp. 283–284 (2002)
7. Kim, S., Ishii, M., Koike, Y.: Haptic Interface with 7 DOF Using 8 Strings: SPIDAR-G. In: The 10th International Conference on Artificial Reality and Tele-existence, pp. 224–230 (2000)
8. Luo, Y., Jun, M., Katsuhito, A.: Development of New Force Feedback Interface for Two-handed 6 DOF Manipulation — SPIDAR-G&G system. In: Proc. ICAT, pp. 166–172 (2003)
9. Jeong, S., Hashimoto, N., Sato, M.: A Novel Interaction System with Force Feedback between Real and Virtual Human. In: ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2004, pp. 61–66 (2004)
10. Hasegawa, S., Toshiaki, I., Hashimoto, N.: Human Scale Haptic Interaction with a Reactive Virtual Human in a Realtime Physics Simulator. In: ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2005, pp. 149–155 (2005)
11. Richard, P., Chamaret, D., Inglese, F.-X.: Human-scale Virtual Environment for Product Design: Effect of Sensory Substitution. The International Journal of Virtual Reality 5(2), 37–44 (2006)
12. Richard, E., Tijou, A., Richard, P.: Multi-modal Virtual Environments for Education with Haptic and Olfactory Feedback. Virtual Reality 10(3), 207–225 (2006)
13. Richard, E., Tijou, A., Richard, P.: Multi-modal virtual environments for education: From illusion to immersion. In: Pan, Z., Aylett, R.S., Diener, H., Jin, X., Göbel, S., Li, L. (eds.) Edutainment 2006. LNCS, vol. 3942, pp. 1274–1279. Springer, Heidelberg (2006)
14. Tijou, A., Richard, E., Richard, P.: Using olfactive virtual environments for learning organic molecules. In: Pan, Z., Aylett, R.S., Diener, H., Jin, X., Göbel, S., Li, L. (eds.) Edutainment 2006. LNCS, vol. 3942, pp. 1223–1233. Springer, Heidelberg (2006)
Modeling and Optimizing of Joint Inventory in Supply Chain Management Min Lu and Haibo Zhao Lishui College, Lishui, 323000, Zhejiang Province, China
[email protected]
Abstract. Supply Chain Management is a key factor in enhancing competitiveness and achieving success in an enterprise, and inventory control is an important part of this management. In this study, we investigate a new inventory management method, Joint Inventory Management. A joint inventory model for manufacturing and marketing is established, and a Genetic Algorithm is then employed to optimize the model. Finally, the new method is applied in an enterprise.

Keywords: Supply Chain Management, Joint Inventory Management, Genetic Algorithms.
1 Introduction
Supply Chain Management (SCM) is the systemic, strategic coordination of the traditional business functions and the tactics across these business functions within a particular company and across businesses within the supply chain, for the purpose of improving the long-term performance of the individual companies and the supply chain as a whole [1]. In another definition, SCM is the integration of key business processes across the supply chain for the purpose of creating value for customers and stakeholders [2] [3]. A supply chain, as opposed to SCM, is a set of organizations directly linked by one or more of the upstream and downstream flows of products, services, finances, and information from a source to a customer; managing a supply chain is 'supply chain management' [4] [5]. SCM is thus considered a key factor in enhancing competitiveness and achieving success in an enterprise, and inventory control is an important part of this management. Recently, a new supply chain inventory management method, Joint Inventory Management, has emerged. This strategy breaks with the traditional fragmented inventory management mode, in which independent inventory operation at each node enterprise causes a demand amplification problem in the supply chain [6]. Fig. 1 shows the management mode of joint inventory in the SCM environment: upstream and downstream businesses participate at the same time and build and manage the inventory plan together. Reported data show that such cooperation among supply chain enterprises can reduce inventory levels by about 25% [7].
Fig. 1. Management mode of joint inventory
Currently, although the Joint Inventory Management mode and method have been proposed, the mathematical modeling and optimization of joint inventory are rarely addressed. The problem most similar to joint inventory is a distributed inventory system with a central warehouse and multiple sub-inventories. Bylkas et al. employ dynamic programming to build an inventory model for a distributed sales network that covers a number of distributors, constant demand, production scheduling, and other circumstances. However, the model has its limitations: it does not account for all inventory-related costs, considering only inventory and ordering costs while ignoring shortage costs [8]. Lu et al. establish an integrated inventory model with one manufacturer and multiple vendors and optimize it with a heuristic approach, but the resulting optimum is not particularly satisfactory [9]. Overall, there are two main optimization approaches for joint inventory models: one based on enumeration with dynamic programming, the other on general heuristic search. The former is simple but computationally expensive, while the latter tends to get trapped in local optima. Based on these considerations, this paper presents a comprehensive joint-inventory cost model that includes shortage cost, optimizes the model with a genetic algorithm to minimize inventory cost, and determines the relevant inventory parameters. A genetic algorithm is a guided global random search method: by simulating biological evolution it avoids the tendency of general search methods to become trapped in local optima and eventually reaches a global or near-global optimum. We first model joint inventory management for production and marketing under the Joint Inventory Management mode; the modeling and optimization method can also be extended to the joint inventory of raw materials [10]. We consider a production and marketing joint inventory system for a single product, composed of N sub-distributors and N sub-inventories. Fig. 2 shows this production and marketing oriented joint inventory system. The joint inventory operates on the basis of the inventories of the various sub-inventories: when the overall inventory drops to the total order point, the sub-inventories place a joint order with the core enterprise through the Joint Coordination Centre [11]; when some sub-warehouse inventory falls below its order point but the overall inventory has not reached the total order point, goods are reallocated among all warehouses under the unified
Fig. 2. Joint inventory for production and marketing
control of the Joint Coordination Centre. Because the inventory is managed in a unified way, the allocation procedure is very simple. In terms of cost, such transfers avoid the expensive emergency deliveries and shortage costs incurred in the past, and the extra allocation time is shorter than the usual response time, so the distributors are even less likely to run out of stock under the joint inventory management model. In this case, it is assumed that:
1. The production and marketing joint inventory adopts an (S, Q) ordering policy, with continuous inspection and joint ordering;
2. In time period t, the demand of the kth distributor is Dk, which follows a Poisson distribution with intensity λk (k = 1, ..., N);
3. The ordering lead time from the core enterprise to the joint inventory is L0, and the lead time for each sub-inventory is also L0;
4. The maximum supply capacity of the core enterprise in period t is GQ;
5. The user satisfaction rate is p.
Optimization objective: under limited capacity and resources and a specified user satisfaction rate, determine the inventory control strategy (safety stock, order point, order quantity) of the sub-inventories and of the joint inventory so that the total inventory cost is minimized.
2 Mathematical Model

2.1 Construction of the Model
Before describing the model in detail, we first define some symbols:
Sk – order point of sub-inventory k in period t, where k = 1, 2, ..., N (likewise below);
Qk – order quantity of sub-inventory k in period t;
S – total order point of the joint inventory in period t, i.e., the sum of the order points of all warehouses;
Q – total order quantity of the joint inventory in period t, i.e., the sum of the order quantities of all warehouses;
Ik – initial inventory of sub-inventory k in period t;
SSk – safety stock of sub-inventory k in period t;
πk – unit shortage loss of sub-inventory k in period t;
h – maintenance cost per unit of product in a sub-inventory; under Joint Inventory Management the maintenance cost is the same for all sub-inventories;
A – ordering cost of the production and marketing joint inventory to the core enterprise;
ESLk – expected service level of distributor k;
TSLk – target service level of distributor k.

2.2 Cost Model of the Joint Inventory
Under joint inventory management, the inventory cost consists of three parts: the storage cost of maintaining inventory, the shortage loss caused by running out of stock, and the ordering cost of the joint inventory to the core enterprise [12]. When all distributors' demands follow Poisson distributions with intensities λk, the shortage cost Cs is

Cs = δ[Σ_{k=1}^{N} Sk − Σ_{k=1}^{N} Ik] · Σ_{k=1}^{N} πk G(Ik + Qk)    (1)
G(Ik + Qk) is the out-of-stock quantity of sub-inventory k in period t; it can be expressed as a probability function of the demand:

G(Ik + Qk) = Σ_{x=Ik+Qk}^{∞} (x − Ik − Qk) P(x)    (2)
In formula (1),

δ[Σ_{k=1}^{N} Sk − Σ_{k=1}^{N} Ik] = 1, if Σ_{k=1}^{N} Sk − Σ_{k=1}^{N} Ik ≥ 0;  0, if Σ_{k=1}^{N} Sk − Σ_{k=1}^{N} Ik < 0.    (3)
This function shows that the shortage cost occurs only when the sum of the current inventories of all sub-inventories is less than or equal to the order point of the joint inventory; otherwise the system can resolve out-of-stock situations by transferring goods among the warehouses. The inventory maintenance cost Ch of the joint inventory is

Ch = Σ_{k=1}^{N} h T(Ik + Qk)    (4)
T(Ik + Qk) is the inventory maintenance quantity of sub-inventory k during the ordering lead time of the joint inventory; it can be expressed as a probability function of the demand:

T(Ik + Qk) = Σ_{x=0}^{Ik+Qk} (x − Ik − Qk) P(x)    (5)
The inventory maintenance cost always exists, and under the unified management of the joint inventory the unit maintenance cost of every sub-inventory is the same value h. The ordering cost CA of the joint inventory is
CA = [ Σ_{k=1}^{N} Σ_{x=0}^{∞} x P(x) / Σ_{k=1}^{N} Qk ] · A    (6)
The ordering cost occurs only when the joint inventory places an order with the core enterprise. In formula (6), Σ_{k=1}^{N} Σ_{x=0}^{∞} x P(x) is the total expected demand of the distributors and Σ_{k=1}^{N} Qk is the total order quantity of the joint inventory. The total inventory cost of the joint inventory is therefore TC, consisting of the three parts described above. According to the goal of joint inventory control, we set up the optimization model as follows:
Min: TC = Cs + Ch + CA
       = δ[Σ_{k=1}^{N} Sk − Σ_{k=1}^{N} Ik] Σ_{k=1}^{N} πk G(Ik + Qk) + [Σ_{k=1}^{N} Σ_{x=0}^{∞} x P(x) / Σ_{k=1}^{N} Qk] A + Σ_{k=1}^{N} h T(Ik + Qk)    (7)
subject to ESLk ≥ TSLk, Qk > 0, and Σ_{k=1}^{N} Qk ≤ GQ. In these constraints, ESLk is the expected service level of the customer and TSLk is the target service level, where

ESLk = 1 − Σ_{x=Sk}^{∞} P(x)    (8)
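To make the cost structure concrete, the sketch below evaluates TC for a candidate order plan under Poisson demand. It is our own illustration: the parameter values are invented, the demand horizon is taken as one period, and the maintenance term is computed as the expected leftover stock E[max(Ik + Qk − x, 0)]; function and variable names are not from the paper.

```python
import numpy as np
from scipy.stats import poisson

def expected_shortage(level, lam, x_max=500):
    """G(level) = sum_{x >= level} (x - level) P(x), cf. Eq. (2)."""
    x = np.arange(level, x_max)
    return float(np.sum((x - level) * poisson.pmf(x, lam)))

def expected_on_hand(level, lam):
    """Expected leftover stock, interpreting Eq. (5) as E[max(level - x, 0)]."""
    x = np.arange(0, level + 1)
    return float(np.sum((level - x) * poisson.pmf(x, lam)))

def total_cost(I, Q, S, lam, pi, h, A):
    """TC = Cs + Ch + CA of Eq. (7) for N sub-inventories with Poisson demand."""
    I, Q, S, lam, pi = map(np.asarray, (I, Q, S, lam, pi))
    delta = 1.0 if S.sum() - I.sum() >= 0 else 0.0           # Eq. (3)
    Cs = delta * sum(p * expected_shortage(i + q, l)
                     for p, i, q, l in zip(pi, I, Q, lam))   # Eq. (1)
    Ch = h * sum(expected_on_hand(i + q, l)
                 for i, q, l in zip(I, Q, lam))              # Eq. (4)
    CA = lam.sum() / Q.sum() * A                             # Eq. (6)
    return Cs + Ch + CA

# Illustrative numbers only (not the case-study data of Section 4).
print(total_cost(I=[9, 11, 5, 7], Q=[40, 45, 30, 35], S=[29, 34, 23, 26],
                 lam=[20, 25, 15, 18], pi=[320, 300, 330, 310], h=4.8, A=25))
```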
3 Algorithm Description

3.1 Determining the Order Point and Safety Stock
First, the order point and safety stock of each sub-inventory are determined from the user satisfaction rate and the lead time. Considering the actual situation, we determine the order point Sk and the safety stock SSk of every warehouse. The order point Sk can be calculated from Formula (8) with the mathematics software Maple, and the order point Sk and the safety stock SSk also satisfy the relation

Sk = SSk + L0 uk    (9)

where uk is the mean demand of sub-inventory k during the lead time of joint inventory ordering; the safety stock of every sub-inventory can then be determined from this formula.
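A small sketch of this step is shown below, replacing the Maple computation with SciPy's Poisson distribution; treating the demand relevant to equation (8) as the lead-time demand is our assumption, and the example values are invented.

```python
from scipy.stats import poisson

def order_point(lam_lead, target_sl):
    """Smallest S_k with ESL_k = P(demand < S_k) >= TSL_k, from Eq. (8)."""
    s = 0
    while poisson.cdf(s - 1, lam_lead) < target_sl:
        s += 1
    return s

def safety_stock(s_k, lead_time_demand):
    """SS_k = S_k - L0 * u_k, rearranged from Eq. (9)."""
    return s_k - lead_time_demand

# Example: mean lead-time demand of 6.7 units and a 95% target service level.
s = order_point(6.7, 0.95)
print(s, safety_stock(s, 6.7))
```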
3.2 Optimizing the Model with a Genetic Algorithm
Since the optimization problem is nonlinear, we solve it with a Genetic Algorithm [13]. The main process, sketched in code after this list, is as follows:
1. Preparation.
(a) Use real-number encoding. Compared with binary encoding, real-number encoding has higher precision and efficiency and a wider search range. pop1..popsize = (Q1, Q2, ..., QN), with total Q = Σ_{k=1}^{N} Qk. From the constraints we can determine the feasible zone Q ∈ (0, GQ) in period t; restricting the search to the feasible zone speeds up convergence for the constrained problem.
(b) Select the fitness function f; here f = −TC.
(c) Initialize the parameters, which include the population size popsize, the crossover rate pc, the mutation rate pm, and the catastrophe rate pml.
(d) Set the termination condition, which is the maximum number of generations maxgen.
2. Generate the initial population. Within the feasible zone, generate an individual at random and test its satisfaction rate; if it does not meet the requirement, regenerate it. Repeat until popsize individuals that meet the satisfaction rate form the initial population (pop1, pop2, ..., poppopsize).
3. Selective replication. Compute the fitness of each individual from the fitness function and sort the individuals by fitness (pop1, pop2, ..., poppopsize). After roulette-wheel selection, the selected individuals are copied into the mating pool for the crossover operation.
4. Arithmetic crossover. If the parents are feasible, their children are feasible too. Because of this property, crossover alone may not reach solutions near the boundary of the feasible zone; this is left to mutation.
5. Mutation. Generate a variation direction d = (d1, d2, ..., dN) at random, where di is the permissible variation of Qi, and set childpop = parentpop + d. Using stochastic simulation, test whether the satisfaction rate of the mutated individual meets the requirement; if not, regenerate it until it is qualified.
6. Repeat steps (3)-(5) until the termination condition is satisfied and the optimum solution pop0 is obtained.
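The listing below is a compressed sketch of this loop with real-number encoding, roulette selection, arithmetic crossover, and feasibility-checked mutation. The service-rate test is reduced to simple box and sum constraints, a toy quadratic function stands in for −TC, and all parameter values and names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_optimize(fitness, n, gq, pop_size=30, pc=0.8, pm=0.1, max_gen=50):
    """Real-coded GA sketch following steps (1)-(6) above; fitness(Q) ~ -TC."""
    def feasible(q):
        return np.all(q > 0) and q.sum() <= gq

    def random_individual():
        while True:                                  # step (2): rejection sampling
            q = rng.uniform(1.0, gq / n, size=n)
            if feasible(q):
                return q

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(max_gen):
        fits = np.array([fitness(q) for q in pop])
        weights = fits - fits.min() + 1e-9           # step (3): roulette selection
        idx = rng.choice(pop_size, size=pop_size, p=weights / weights.sum())
        mating = [pop[i] for i in idx]
        children = []
        for a, b in zip(mating[::2], mating[1::2]):
            if rng.random() < pc:                    # step (4): arithmetic crossover
                w = rng.random()
                a, b = w * a + (1 - w) * b, w * b + (1 - w) * a
            for child in (a, b):
                if rng.random() < pm:                # step (5): mutation, kept feasible
                    trial = child + rng.normal(scale=1.0, size=n)
                    if feasible(trial):
                        child = trial
                children.append(child)
        pop = children[:pop_size]
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy example: four order quantities, total capacity GQ = 250, quadratic stand-in cost.
best_q, best_f = genetic_optimize(lambda q: -float(np.sum((q - 40.0) ** 2)), n=4, gq=250)
print(np.round(best_q, 1), best_f)
```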
4 Case Studies
Ningbo Jinfeng Machinery Co., Ltd is a professional enterprise producing mechanical punching machines. The company operates a typical supply chain in which it is the core enterprise. In the sales chain between the core enterprise and the downstream distributors, the supply chain inventory problem is solved by
establishing a production and marketing joint inventory rather than setting up separate inventories. The methods described above are employed to model and optimize this production and marketing joint inventory. Currently, Jinfeng Machinery Co., Ltd has four downstream distributors: major distribution centers in Shanghai, Guangdong, Shenyang, and Chongqing. Thus, the joint product sales inventory model is composed of four distributors and four sub-inventories. Table 1 lists the parameters of the production and marketing joint inventory, obtained through practical research and visits to the enterprise. Based on these parameters, the order point Sk and safety stock SSk can be calculated from Formulas (8) and (9); the results are listed in Table 2. As the sum of the current inventories of the sub-inventories is less than the total order point, a joint order is requested. We take the population size popsize = 30 and the crossover rate Pc = 0.8, and evolve 50 generations with two mutation rates: (1) Pm = 0.1; (2) Pm = 0.5. In this experiment the Genetic Algorithm Toolbox "Gaot" is used and the inventory cost model is optimized with Matlab. The optimized results are shown in Table 2, and Fig. 3 shows the optimization trajectory of the Genetic Algorithm.

Table 1. Inventory parameters

Parameter         λk (pcs)   Lk (pcs)   πk (yuan/pcs)   h (yuan/pcs)
Sub-inventory 1   20         9          32000           48000
Sub-inventory 2   25         11         30000           48000
Sub-inventory 3   15         5          33000           48000
Sub-inventory 4   18         7          31000           48000
A (yuan/time): 2500    Period: 30 days    GQ: 250 items    Lead time: 10 days    P: 95%
Table 2. Optimized results

Parameter         Order point Sk (pcs)   Safety stock SSk (pcs)   Qk (pcs), Pm = 0.1   Qk (pcs), Pm = 0.5
Sub-inventory 1   29                     23                       43                   42
Sub-inventory 2   34                     26                       49                   49
Sub-inventory 3   23                     18                       33                   32
Sub-inventory 4   26                     20                       38                   37
Total cost of the joint inventory (myriad yuan):                  154.02               153.05
Simulated user satisfaction rate (%):                             96                   96
Fig. 3. Optimized trajectory
From the optimized results we can see that the total cost of the optimized production and marketing joint inventory of the Jinfeng enterprise is about 1.5 million RMB, lower than the previously recorded 1.7 million RMB, while the expected user satisfaction rate reaches 96%, so the lower inventory cost does not come at the expense of customer service.
5 Conclusion

In the context of SCM, we established a mathematical model for the production and marketing joint inventory. The model takes into account both the cost of the inventory system and the user satisfaction rate, which is consistent with the overall interests of the enterprise. Taking advantage of the self-adapting and learning capability of the Genetic Algorithm, the algorithm obtains reasonably satisfactory optimized results. In practice, the method has been validated. Of course, the new model still needs to be tested with other manufacturers, and new SCM concepts and schemes [14] can be integrated into the framework.
References [1] Mentzer, J.T., et al.: Defining Supply Chain Management. Journal of Business Logistics 22(2), 1–25 (2001) [2] Ketchen Jr., G., Hult, T.M.: Bridging organization theory and supply chain management: The case of best value supply chains. Journal of Operations Management 25(2), 573–580 (2006) [3] Kouvelis, P., Chambers, C., Wang, H.: Supply Chain Management Research and Production and Operations Management: Review, Trends, and Opportunities. Production and Operations Management 15(3), 449–469 (2006)
[4] Larson, P.D., Halldorsson, A.: Logistics versus supply chain management: an international survey. International Journal of Logistics: Research & Application 7(1), 17–31 (2004) [5] Lavassani, M.K., Movahedi, B., Kumar, V.: Historical Developments in Theories of Supply Chain Management: The Case of B2B E-marketplaces. Administrative Science Association of Canada (ASAC), Halifax, Canada (2008) [6] Ma, S., Lin, Y., Chen, Z.: Supply Chain Management. Machinery Industry Press, Beijing (2000) [7] Lee, H.L., Billington, C.: The evolution of supply chain management models and practices at Hewlett Packard Interfaces, vol. 25(5), pp. 42–63 (1998) [8] Bylkas, A.: Dynamic model for the single vendor, multi buyer problem. International Journal of Production Economics 59, 297–304 (1999) [9] Lu, L.: A one-vendor multi-buyer integrated inventory model. European Journal of Operational Research 81 (1995) [10] Bowersox, D.J., Closs, D.J.: Logistics management: the integrated supply chain process. McGraw-Hill Companies, New York (1998) [11] Ding, H., Yu, M.: Modern Production Operation Management (1999) [12] Ballou, R.H.: Business Logistics Management, 3rd edn. (1992) [13] Stenross, F.M., Sweet, G.J.: Implementing an Integrated Supply Chain. In: Annual Conference Proceedings (1991) [14] Simchi-Levi, D., Kaminsky, P., Simchi-levi, E.: Designing and Managing the Supply Chain, 3rd edn. McGraw-Hill, New York (2007)
Vision-Based Robotic Graphic Programming System* Jianfei Mao, Ronghua Liang, Keji Mao, and Qing Tian Institute of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China {mjf,rhliang,maokeji,qingtian}@zjut.edu.cn
Abstract. Visual sensors are widely used in the robotics industry because they are inexpensive and information-rich. In this paper we study a vision-based robot programming system and present its architecture, including the overall framework, the hardware and software components, and the visual feedback control loop. The related techniques include robot vision calibration, the construction of a 3D graphics simulation platform, and a workstation-based open robot controller. Finally we demonstrate a candle-lighting game on the simulation platform and a robot positioning game in a real experiment.

Keywords: Vision Calibration, Visual Feedback, Graphics Simulation Platform, Open Robot Controller.
1 Introduction

Robotics is a strategic high technology and an advanced science for every country. The United States treats robotics as a technology of strategic importance and focuses on military applications; Europe focuses on service and medical applications; Japan focuses on humanoid robots and entertainment. China also pays great attention to robotics and has ranked it among the key supported directions ever since the implementation of the National 863 Program. Vision is an important non-contact sensing technology and is increasingly applied to robotics because it is inexpensive and information-rich. The main topics of robot vision include image processing [14], pattern recognition [15], 3D modeling, object positioning, and motion analysis [11]. Visual feedback is important for robots to carry out grasping, assembly, and obstacle avoidance, and is widely used in robotic welding, assembly, and transport, in satellite docking and recovery in space technology, in underwater operation in marine exploration [1], and so on. In this paper we take the PUMA robot as the research object and build a vision-based robot programming system. The related key techniques include robot vision calibration, the construction of a 3D graphics simulation platform, and a workstation-based open robot controller.
*
Supported by the National Natural Science Foundation of China under Grant Nos. 60703002; the Natural Science Foundation of Zhejiang Province of China under Grant No. Y1090335; key projects from Science and Technology Department of Zhejiang Province under Grant No. 2007C11022; social development projects from Science and Technology Department of Zhejiang Province under Grant No. 2009C33043.
2 System Architecture

2.1 Hardware Components

The system hardware consists of a PUMA arm, a graphics workstation, binocular cameras, and a workbench, together with ancillary equipment including the PUMA controller, an RS232-to-RS485 converter module, and a Pan & Tilt (PT) decoder. The hardware composition is illustrated in the following figure. The whole system constitutes a robot visual feedback control loop: the graphics workstation is linked with the robot controller by RS232, the PT unit and the cameras are linked with the workstation by RS485, and the camera video cables are connected directly to the workstation. The graphics workstation thus serves as the brain of the whole system: it perceives vision through the cameras, obtains the robot's current position and status from the robot controller, sends instructions to the PT unit and the cameras over the RS485 cable, and controls the robot arm over the RS232 cable. Together these form the robot visual feedback system.
Fig. 1. Hardware components of the vision-based system
2.2 Visual Feedback Control Flow

Even a common task such as closing a feedback control loop for a robot may involve a wide range of techniques. A vision-based system must first perceive the surrounding environment and then act according to the feedback control sequence; the visual feedback flow is shown in the following diagram.
Fig. 2. Visual Feedback Control Flow
The first step is image acquisition and image pre-processing. The pre-processing methods vary with the application and its requirements, but their purpose is always to highlight features while reducing noise; possible steps include noise filtering and gray-scale enhancement and equalization. The next step is feature extraction and stereo matching, where feature extraction serves the stereo matching directly; stereo matching then reconstructs the object's shape and spatial location. Depending on the requirements, feature extraction methods for stereo matching can be classified into three types: matching based on point features, on line features, or on surface (texture) features. The former two have high matching precision and speed, but the acquired information is limited to the matched points or lines, so they suit 3D reconstruction of regular geometric objects such as mechanical parts or man-made structures such as building facades. In contrast, stereo matching based on surface or texture features spreads the reconstructed information over the whole visible surface, so the acquired information is rich; its speed is slow, but if speed is not a concern this approach can reconstruct almost any object. Within stereo matching research [12, 13], binocular stereo matching is widely studied because a binocular configuration matches the visual characteristics of most animals. For our robot vision system, we mounted the binocular cameras on the Pan & Tilt (PT) unit so that they survey the
whole workbench and serve as the robot's external binocular eyes; later we will mount another binocular camera on the robot's elbow to form the robot's own binocular eyes. In vision-based control, the PT cameras are first used to obtain the rough position of the object and guide the robot arm to that rough location; the arm-mounted cameras then aim at the object and, together with the feedback control mechanism, determine its exact position. Since this system is mainly devised for the assembly of mechanical parts, which requires high speed, we currently concentrate on stereo matching based on point and line features and will address surface- and texture-based stereo matching in the future. Because 3D reconstruction is performed in the camera coordinate frame, the result must be transformed into the robot's own frame through a pre-calibrated rigid transformation from the camera frame to the robot base frame. The robot then knows the object's shape and spatial location, computes the final angles each joint should reach by solving the inverse kinematics problem, and decides how to track the object according to its current position, the environmental requirements, and various predetermined optimization objectives. During tracking, it may continually analyze the current video and adjust its strategy.

2.3 Feedback Control Loop

The following diagram shows the workstation-based robotic visual feedback control system, an open robot control system obtained by extending a general workstation. The control loop illustrates the signal flow throughout the system. The graphics workstation is the core of the entire feedback control system: it not only accepts the operator's commands but also automatically adjusts the control plan according to the image information from the visual sensors and the joint status from the robot's universal controller, making full use of the workstation's high-speed processing capability. Using the graphics workstation to extend and upgrade the universal controller makes complex feedback control planning and 3D off-line graphics simulation possible.
Fig. 3. Feedback control loop
3 Key Techniques

3.1 The Framework of the Workstation-Based Open Robot Controller [2, 3]

The PUMA universal controller can be divided into two parts, the body and the peripherals. The body includes a 16-bit microprocessor, servo circuits, CPU interfaces (serial interfaces for peripherals, parallel interfaces for the servo circuits), and a clock circuit. The peripherals include a terminal, a teaching box, and a floppy disk drive (secondary storage). These devices were made in the early 1990s; their CPU, memory, and floppy disks performed well at the time, but by today's standards their performance is far behind. They cannot be compared with current hardware in either processing speed or memory, so the old system can neither store large programs nor carry out real-time control planning, and expanding it is imperative. When expanding, we keep one principle: the extended system must not destroy the original controller and must remain compatible with it. That is, the control signals and arm status signals are extended from the controller to a high-performance graphics workstation over serial ports, so that the workstation can exploit the performance of a modern computer and greatly enhance the capability of the original controller. Because this extension does not change the controller's own hardware, the new system is more independent and versatile: it can easily be transplanted to another computer and become the upper-level control system of other controllers. Conveniently, in the new system the universal controller can be treated as a black box, so only data transmission and processing need to be considered rather than hardware design, which simplifies the software design. Compared with the original system, the advantages are obvious. The hardware extensibility is enhanced: visual sensors can be installed, complex image processing operations can be carried out, large program files can be stored, and 3D off-line graphical simulation can be performed. The development and application environments are improved, and a friendly human-machine interface can be achieved. Data processing is improved, and real-time multi-task processing becomes possible. Further software development and complex motion planning are also enabled.

3.2 Object-Oriented Design of the 3D Graphic Programming System [4, 5, 6]

Such a workstation-based open controller fully exploits the performance advantage of the workstation and makes 3D graphic programming possible. In developing the programming system, it is important to realize 3D graphics simulation of the robot model [7]. Robot 3D graphics simulation applies computer graphics to build a geometric model of the robot and its environment and, based on this model, uses programming algorithms to control and operate the model and accomplish trajectory planning off line. Moreover, it must provide
collision detection and timing arbitration, simulate the kinematic and dynamic characteristics of the robot, and run 3D animation to test the program. Such 3D off-line simulation therefore helps us understand the design performance and detect design flaws early, avoiding unnecessary losses. There are various ways to implement robot 3D graphical simulation; the main ones are OpenGL programming, Open Inventor programming, and the Cortona SDK. After a comprehensive comparison, the Cortona ActiveX control with VRML automation was found to meet the design requirements well: it requires little computation, responds quickly, draws smooth animation, is fast and convenient to program with VC++, and manages VRML nodes and their values flexibly, allowing detailed control of the movable units of the robot model. The following two figures show the robot arm and the Cortona3D-based graphics simulation platform, respectively.
Fig. 4. Robot Arm
Fig. 5. 3D Graphics programming Platform
Object-oriented programming concepts are now deeply rooted. When building a robot graphics programming system, it is a prerequisite to model the robot, its environment, and the other ancillary equipment with object-oriented analysis. In the robot system, the robot's operation can be regarded as a kind of action upon its environment. The entire physical system is largely formed by the robot, the cameras, the work platform, various workpieces, and environmental obstacles, where the robot itself is made up of joints, the controller, various sensors, and so on. For object-oriented design and analysis, these objects of the robot system, as well as the interactions between them, must be abstracted and summarized into corresponding classes and objects. The following table shows the major classes in the implementation.
Table 1. Some classes and their functional briefs

Class               Functional Brief
C232 -> CPuma232    Communication class, in charge of communication between the robot controller and the workstation via RS232
CRobot -> CPuma     Robot class, receiving instructions and computing robot status and parameters
CCamera             Camera class, video capturing and control
CCalibrate          Calibration class, calibration of the robot itself and of the vision system
CRenderView         Rendering class, drawing the model and displaying animation
CMathCom            Class for various algorithms, including forward and inverse kinematics solving and motion planning
3.3 Robot Calibration

Robot calibration is the basis of 3D off-line simulation and a prerequisite for normal robot operation. The parameters obtained from calibration are used directly in robot programming and in various control plans, so their precision directly affects how exactly the robot can position and track objects; robot calibration is therefore very important. Visual sensors are becoming standard equipment on robots because they are inexpensive and information-rich. For the PUMA, the purpose of the robot's own calibration is to match the sensor readings on the joints with the motors' control quantities; the PUMA provides a calibration program for this, although it takes a relatively long time. Robot vision calibration includes identifying the intrinsic and external parameters of the cameras; identifying the external parameters means solving the rigid transformation from the camera frame to the robot frame, which is the key to vision-guided precise positioning. Many calibration algorithms exist for robot vision. Because the intrinsic and external parameters are coupled and highly nonlinear, general calibration algorithms [8, 9] are divided into two stages: first, determine the camera's intrinsic parameters and the rigid transformations between the template frame and the camera frame with a template calibration algorithm; second, use these matrices to build identity equations and solve them, generally linearly. In this two-stage approach the errors of the first stage are transferred into the second stage that solves the external parameters. Although the internal and external parameters can be further refined by nonlinear optimization, such optimizations are generally only locally optimal and cannot fully eliminate this error transfer and accumulation. The two-stage method nevertheless has the merit that it can handle a rather complex camera model and solve the internal parameters precisely enough to compensate for the error transfer. We use the two-stage method to calibrate the robot vision: the robot arm moves along star-shaped tracks to complete the calibration, the classic template calibration determines the camera's internal parameters, and the calibration of the external parameters then becomes the solution of the equation AX = XB. Many useful algorithms exist for this equation, but almost all of them require that A and B be rigid-body transformations and that the rotation parts of A and B be equal. In many circumstances, the rotation parts
of A and B obtained from the first stage are usually not exactly equal; there is a small deviation. Classic algorithms treat the rotation parts of A and B as equal, which affects the precision of the final solution. To overcome this shortcoming, we propose a direct linear algorithm [10] based on singular value decomposition and orthogonal optimization estimation. The algorithm does not require A or B to be an exact rigid-body transformation, nor the rotation angles to be equal, so it meets general requirements widely.

Calibration Equation. A, B, and X can be written as
A = [ Ra  Ta ; 0  1 ],   B = [ Rb  Tb ; 0  1 ],   X = [ Rx  Tx ; 0  1 ]    (1)
where Ra, Rb, Rx are 3×3 matrices and Ta, Tb, Tx are 3×1 vectors. Substituting these into the equation gives

Ra Rx = Rx Rb    (2)

(Ra − I) Tx = Rx Tb − Ta    (3)
Solving AX = XB is therefore equivalent to solving (2) and (3). Using the Kronecker product, equation (2) can be written in linear form, so solving (2) is equivalent to solving the linear equation

(Ra ⊗ I3 − I3 ⊗ Rb*) vec(Rx) = 0    (4)
Algorithm Brief. Obviously, once the rotational part has been solved, (3) can easily be solved linearly, so we concentrate on the rotational part: (1) solve (4) by SVD and fold the solution vector into a real square matrix B; (2) compute the SVD of this matrix, B = USV*; the optimal rotational part is then Rx = UV*. Many real experiments and simulations have demonstrated the speed and precision of the algorithm.
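A rough numerical sketch of this two-step procedure is given below. The stacking of several motion pairs, the sign handling of the null-space vector, and the least-squares solution of the translation part via equation (3) are our own choices for illustration and are not taken from [10].

```python
import numpy as np

def solve_hand_eye(As, Bs):
    """Linear AX = XB solver: Kronecker system (Eq. (4)) + SVD, then Eq. (3).

    As, Bs are lists of 4x4 homogeneous motions with A_i X = X B_i.
    """
    I3 = np.eye(3)
    # Stack Eq. (4) for every motion pair (row-major vec convention).
    M = np.vstack([np.kron(A[:3, :3], I3) - np.kron(I3, B[:3, :3].T)
                   for A, B in zip(As, Bs)])
    _, _, Vt = np.linalg.svd(M)
    Rx = Vt[-1].reshape(3, 3)               # fold the null-space vector into a matrix
    if np.linalg.det(Rx) < 0:               # the null-space vector is defined up to sign
        Rx = -Rx
    U, _, Vt2 = np.linalg.svd(Rx)
    Rx = U @ Vt2                            # orthogonal projection: Rx = U V*
    # Translation, Eq. (3): (Ra - I) Tx = Rx Tb - Ta, stacked and solved by least squares.
    C = np.vstack([A[:3, :3] - I3 for A in As])
    d = np.concatenate([Rx @ B[:3, 3] - A[:3, 3] for A, B in zip(As, Bs)])
    Tx = np.linalg.lstsq(C, d, rcond=None)[0]
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = Rx, Tx
    return X
```

Given motion pairs (Ai, Bi) collected from the star-shaped calibration moves, solve_hand_eye returns an estimate of the camera-to-robot transformation X.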
4 Experiments and Conclusions

4.1 A Simulation Game of Lighting Candles

The graphical interfaces of the simulation platform are shown in the following figures. The demo shows how the robot arm reaches exactly above the candles and lights them in simulation. In the real experiment corresponding to the simulation, the candles were also lit, a successful and encouraging result for our simulation platform.
Fig. 6. A demo showing how the robot lights the candles on our 3D graphics simulation platform
4.2 A Game of the Robot Exactly Reaching a Bowl's Center

In a real experiment, we use the binocular cameras on the pan & tilt unit as the PUMA's vision (not visible in the photos); the vision guides the robot to reach the bowl's center exactly, as the following demo shows. The experiment involves many algorithms, including image processing and recognition and robot motion planning, which have been integrated into our graphic programming system and were tested successfully through the experiment.
Fig. 7(a). Robot initial position
Fig. 7(b). Robot final position
References

1. Pascoal, A., Oliveira, P., Silvestre, C., et al.: MARIUS: an autonomous underwater vehicle for coastal oceanography. IEEE Robotics & Automation Magazine 4(4), 46–59 (1997)
2. Maria, G.: The future of robot programming. Robotica 5(3), 235–246 (1987)
3. Fan, Y., Tan, M.: Current state and tendencies in the development of robot controller. Robot 21(1), 75–80 (1999)
4. Miller, D.J., Lennox, R.C.: An object-oriented environment for robot system architectures. In: Proc. IEEE Int. Conf. on Robotics and Automation, vol. 1, pp. 352–361 (1990)
5. Boyer, M., et al.: An object-oriented paradigm for the design and implementation of robot planning and programming system. In: Proc. IEEE Int. Conf. on Robotics and Automation, vol. 1, pp. 204–209 (1991)
6. Wong, R.K.: Advanced object-oriented techniques for modeling robotic systems. In: Proc. IEEE Int. Conf. on Robotics and Automation, vol. 1, pp. 1099–1104 (1995)
7. Robinette, M.F., Manseur, A.R.: Robot-Draw, an Internet-based visualization tool for robotics education. IEEE Transactions on Education 44(1), 29–34 (2001)
8. Tsai, R., Lenz, R.: A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Transactions on Robotics and Automation 5(3), 345–358 (1989)
9. Zhuang, H., Roth, Z.S.: Simultaneous Calibration of a Robot and a Hand-Mounted Camera. IEEE Transactions on Robotics and Automation 11(5), 649–660 (1995)
10. Liang, R., Mao, J.: Hand-eye calibration with a new linear decomposition algorithm. Journal of Zhejiang University Science A 9(10), 1363–1368 (2008)
11. Hornick, M.L., Ravani, B.: Computer-aided Off-line Planning and Programming of Robot Motion. Int. J. Robotics Research (4), 18–31 (1986)
12. Lawrence Zitnick, C., Kanade, T.: A Cooperative Algorithm for Stereo Matching and Occlusion Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 675–684 (2000)
13. Sun, J., Zheng, N., Shum, H.-Y.: Stereo Matching Using Belief Propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(7), 787–800 (2003)
14. Ma, M., Hao, C., Li, X.: A Survey on Grey Image Processing. In: Proceedings of the 5th World Congress on Intelligent Control and Automation, pp. 4141–4145 (2004)
15. Li, L., Li, F.: What, where and who? Classifying events by scene and object recognition. In: ICCV 2007, pp. 1–8 (2007)
Integrating Activity Theory for Context Analysis on Large Display* Fang You1,2, HuiMin Luo1, and JianMin Wang2 1
School of communication and design, Sun Yat-sen University, Guangzhou, China, 510275 2 Key lab of digital life, ministry of education, Sun Yat-sen University, Guangzhou, China, 510275 {youfang,mcswjm}@mail.sysu.edu.cn,
[email protected]
Abstract. In recent years, interaction research on large displays has been a promising domain, and techniques and usability improvements have constantly emerged. Different researchers focus on separate aspects, but a comprehensive approach is needed to analyze the context of use on large displays. In this paper, we apply activity theory to understand large display usage and show two design ideas for large displays: centralized mapping and gesture tracing. We take speaker-audience usage as an example and present two prototypes based on activity-centered analysis. Our results were evaluated, and the feedback shows that activity-based integration is a feasible approach to large display design.

Keywords: Activity, large displays, speaker-audience usage, context of use.
1 Introduction

Years of rapid hardware development have made much larger display devices accessible, and research on the productivity of large displays has emerged in large numbers. Depending on the application context, many studies examine usability problems of public, semi-public, and private large display usage. Microsoft researchers [7] have summarized the basic usability issues from numerous studies and classified the topics into six broad categories. Techniques and usability innovations have expanded the concrete possibilities for productivity on large displays: wall-size displays can be used as public interactive platforms [4, 15, 26] or ambient displays [10, 11, 14, 25], while tabletop large displays, a newer form, are studied for user behavior and the usability of computer-supported cooperative work [22, 23, 27, 33]. Most previous works discuss particular points of improving or creating large display usage, but they lack a macroscopic method for analyzing the context of use on large displays. Our research explores a method for analyzing the context of use on large displays that integrates activity theory, which has been employed as a framework for human-computer interaction research for more than a decade. Furthermore, we present an example analysis of speaker-audience usage and a prototype design based on it.
* Partially supported by the National Natural Science Foundation of China.
2 Related Work

2.1 Research on Large Displays

Missile Mouse and Target Chooser solve the problem of cursor loss by using only small hand motions [28]. Drag-and-Throw, Push-and-Throw, Drag-and-Pop, and Drag-and-Pick [1, 6, 12] shorten the distance to icons on a large display by temporarily moving targets or the cursor location towards each other; they are comparable metaphor-based solutions for distal access to information. Scalable Fabric [29] uses the display periphery and a flexible design that leverages the user's spatial memory and visual-recognition memory to aid task recognition. GroupBar [30] provides task management features by extending the current windows. These studies focus on operation problems on large displays, while the following research addresses the viewing or watching experience. Spotlight [18] presents an interaction technique for directing the visual attention of an audience viewing data or presentations on large wall-sized displays. Focus plus context screens [2] are wall-size low-resolution displays with an embedded high-resolution display region. Work on the scalability of information visualizations for large displays [34] explores the effect of using large, high-resolution displays to scale up information visualizations beyond potential visual acuity limitations. Raphael [13] surveys graphical cues designed to direct visual attention, adapts them to window switching, and reports the results of two user studies. Wincus [31], designed by Microsoft, is an interaction technique that allows users to replicate arbitrary regions of existing windows into independent windows. These approaches all focus on technical or other single points, yet large display usage is a process full of context: separate design and analysis can improve some aspects of the usage but cannot give a general view of the context of use. Our research explores a method that integrates activity theory to comprehensively analyze large display usage, covering not only separate scenarios but all the context-dependent activities, for example, which instruments users employ to interact with the large display and the effect of that interaction.
92
F. You, H. Luo, and J. Wang
by the motive. The unconscious operations are triggered by the actions and the conditions, which also affect the goals. Through the process of these three levels, the objects in an activity transform into outcomes. Levels of an activity are flexible, an activity can change into an action when loses its motive action can become an operation through automation and an operation can become an action through conceptualization [5]. That is to say the six elements structure analysis can be used in individual level.
Fig. 1. Basic structure of an activity [20]
To deal with the complexity of human abilities and skills as well as the cognitive aspect, activity theory provide a fresh perspective on certain problems in the fields of HCI and present a structural and intertwined analysis framework to HCI [17].
Fig. 2. Levels of a activity
Present researches of activity theory used in Computer Support Cooperative Work wildly. Augmented Reality system [9] is a kind of graspable groupware, which supports cooperative planning. Zhang [35] used activity-based method for modeling the Flow-Manufacturing-Oriented ERP systems. It was also used in understanding of human activity in the larger social context [24, 32] and the context-aware in software development [16]. Activity theory as a philosophy and cross-disciplinary framework can contribute in such aspects [20]: 1) Multi-levelness. 2) Studying interaction embedded in social context. 3) Dealing with the dynamics and development. In this paper, we integrate activity theory to analyze the large display usage, which contains a chains of interactions embedded in semi-public context. We expect activity theory can be a guide to task analysis and prototype design.
3 Structure of Use on Large Display

Figure 3 depicts our activity structure of large display usage. According to activity theory, the subject, a person or a group engaged in the activity [19], acts on the object through mediating tools; in our context, the mediating tool is the large display. The subject is the one whose doing is directed at the object and who uses the large display. Depending on the concrete case, the subject can be a speaker giving a presentation in a meeting or an advertiser who wants to promote a product through the large display; it can also be a personal user, a player and so forth if the large display is a public one. Accordingly, the object can be an audience, a consumer, the data, a game and so on. The community component shares the same object with the subject. Under the speaker-audience conditions, it may be a member of the presentation other than the speaker, such as a coordinator or a receiver. In the case of a public large display, the community is the group of people who do the same task as the subject. "Rules" mediate the relationship between subject and community; they may be the conventions within the community or, for a public large display, the conventions of different types of users. "Division of labor", which mediates between the object and the community, means the organization of the collaboration in the community.
Fig. 3. The activity structure of large display usage
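To make the mapping from the framework to the speaker-audience context concrete, the sketch below encodes the six elements of one activity as a small data structure. This is only an illustrative Python encoding; all field names and example values are our own assumptions rather than part of activity theory or of the prototype.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Activity:
    """Six-element activity unit from activity theory (hypothetical encoding)."""
    subject: str             # who acts, driven by a motive
    obj: str                 # what the activity is directed at
    tools: List[str]         # mediating instruments
    community: List[str]     # others who share the same object
    rules: List[str]         # conventions mediating subject <-> community
    division_of_labor: str   # organization of collaboration (community <-> object)
    actions: List[str] = field(default_factory=list)      # goal-directed actions
    operations: List[str] = field(default_factory=list)   # condition-driven operations

# The speaker-audience situation described in the text, encoded as one activity.
presentation = Activity(
    subject="speaker",
    obj="audience",
    tools=["large display", "laser pen", "gestures"],
    community=["coordinator", "receiver"],
    rules=["presentation conventions"],
    division_of_labor="speaker presents, coordinator assists",
    actions=["open and position the file", "explain the target", "adjust to feedback"],
    operations=["open file", "choose document", "adjust material"],
)
print(presentation.subject, "->", presentation.obj, "via", presentation.tools[0])
```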
In the following sections, we focus on the speaker-audience situation of large display usage as the example for our integrated activity analysis.
4 User Analysis

4.1 First User Study

We conducted a user study to identify the interface elements of speaker-audience large display usage. Leont'ev [21] indicated that different activities are distinguished by their motives, so the activity of speaker-audience usage of a large display can be defined as presentation.
In our first study, we divided the interface elements into the most common modes, which represent the layout of a presentation application. They included the main area with a message box and the associated menu that opens the selected file into it, and the toolbar that scales and moves the content in the main display area. Considering the size of the large display, a thumbnail of the main area was also included. Mediation in computer-mediated work can be divided into the physical, the handling and the subject/object-directed aspects [5]. Handling aspects are the elements that support the users' operations and can be seen as a group of tools that allow the users to focus on the "real" object. In speaker-audience usage, the audience is the "real" object, and the elements of the user interface are considered tools. We used a paper prototype on a board measuring 2048 mm x 768 mm, on which the elements could be moved. Participants moved and re-pasted the paper elements until the layout felt convenient for the tasks. We created three tasks consisting of common operations, for instance moving, pressing and underlining among different elements. Each task was set in the scenario of a presentation:

Scenario: Speaker-audience usage. A person or a group gives the presentation and the audience watches around them. The aim of the speaker is to explain the content to the audience as well as to display the operation process. He can use a laser pen or other instruments to present. The audience's goal, on the other hand, is to understand what the speaker wants to show.

Five participants (three females), all skilled computer users, were recruited for this study. They performed the tasks in the scenario under an observer, who recorded the time of each task, the paths of the operations and so on. The tasks included two Preparing Tasks, requiring the users to operate the menu, the main area, the toolbar and the thumbnail one after another to open the selected file and adjust it to the right location. The Main Task called for shifting among the main area with a message box, the toolbar and the thumbnail, and for explaining the details of the content to the observer or audience (the other participants). The elements, as the tools, were matched to the actual activities implied by the tasks. An interview was held after the tasks. It focused on which sizes and locations of the elements were suitable for the users during the tasks and why they pasted them in those exact areas. We followed the focus-shift tracing method [5] and analyzed the records of the users' tasks in a form with the users along one dimension and the elements used as tools by the different users along the other; a description of the essentials of the operation process followed the last element. Furthermore, we summarized the users' aims and task adjustments in the two scenarios based on the interview.
Fig. 4. The user presenting as the speaker in the first study. (a) Most speakers stood at the right side. (b) The speaker sometimes inevitably obstructed the audience's view.
In most situations, the speakers subconsciously stood at the right side of the large display at the beginning, but walked across to the left side to explain details in the display area when the tasks demanded it (Figure 4). They used the right hand to operate and explained with small-range gestures most of the time, and then sometimes switched to the left hand when they stood at the left side of the large display. Based on these results, we derived the contradictions in speaker-audience large display usage. In activity theory there are four levels of contradictions: a primary contradiction lies within a single element of an activity, and a secondary contradiction lies between two elements of a single activity; imbalances between an activity and a more advanced version of the same activity are tertiary, and quaternary contradictions arise between different activities. We focused on the secondary contradictions among the speaker, the audience and the large display. Figure 5 shows the contradiction items.
Fig. 5. Secondary contradictions between the three elements
Design issues concluded from the contradiction analysis are as follows:
- Centralization of the user-interface control elements. Unlike personal use, the speakers expected a centralized control area in order to give a better presentation. This control area was expected to be at the lower right of the screen (the right-hander's expectation).
- Content mapping of the main area. A centralized control area easily causes a contradiction between the speaker's convenience and the audience's understanding: because they cannot watch the exact operation gestures through the movement of arms and fingers, the audience finds it hard to understand what the speaker wants to display.
4.2 Second User Study

In order to abstract some concrete design issues, we held an expert review to examine the speaker-audience scenarios as well as the analysis of the context of use. Ten participants, including three user-interface designers and seven software engineers working on large displays, were invited. The expert review discussed the ideas of a centralized control area with content mapping. At the same time, two comparable prototypes were shown on an IMS Series (Interactive Multicube System) consisting of two interactive digital boards, each with a resolution of 1024x768. One prototype was the ordinary presentation application with the user-interface elements that had been analyzed in the first study; the other had a mapping zone of the main area at the lower right of the large display. In the second prototype, operations such as clicking, moving and drawing were performed in the mapping zone, and the main area showed the effect of these operations correspondingly. A list of questions structured the expert review. The questions covered basic issues and phase issues concerning the two comparable prototypes. Basic issues focused on the physical aspects of speaker-audience large display usage, i.e., where the speaker stood most of the time during the presentation and when and how the speaker wanted to interact with the audience. Phase issues emphasized the relationship between concrete operations and interface elements in those situations, i.e., which gestures were used on the main area when the speaker wanted to explain details. Each issue was discussed in detail based on the experience and recollections of the participants. All the designers and engineers operated the corresponding prototype at crucial points and contributed design ideas.

According to our speaker-audience situation, the activity generated by the motive of giving a good presentation can be described as standing at the side of the large display and using the hands to point. Even though the border between the levels of an activity is blurred [20], so that a rigid classification of actions and operations is in fact impossible, we described the actions that constitute the presentation activity and the operations triggered by those actions based on the scenarios. The most important actions affecting the user interface of the large display were extracted from the chains of actions. Table 1 shows the level analysis of these important actions. Each action has a mediated relationship structure that depends on its goal; the objects and tools of these structures change during the presentation activity, as shown in Table 1. In the presentation activity, the speaker acts on the overall object, the audience, through chains of operations directed at the large display and mediated by different aspects such as the fingers and the user-interface elements on the display. This activity consists of chains of important actions: interacting with the audience by language, adjusting the way of presenting according to user feedback and the coping strategy, and so on.
Table 1. The level analysis of the presentation activity. (Each goal governs and is achieved by an action; each action triggers and is realized by operations; each operation must be in accord with its conditions, which in turn trigger design issues.)

Goal 1: Opening the exact file and adjusting it to the best position.
Action: Standing at the right side, operating the large display.
Operations: 1. Open the file. 2. Choose the right document. 3. Adjust the material.
Conditions: The speaker has to check the effect after each operation. The display is too large to reach the other side. The speaker often touches the screen unconsciously, which leads to misoperation. Because of the large size, the speaker has to walk around to point at the exact target, and the audience's view is sometimes obstructed.
Design issues: Use visual and audio hints to show the effect of an operation. Buttons have to be big enough to press correctly. Provide a mapping zone to control the whole display.

Goal 2: Explaining the properties of the content well.
Action: Explaining the target.
Operations: 1. Using language to explain. 2. Using gestures to explain. 3. Using contextual facial expressions.
Conditions: Language alone is not enough for the audience to comprehend. There is a cognitive gap between the speaker and the audience.
Design issues: Language, gestures and contextual facial expressions may be used in parallel. Use hints or instruments to explain.

Goal 3: Making sure the mode of presentation is good for the audience.
Action: Adjusting the speaking and pointing mode according to the feedback.
Operations: 1. Looking around. 2. Using facial expressions. 3. Using language. 4. Turning the head. 5. Selecting a suitable location. 6. Preparing for the next presentation.
Conditions: The audience finds it hard to express emotion or feedback in this situation. The speaker has little time to make sure whether the audience understands what he presents. The communication time is too short for the speaker to adjust. The speaker must adjust his mood in a short time.
Design issues: Provide a repeat or slow-replay function. Provide something like a progress bar.
5 The Prototype

In this section we describe 1) the mapping and gesture tracing prototype design, 2) what the mapping zone in the large display is and how it works for the speaker, and 3) what gesture tracing is and how it works during the presentation activity.

5.1 Mapping for the Speaker

Figure 6 shows the mapping prototype for the speaker. The centralized control mapping zone in the right corner is movable within the whole display. The mapping zone integrates the controls of the main display area, and a customizable menu and toolbar are also combined into it. Speakers can operate in the centralized control mapping zone while giving the presentation, so that obstruction of the view can be avoided. Users can drag the frame of the centralized control mapping zone during the tasks. It lies on the uppermost layer; it activates and unfolds when the hands move in, and folds when the hands move out. It stays active for a few seconds after the hands move away. When it is close to the edges and no longer active, it retracts automatically.
Fig. 6. The mapping prototype for the speaker
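As a rough illustration of the fold/unfold behaviour just described, the following Python sketch models the activation logic of the mapping zone. The class name, the linger time of a few seconds and the per-frame update interface are assumptions made for illustration, not the prototype's actual implementation.

```python
import time

class MappingZone:
    """Sketch of the fold/unfold behaviour described for the centralized
    control mapping zone. The linger time and method names are assumptions."""

    def __init__(self, linger_seconds=3.0):
        self.linger_seconds = linger_seconds  # stays active this long after hands leave
        self.unfolded = False
        self._last_hand_time = 0.0

    def update(self, hand_inside, near_edge=False, now=None):
        """Call once per frame with the current hand-tracking state."""
        now = time.monotonic() if now is None else now
        if hand_inside:
            self.unfolded = True               # hands move in: activate and unfold
            self._last_hand_time = now
        elif self.unfolded and now - self._last_hand_time > self.linger_seconds:
            self.unfolded = False              # fold once the linger time has elapsed
            if near_edge:
                self.retract()                 # retract automatically near the edges

    def retract(self):
        print("mapping zone retracted towards the screen edge")

# Minimal usage: simulate a hand entering, then staying away for five seconds.
zone = MappingZone()
zone.update(hand_inside=True, now=0.0)
zone.update(hand_inside=False, now=5.0)
print("unfolded:", zone.unfolded)
```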
The corresponding relationship between the mapping zone and the main area is similar to the navigator in design software. The image in the mapping zone is rendered at lower accuracy than the main area, and the introductory text is not included in the zone. A peculiarity is the synchronization of the mapping zone and the main area: the speaker selects some part of the zone and the corresponding area in the main area is highlighted, and the effect of operations appears synchronously in both the main area and the centralized control mapping zone. The readability of characters in the main area is one of the crucial problems of this corresponding relationship; our solution is to scale up the handwriting made in the centralized control mapping zone and smooth the strokes. The centralized control mapping zone is, of course, a supplement to the large display: users who give presentations as speakers can also act on the main area directly. According to the second study, speakers preferred to perform large-range movements not in the centralized control mapping zone but on the main area itself, whereas simple operations, such as single-click, double-click and drag, were preferred in the centralized control mapping zone.
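The navigator-style correspondence between the mapping zone and the main area amounts to a simple linear coordinate mapping. The sketch below shows one plausible way to express it; the rectangle representation and the example sizes are hypothetical and not taken from the prototype.

```python
def zone_to_main(point, zone_rect, main_rect):
    """Map a touch point from the mapping zone to the main display area.

    Rectangles are (x, y, width, height) in screen pixels; a simple linear
    (navigator-style) mapping is assumed, mirroring how a selection in the
    zone highlights the corresponding region of the main area.
    """
    zx, zy, zw, zh = zone_rect
    mx, my, mw, mh = main_rect
    u = (point[0] - zx) / zw           # normalized position inside the zone
    v = (point[1] - zy) / zh
    return (mx + u * mw, my + v * mh)  # same relative position in the main area

# Example: a 2048x768 wall display with a 400x150 mapping zone at the lower right.
main_area = (0, 0, 2048, 768)
mapping_zone = (1648, 618, 400, 150)
print(zone_to_main((1848, 693), mapping_zone, main_area))  # centre of the main area
```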
5.2 Gesture Tracing for the Audience

In this section, we present the other idea of the prototype: gesture tracing. According to the user model of speaker-audience usage on the large display, the secondary contradictions between the elements (the subject speaker, the tool large display and the object audience) of the presentation activity are important. Mapping is convenient for speakers to operate and reduces the obstruction of the audience's view. However, this mapping control mode can lead to a cognitive gap between the speaker and the audience because the operations are centralized. How to balance the contradiction between the speaker's usage and the audience's is the essence of applying activity theory. The centralized control mapping zone is one means, found through the studies and the activity-theory-based analysis, of restoring the balance in the contradiction between the speaker and the large display. For the contradiction between the audience and the large display, and the one between the speaker and the audience, gesture tracing is the solution. From the analysis of the second study, we summarize some detailed issues of gesture tracing:
- The abstraction level of the tracing and the duration of the gesture tracing have to be in accord with the concrete application design.
- Whether a gesture should show its trace depends on the definition of the gesture; some misoperations need not be included.
- Provide a fluid blend between the tracing and the outcome of the gesture, avoiding excessive tracing effects that disturb the main presentation.
Figure 7 shows the gesture tracing prototype on the large display. The key enabling features of gesture tracing are showing the location of the touch points and tracing their motion. While the speaker operates in the centralized control mapping zone, the audience wants to know where the touch point is and what the effect of the gestures is; moreover, they want to see the process of the gestures, such as the start point, the direction and the end point.
Fig. 7. The gesture tracing prototype for the audience. (a) The zoom-in tracing of a movement. (b) The zoom-out tracing of a movement.
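A minimal way to realize the location showing and motion tracing described above is to keep a short, fading history of touch points. The Python sketch below illustrates this idea; the trail length, the fade rule and all names are illustrative assumptions rather than the prototype's actual code.

```python
from collections import deque

class GestureTrace:
    """Sketch of the gesture-tracing idea: keep recent touch points so the
    audience can see the start point, direction and end point of a gesture."""

    def __init__(self, max_points=60):
        self.points = deque(maxlen=max_points)  # oldest points drop off automatically

    def add_touch(self, x, y):
        self.points.append((x, y))

    def render(self):
        """Return (x, y, alpha) triples; older points fade out, the newest is opaque."""
        n = len(self.points)
        return [(x, y, (i + 1) / n) for i, (x, y) in enumerate(self.points)]

    def summary(self):
        """Start point, rough direction and end point of the current gesture."""
        if len(self.points) < 2:
            return None
        (x0, y0), (x1, y1) = self.points[0], self.points[-1]
        return {"start": (x0, y0), "end": (x1, y1), "direction": (x1 - x0, y1 - y0)}

trace = GestureTrace()
for x in range(0, 100, 10):   # a simple horizontal drag
    trace.add_touch(x, 50)
print(trace.summary())
```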
6 User Feedback

In order to confirm whether the prototypes derived from the activity analysis work in practice, we carried out a comparative evaluation between the mapping and no-mapping conditions on the large display.

6.1 Evaluation Method

We used the mapping and gesture tracing prototype and a similar prototype without the centralized control mapping zone. We recruited ten participants for the evaluation, including three IT professionals and seven students of new media design. Participants were arranged into two groups: one acted as the speaker using the large display and the other as the audience. We designed four tasks per round, and each round had one speaker partnered with one audience member. The speaker loaded four different pictures on each of the comparable prototypes and introduced them to the paired audience member; the main goal was to tell the audience "what it is" and "what it is used for". He or she pointed to the exact areas prescribed in the tasks and performed some gesture operations, such as moving and scaling. In order to reduce ordering effects, we presented the prototypes in random order. Whether the centralized control mapping zone improves the usability of a common large display can be deduced from the searching time as well as the complexity of the paths. Hence we recorded the completion time and the operation paths of the tasks on the two prototypes. The click points can be found in the paths, and the time between two clicks indicates the searching time. For the audience, after all tasks were finished, we used questionnaires on a five-point Likert scale, where a score of 1 means "strongly disagree" and a score of 5 means "strongly agree". Statements about comprehension of and satisfaction with the presentation were rated, and for some important responses we held follow-up interviews.

6.2 Task Result

Participants acting as the speaker preferred to use the centralized control mapping zone when they had to introduce objects far away, while they sometimes touched the main area directly. When using the prototype without mapping, they all had to walk across, embarrassed and hurried, to finish the tasks. All the speakers responded that they were afraid of obstructing the audience's view.

Table 2. The detailed description of satisfaction
Description                                                                   Average score
I want to know the process of the gesture.                                    4
I want to see the moving tendency of the hands.                               4
I want to see a longer time of the tracing after the gesture (2-3 seconds).   4
I know what the tracing means after the operations.                           3.8
I don't like the speaker moving around.                                       3.8
I only want to know the effect after the operations.                          2.8
Most users liked to operate in the centralized control mapping zone even though they had different modes of use. Three speakers alternated between looking at the main area and operating in the centralized control mapping zone, and one user finished all the tasks only in the zone and seldom watched the main area. He said, "I like to watch the large area but operate on the smaller area." One speaker indicated that she could not be sure of the corresponding effect in the main area while using the centralized control mapping zone, but she agreed that the audience's view should not be obstructed. We determined the preferences of the audience on the five-point Likert scale from the average scores of the statements; the average score of each statement was calculated as the weighted average of the preference scores of the different items. Table 2 shows the detailed description of satisfaction according to the Likert scale. The audience strongly agreed with "I don't like the speaker moving around". They liked to watch the speaker's hand gestures during the whole task, not only the effect of the operations. As for related objects on the large display, gesture tracing was a good way to understand the relationships among them.
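For readers who want to reproduce the scoring, the snippet below shows how per-statement averages on the five-point Likert scale can be computed. The individual responses listed are made-up placeholders; only the averages reported in Table 2 come from the study.

```python
# Average agreement per statement on the five-point Likert scale
# (1 = strongly disagree, 5 = strongly agree). Responses are illustrative.
responses = {
    "I want to know the process of the gesture.": [4, 5, 4, 3, 4],
    "I don't like the speaker moving around.": [4, 4, 3, 4, 4],
}

for statement, scores in responses.items():
    average = sum(scores) / len(scores)
    print(f"{average:.1f}  {statement}")
```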
7 Conclusion

In this paper, we presented an integration of activity theory to analyze large display usage comprehensively. It is useful for describing the usage situation as a whole, and the resulting descriptions guided our design ideas. We conducted two studies to analyze the concrete situation of speaker-audience usage on a large display, in accordance with the activity-centered design method. After analyzing the usage according to activity theory, we presented two design ideas for large display usage: 1) the centralized control mapping zone for the speaker, which also avoids obstruction, and 2) gesture tracing for the audience, a solution to bridge the cognitive gap between the speaker and the audience. The evaluation of the prototype demonstrated the feasibility and effectiveness of the activity-based integration method for analyzing large display usage. We believe the centralized control mapping zone with gesture tracing can play a valuable role in a variety of large display applications, such as presentation systems in modern meetings or conferences in medium to large venues. In the future, we will continue to explore the descriptive method, activity theory, to further improve the prototype, especially the correspondence between the gesture tracing and the effect after operations.
References 1. Baudisch, P., Cutrell, E., Robbins, D., Czerwinski, M., Tandler, P., Bederson, B., Zierlinger, A.: Drag-and-Pop and Drag-and-Pick: Techniques for Accessing Remote Screen Content on Touch and Pen-operated Systems. In: Proc. Interact 2003, Amsterdam, pp. 57–64 (2003) 2. Baudisch, P., Good, N.: Focus Plus Context Screens: Displays for Users Working with Large Visual Documents. In: CHI 2002 Extended Abstracts on Human Factors in Computing Systems, pp. 492–493. ACM Press, New York (2002)
3. Bødker, S.: A human activity approach to user interface design. In: Human-Computer Interaction, vol. 4(3), pp. 171–195. L. Erlbaum Associates Inc., Mahwah (1989) 4. Brignull, H., Izadi, S., Fitzpatrick, G., Rogers, Y., Rodden, T.: The introduction of a shared interactive surface into a communal space. In: Proc. CSCW 2004, pp. 49–58. ACM Press, New York (2004) 5. Carrol, J.M.: HCI models, theories, and frameworks: toward a multidisciplinary science, pp. 291–324. Morgan Kaufmann, San Francisco (2003) 6. Collomb, M., Hascoët, M., Baudisch, P., Lee, B.: Improving drag-and-drop on wall-size displays. In: Proc. GI 2005, Canadian Human-Computer Communications Society, pp. 25– 32 (2005) 7. Czerwinski, M., Robertson, G., Meyers, B., Smith, G., Robbins, D., Tan, D.: Large Display Research Overview. In: CHI 2006 Extended Abstracts on Human Factors in Computing Systems, pp. 69–74. ACM Press, New York (2006) 8. Engeström, Y., Punamäki, R.-L. (eds.) (forthcoming): Perspectives on Activity Theory. Cambridge Univ. Press, Cambridge 9. Fjeld, M., Lauche, K., Bichsel, M., Voorhorst, F., Krueger, H., Rauterberg, M.: Physical and Virtual Tools: ActivityTheory Applied to the Design of Groupware. In: Computer Supported Cooperative Work, vol. 11(1-2), pp. 153–180. Kluwer Academic Publishers, Dordrecht (2002) 10. GeiBler, J.: Shuffle, Throw, or take it! Working Efficiently with an Interactive Wall. In: CHI 1998 Extended Abstracts, pp. 265–266. ACM Press, New York (1998) 11. Grossman, T., Wigdor, D., Balakrishnan, R.: Multi-Finger Gestural Interaction with 3D Volumetric Displays. In: Proc. UIST 2004, pp. 61–70. ACM Press, New York (2004) 12. Hascoët, M.: Throwing models for large displays. In: Proc. HCI 2003, Designing for Society, vol. 2, pp. 73–77 (2003) 13. Hoffmann, R., Baudisch, P., Weld, D.S.: Evaluating Visual Cues for Window Switching on Large Screens. In: Proc. CHI 2008, pp. 929–938. ACM Press, New York (2008) 14. Holmquist, L., Mattern, F., Schiele, B., Alahuhta, P., Beigl, M., Gellersen, H.: Smart-its friends: A technique for users to easily establish connections between smart artefacts. In: Abowd, G.D., Brumitt, B., Shafer, S. (eds.) UbiComp 2001. LNCS, vol. 2201, pp. 116– 122. Springer, Heidelberg (2001) 15. Huang, E.M., Russell, D.M., Sue, A.E.: IM here: public instant messaging on large, shared displays for workgroup interactions. In: Proc. CHI 2004, pp. 279–286. ACM Press, New York (2004) 16. Jiang, T., Ying, J., Wu, M., Fang, M.: An Architecture of Process-centered Context-aware Software Development Environment. In: Proc. EDOCW 2006, pp. 1–5 (2006) 17. Kaptelinin, V., Nardi, B.A.: Activity Theory: Basic Concepts and Applications. In: CHI 1997 Extended Abstracts on Human Factors in Computing Systems: Looking to the Future, pp. 158–159. ACM Press, New York (1997) 18. Khan, A., Matejka, J., Fitzmaurice, G., Kurtenbach, G.: Spotlight: Directing Users’ Attention on Large Displays. In: Proc. CHI 2005, pp. 791–798. ACM Press, New York (2005) 19. Kofod-Petersen, A., Cassens, J.: Using activity theory to model context awareness. In: Roth-Berghofer, T.R., Schulz, S., Leake, D.B. (eds.) MRC 2005. LNCS (LNAI), vol. 3946, pp. 1–17. Springer, Heidelberg (2006) 20. Kuutti, K.: Activity Theory as a Potential Framework for Human-Computer Interaction Research. In: Nardi, B. (ed.) Context and Consciousness, pp. 17–44. MIT Press, Cambridge (1996) 21. Leont’ev, A.N.: Activity and consciousness. In: Philosophy in the USSR: Problems of Dialectical Materialism, pp. 180–202. Progress Publishers, Moscow (1977)
22. Morris, M.R., Huang, A., Paepcke, A., Winograd, T.: Cooperative Gestures: Multi-User Gestural Interactions for Co-located Groupware. In: Proc. CHI 2006, pp. 1201–1210. ACM Press, New York (2006) 23. Morris, M.R., Paepcke, A., Winograd, T., Stamberger, J.: Team Tag: Exploring Centralized versus Replicated Controls for Co-located Tabletop Groupware. In: Proc. CHI 2006, pp. 1273–1282. ACM Press, New York (2006) 24. Neto, G.C., Gomes, A.S., Castro, J., Sampaio, S.: Integrating activity theory and organizational modeling for context of use analysis. In: Proc. CLIHC 2005, pp. 301–306. ACM Press, New York (2005) 25. Oka, K., Sato, Y., Koike, H.: Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems. In: IEEE International Conference on Automatic Face and Gesture Recognition, pp. 429–434 (2002) 26. Peltonen, P., Kurvinen, E., Salovaara, A., Jacucci, G., Ilmonen, T., Evans, J., Oulasvirta, A., Saarikko, P.: It’s Mine, Don’t Touch!: Interactions at a Large Multi-Touch Display in a City Centre. In: Proc. CHI 2008, pp. 1285–1294. ACM Press, New York (2008) 27. Piper, A.M., O’Brien, E., Morris, M.R., Winograd, T.: SIDES: A Collaborative Tabletop Computer Game for Social Skills Development. In: Proc.CSCW 2006, pp. 1–10. ACM Press, New York (2006) 28. Robertson, G., Czerwinski, M., Baudisch, P., Meyers, B., Robbins, D., Smith, G., Tan, D.: The Large-Display User Experience. IEEE Computer Graphics and Applications 25(4), 44–51 (2005) 29. Robertson, G., Horvitz, E., Czerwinski, M., Baudisch, P., Hutchings, D., Meyers, B., Robbins, D., Smith, G.: Scalable Fabric: Flexible Task Management. In: AVI 2004: Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 85–89. ACM Press, New York (2004) 30. Smith, G., Baudisch, P., Robertson, G., Czerwinski, M., Meyers, B., Robbins, D., Andrews, D.: GroupBar: The TaskBar Evolved. In: Proc. OZCHI 2003, pp. 34–43 (2003) 31. Tan, D.S., Meyers, B., Czerwinski, M.: WinCuts: Manipulating Arbitrary Window Regions for More Effective Use of Screen Space. In: CHI 2004 Extended Abstracts on Human Factors in Computing Systems, pp. 1525–1528. ACM Press, New York (2004) 32. Widjaja, I., Balbo, S.: Structuration of Activity: A View on Human Activity. In: Proc. OZCHI 2005, Computer-Human Interaction Special Interest Group (CHISIG) of Australia, pp. 1–4 (2005) 33. Wobbrock, J.O., Morris, M.R., Wilson, A.D.: User-Defined Gestures for Surface Computing. In: Proc. CHI 2009, pp. 1083–1092. ACM Press, New York (2009) 34. Yost, B., Haciahmetoglu, Y., North, C.: Beyond Visual Acuity: The Perceptual Scalability of Information Visualizations for Large Displays. In: Proc. CHI 2007, pp. 101–110. ACM Press, New York (2007)
Line Drawings Abstraction from 3D Models Shujie Zhao1, and Enhua Wu1,2, 1 Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao 2 State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing
[email protected]
Abstract. Non-photorealistic rendering, also called stylistic rendering, emphasizes expressing special features and omitting extraneous information to generate a new scene, different from the primary one, through digital processing. Stylistic rendering is important in various applications, in particular in entertainment such as cartoon production and digital media for mobile devices. Line drawing is one of the rendering techniques in non-photorealistic rendering. Using feature lines to convey the salient and important aspects of a scene while rendering can give a clearer idea of the model representation. In this regard, we propose a method to extract feature lines directly from three-dimensional models. With this method, feature lines are extracted by finding intersections of two implicit functions, which works without lighting, and are rendered with visibility in a comprehensible way. Starting from an introduction to the purpose of line drawings, the development of the method is described. An algorithm for line extraction using implicit functions is presented in the main part of the paper. Test results and a performance analysis are given. Finally, a conclusion is drawn, and the future development of line drawings is discussed. Keywords: non-photorealistic rendering, stylistic rendering, line drawing, feature line, implicit function, isosurface.
In recent years, the size of volumetric datasets, such as computerized tomography (CT) or magnetic resonance imaging (MRI) scans, has greatly increased as scanning hardware and simulation algorithms have rapidly improved. Using feature lines to convey the salient and important aspects of a model while rendering can reduce complexity while keeping enough information about the original model and giving observers a clearer picture. At the same time, the rendering cost can be reduced by rendering only certain feature expressions instead of the entire model.
Shujie Zhao was born in 1982. She is a Master’s student at University of Macao. Her current research interests include non-photorealistic rendering. Enhua Wu is a Professor at University of Macao, and State Key Lab of Computer Science, Chinese Academy of Sciences. His research interests are photorealistic rendering in computer graphics including virtual reality, realistic image synthesis, scientific visualization, physically based modeling and animation.
1 Introduction
Line is one of the basic primitives in computer graphics. It can be used to convey salient and important aspects of two-dimensional images or three-dimensional models while rendering. Different feature lines can help us to distinguish different parts and features of a scene. They can also be used to highlight details that might be lost during shading. There are several kinds of feature lines, including silhouettes, boundaries, creases, crest lines, occluding contours and suggestive contours, used to depict important aspects of objects. Boundaries and creases are model-dependent [11], and crest lines are view-independent [17]. Silhouettes, occluding contours and suggestive contours are view-dependent; when the viewpoint changes, they need to be recalculated.

Many algorithms for line drawings have been proposed. They can be divided into two main categories: image-space algorithms and object-space algorithms. Image-space algorithms first render some scalar fields and then perform image processing on the frame buffer to extract feature lines. Object-space algorithms extract lines from a model directly. There are also some hybrid algorithms, which perform part of the processing in object space and mostly rely on graphics-hardware tricks. In 1990, Saito and Takahashi [18] first presented a data structure and algorithm for drawing edges from the image buffer. After that, Decaudin [9], Curtis [5] and Gooch et al. [10] also proposed algorithms that use first- and second-order differentials to extract feature lines on the frame buffer. Since the depth buffer is available, these image-space algorithms can work without knowledge of the surface representation; however, they suffer from the limited resolution of the depth image. In this paper, we therefore propose a method that extracts feature lines directly from three-dimensional models.

The earliest line-drawing algorithm for 3D models was proposed in 1967 by Appel [2]. Markosian et al. [13] introduced an algorithm that improves Appel's hidden-line algorithm. In 2003, DeCarlo et al. [6] presented a new type of feature line, called suggestive contours. Contours are found at locations where the dot product between the normal vector and the view vector is zero. At some locations this dot product is a positive local minimum rather than zero; these are true contours from a relatively nearby viewpoint, but not contours from the current viewpoint. Such locations are almost-contours, and suggestive contours are drawn at them to complete the contours. Suggestive contours are sensitive enough to convey more details of shape information. A set of points on the surface forms suggestive contours when its radial curvature κr equals 0 and the directional derivative of the radial curvature, Dw κr, is positive in the direction w, where w is the un-normalized projection of the view vector v onto the tangent plane. With the method proposed by DeCarlo et al. [6], some undesirable lines may be drawn, and because the algorithm does not consider the size of the object shown in the image, feature lines may be depicted too small or too large for
the current view. Ni et al. [15] proposed a method to control the size of shape features view-dependently in line drawings of 3D meshes; the method needs more preprocessing time and more run-time memory than the original mesh. Some algorithms extract lines depending on lighting, such as those proposed by DeCarlo and Rusinkiewicz [8] and Lee et al. [12]; however, these methods have the limitation that some feature lines have no parameter tied to lighting. Some feature lines are view-dependent, and their extraction is cost-intensive. Based on the work in [17], we present a real-time rendering algorithm, independent of lighting, for abstracting line drawings from 3D models, as described in Section 2. The algorithm extracts feature lines directly from isosurfaces instead of meshes. Hermite interpolation is used to find the intersection of two implicit functions, and a ray is traced to check the visibility of the feature lines.
2 Algorithm and Implementation
For the several feature lines we are concerned with, we extract them using implicit functions [1]. With this method, we find the intersections between two implicit functions to determine the locations of the feature lines, and apply Hermite interpolation to refine those lines in a comprehensible way. However, some of the extracted feature lines inside the volume are in fact invisible from the outside and need to be removed, so visibility processing of the feature lines is also required.

2.1 Feature Lines Extraction
Instead of rendering the entire volumetric data, rendering an isosurface can improve efficiency and reduce the complexity of the original model. This scheme turns the clutter of multiple overlapping layers of data into a clear and simple form of expression. The linear features that lie on the isosurface within the volume are extracted directly. The isosurface F, a 2D manifold embedded in the 3D dataset, is defined as the zero set of the function

f(i, j, k) = φ(i, j, k) − τ,    (1)

where τ is a threshold within the range of the data values. For the current viewpoint, contours are extracted at the locations where n · v = 0, where n is the surface normal and v is the view vector. We therefore need the zero set of the contour surface C:

c(i, j, k) = −∇φ(i, j, k) · v(i, j, k).    (2)
The process of extracting a contour first finds the cubes that contain zeros of both implicit functions f and c at a specific threshold. The intersections between the two implicit functions are then found with Hermite interpolation in each
cube. Finally, those intersections are connected to form a segment of the contour. Using this method, other feature lines can be extracted by finding intersections between the implicit function f and another implicit function, representing the particular kind of feature line, inside the volume.
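The per-cube step can be sketched as follows, assuming the volume has already been sampled at the cube corners. This simplified Python version uses linear interpolation in place of the Hermite interpolation used in the paper, and it assumes the isosurface vertices of a cell can be visited in polygon order; it is meant only to illustrate the idea of intersecting the zero sets of f and c, not to reproduce the implementation.

```python
import numpy as np

# Cube edges as pairs of corner indices (a standard corner ordering is assumed).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6),
         (6, 7), (7, 4), (0, 4), (1, 5), (2, 6), (3, 7)]

def lerp_zero(p0, p1, v0, v1):
    """Linear interpolation of the zero crossing of a scalar along a segment
    (the paper refines this with Hermite interpolation)."""
    t = v0 / (v0 - v1)
    return p0 + t * (p1 - p0)

def contour_points_in_cell(corners, f_vals, c_vals):
    """Sketch of the per-cube step: find where the isosurface f = 0 cuts the
    cube edges, sample the contour function c at those points, then place a
    contour point between isosurface vertices where c changes sign."""
    iso_pts, iso_c = [], []
    for a, b in EDGES:
        if f_vals[a] * f_vals[b] < 0:                 # isosurface crosses this edge
            t = f_vals[a] / (f_vals[a] - f_vals[b])
            iso_pts.append(corners[a] + t * (corners[b] - corners[a]))
            iso_c.append(c_vals[a] + t * (c_vals[b] - c_vals[a]))
    pts = []
    for i in range(len(iso_pts)):
        j = (i + 1) % len(iso_pts)                    # assumes polygon order
        if iso_c[i] * iso_c[j] < 0:                   # contour surface c = 0 crosses
            pts.append(lerp_zero(iso_pts[i], iso_pts[j], iso_c[i], iso_c[j]))
    return pts  # connecting these points across neighbouring cells traces the line
```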
2.2 Trace of Feature Lines
Instead of searching all the volumetric data to find lines, the lines are traced by visiting correlated cells. This is practical for interactive visualization of large datasets since it is output-sensitive. As the intersection of any two 2D surfaces is a 1D segment, we can trace lines by following those intersections: for a given seed cell, a 1D segment of the line is extracted; we then move to the adjacent cell through the face containing one of the intersection points, invoke the line extraction again, and repeat the tracing. The loop terminates when the current cell returns to the starting seed cell. The process is shown in Figure 1.
Fig. 1. The process for finding a line
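The tracing loop itself can be summarized in a few lines, assuming callbacks that extract the segment inside a cell and step to the neighbouring cell through a shared face. Both callbacks and the cell representation are placeholders for this sketch, not the paper's data structures.

```python
def trace_line(seed_cell, extract_segment, neighbor_through_face):
    """Sketch of the tracing loop in Section 2.2: starting from a seed cell,
    extract a line segment per cell and walk to the neighbouring cell through
    the face containing the segment's exit point, until the loop closes.
    `extract_segment(cell)` is assumed to return (entry, exit_point, exit_face);
    `neighbor_through_face(cell, face)` returns the adjacent cell or None."""
    line, cell = [], seed_cell
    while True:
        entry, exit_point, exit_face = extract_segment(cell)
        line.append((entry, exit_point))
        cell = neighbor_through_face(cell, exit_face)  # step to the adjacent cell
        if cell == seed_cell:                          # closed loop: tracing done
            break
        if cell is None:                               # open line left the volume
            break
    return line
```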
Not all of the lines are visible; for example, the interior contours are not required at all. Some lines, such as suggestive contours, are view-dependent and move along the surface when the viewpoint changes. If curves move quickly as the viewpoint changes, they are considered unstable and should be pruned away. Unstable curves are pruned based on the speed of their motion, which is obtained using the implicit function theorem [14]. As the viewpoint changes, the seed points of a new frame first examine the cells that contained contours in the previously drawn frame. The previously unstable curves are pruned away when the viewpoint changes, and a new contour has to be calculated. The new valid cell may be located near the old one, or anywhere in the volume, so an iterative gradient-descent method is used to find new seeds while the viewpoint changes.
3 Experimental Results and Performance
The integrated development environment (IDE) for implementing the algorithm is Microsoft Visual C++ 6.0 with Service Pack 5, running under Microsoft Windows 2000 with Service Pack 4. The testing platform is an AMD Athlon 64 X2 Dual-Core 4000+ CPU with 1 GB of RAM, and the graphics hardware is an Nvidia GeForce 6800 GT with 256 MB of video memory. The OpenGL 1.4 graphics library and the GLUI 2.0 graphical user interface (GUI) toolkit are used to build a simple user interface.
3.1 Experimental Results
Several experimental models were used to test the algorithm, with the results shown in Figures 2 to 5. The results are generated from different viewpoints, with various feature lines in different styles. In these experiments, we render the lines using texture mapping or toon shading. The lines are extracted directly from the volume data and are distributed over the entire model. Rendering with texture mapping or toon shading gives a representation that is more expressive for the observer.
Fig. 2. The original model Bunny is shown on the left. The results are rendered with toon shading and contours (green) in the middle, and with texture mapping on the right.
Fig. 3. The original model Lion
3.2 Performance
The resolution of the rendering window is 1024 × 768 in all cases, and the rendering frame rate for the models we tested ranges from 34.97 down to 0.92 fps for models with 67K to 6M triangles; with a brute-force algorithm, the frame rate for the same models ranged from 18.5 down to 0.016 fps. The processing time for each model in the different steps is shown in Table 1. When the size of the volumetric data is n × n × n, the time complexity for generating the isosurface is O(n²) for a band-limited shape. Suggestive contours take more computation time, and the handling of visibility costs additional time.
Table 1. Processing time of abstract line drawings at each step

Model   Vertices   C + SC (ms)   Smoothing Meshes (ms)   Texture mapping (ms)
Bunny   72027      292.75        323.94                  313.42
Lion    183408     755.85        879.94                  507.18
Brain   294012     1098.52       1206.52                 692.43
Fig. 4. The results are rendered with texture mapping and contours (green), from a side viewpoint on the left, and another viewpoint from the front on the right
Fig. 5. The original model Brain is shown on the left. The results are shown in different styles with suggestive contours (blue), contours (green), and boundaries (black) in the middle, and texture mapping with contours on the right.
Since the system renders the model in O(n) in terms of the number of vertices, the time complexity for visibility processing is O(n²). When the viewpoint changes, only contours are drawn; the full drawing with suggestive contours and hidden-surface removal is made when the motion stops.
3.3 Applications
The algorithm proposed in this paper is of particular significance for the entertainment industry. It can be used in creating cartoons and, in particular, in digital media for the mobile industry, for transferring content between mobile phones and the World Wide Web. Representing different objects with feature lines can highlight their shapes and point out special aspects. Since the size of datasets grows quickly nowadays and many people use mobile phones to connect to the web, using line drawings when transferring objects between mobile devices and the web can significantly improve speed and facilitate the exchange.
4 Conclusion
In this paper, we have expounded the purpose of abstract line drawings, and an algorithm for abstract line extraction from 3D models is presented and implemented. With this algorithm, feature lines are extracted by finding intersections of two implicit functions, which works without lighting, and those feature lines are rendered with visibility in a comprehensible way. Experimental results on different models show that the algorithm is effective in providing a line-drawing representation of 3D models, and real-time processing is achieved in applications. We intend to develop improved methods to reduce the rendering cost and the time needed to find the different feature lines. We may combine this algorithm with other NPR effects, such as hatching and halftoning, to obtain richer results. In addition, we plan to extend our investigation in the future to how NPR techniques can be used for scientific visualization. Acknowledgments. This work was supported by the China Fundamental Science and Technology 973 Research Grant (2009CB320802), the National 863 High-Tech Research and Development Grant (2008AA01Z301) and the Research Grant of the University of Macau.
References 1. Akenine-M¨ oller, T., Haines, E.: Real-time rendering, 2nd edn. A K Peters, Wellesley (2002) 2. Appel, A.: The notion of quantitative invisibility and the machine rendering of solids. In: Proceedings of the 1967 22nd ACM Annual Conference/Annual Meeting, pp. 387–393 (1967) 3. Burns, M., Klawe, J., Rusinkiewicz, S., Finkelstein, A., DeCarlo, D.: Line drawings from volume data. ACM Transactions on Graphics (SIGGRAPH 2005) 24(3), 512– 518 (2005) 4. do Carmo, M.P.: Differential geometry of curves and surfaces. Prentice-Hall, Englewood Cliffs (2004) 5. Curtis, C.: Loose and sketchy animation. In: Conference Abstracts and Applications, SIGGRAPH 1998, p. 317 (1998) 6. DeCarlo, D., Finkelstein, A., Rusinkiewicz, S., Santella, A.: Suggestive contours for conveying shape. ACM Transactions on Graphics (SIGGRAPH 2003) 22(3), 848–855 (2003) 7. DeCarlo, D., Finkelstein, A., Rusinkiewicz, S.: Interactive rendering of suggestive contours with temporal coherence. In: Proceedings of the Third International Symposium on Non-Photorealistic Animation and Rendering, pp. 15–24 (2004) 8. DeCarlo, D., Rusinkiewicz, S.: Highlight lines for conveying shape. Working manuscript (2007) 9. Decaudin, P.: Modeling using fusion of 3D shapes for computer graphics - cartoonlooking rendering of 3D scenes. PhD thesis, Universite de Technologies de Compiegne, France (1996)
10. Gooch, B., Sloan, P.P.J., Gooch, A., Shireley, P., Riesenfeld, R.: Interactive technical illustration. In: Proceedings of ACM Symposium on Interactive 3D Graphics 1999, pp. 31–38 (1999) 11. Gooch, B., Gooch, A.: Non-photorealistic rendering. A K Peters, Wellesley (2001) 12. Lee, Y.J., Markosian, L., Lee, S.Y., Hughes, J.F.: Line drawings via abstracted shading. In: Proceedings of ACM SIGGRAPH 2007, vol. 26(3) (2007) 13. Markosian, L., Kowalski, M.A., Trychin, S.J., Bourdev, L.D., Goldstein, D., Hughes, J.F.: Real-time nonphotorealistic rendering. In: Proceedings of SIGGRAPH 1997, pp. 415–420 (1997) 14. Munkres, J.: Analysis on manifolds. Addison-Wesley, Reading (1991) 15. Ni, A., Jeong, K., Lee, S., Markosian, L.: Multi-scale line drawings from 3D meshes. In: Proceedings of ACM Symposium on Interactive 3D Graphics and Games 2006, pp. 133–137 (2006) 16. Rusinkiewicz, S.: Estimating curvatures and their derivatives on triangle meshes. In: Proceedings of the Second International Symposium on 3D Data Processing, Visualization, and Transmission, pp. 486–493 (2004) 17. Rusinkiewicz, S., DeCarlo, D., Finkelstein, A.: Line drawings from 3D models. In: ACM SIGGRAPH 2005 Course Notes #7 (2005) 18. Satio, T., Takahashi, T.: Comprehensible rendering of 3D shapes. In: Proceedings of Computer Graphics (SIGGRAPH 1990), vol. 24(3), pp. 197–206 (1990)
Interactive Creation of Chinese Calligraphy with the Application in Calligraphy Education Xianjun Zhang1,2,3, Qi Zhao1, Huanzhen Xue1, and Jun Dong4,* 1
Software Engineering Institute, East China Normal University, Shanghai, 200062, China 2 International Education College, Shanghai University of Finance and Economics, Shanghai, 200433, China 3 The Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences, Beijing, 100080, China 4 Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou, 215125, China
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. Given a few tablet images of Chinese calligraphy, it is difficult to automatically create new Chinese calligraphy with better effects while keeping a similar style. A semiautomatic creation scheme for Chinese calligraphy and its application in calligraphy education are proposed. First, preprocessing, contour tracing and skeleton extraction are performed on the images of the original Chinese calligraphy characters. After that, strokes are extracted interactively. Then a statistical model is used to reform the strokes. All strokes are stored separately in a stroke library. Finally, new Chinese calligraphy characters are created based on the structure of the skeletons and the proper selection of reformed strokes from the stroke library. The experimental results show that the created Chinese calligraphy characters are similar to the samples in structure but exhibit different effects. Keywords: Chinese calligraphy, stroke reforming, skeleton, calligraphy education.
1 Introduction

Chinese calligraphy is the handwriting of Chinese characters with a hairy brush. The creation process of Chinese calligraphy is a typical and complex cognitive process involving learning, experience, knowledge and skill [1]. The computerized simulation of the creation process of Chinese calligraphy can extend the depth and scope of artificial intelligence and promote the understanding of the essence of intelligence. In general, Chinese characters are composed of a few elemental strokes. A hairy brush is made of a bundle of hair from selected animals, which is soft and flexible. As a result, it takes great skill to manipulate the hairy brush to control its orientation and
pressure. Learners of Chinese calligraphy must practice a great deal to grasp how to write each stroke with a hairy brush. In addition, learners need structural knowledge of Chinese characters in order to lay out the strokes properly. The skeleton of a Chinese character provides important information about the structure of the character, the spatial layout of its strokes and the relationships between them. It is therefore possible for computers to make use of the skeleton and reformed strokes to create new Chinese calligraphy. A lot of work has been done on related problems such as virtual brush modeling [2], [3], [4] and [5], ink diffusion modeling [6] and [7], stroke extraction [8], stroke reforming [9], emulation of imagery thinking [10], automatic generation of Chinese calligraphy [11], etc. A typical approach, proposed in [11], is an intelligent system using a constraint-based analogous-reasoning process. The system fuses knowledge from multiple sources to support a restricted form of reasoning and is able to generate pleasing Chinese calligraphy. However, its drawbacks are that too many parameters need to be set manually, too much experience is required to interact with the system, and the efficiency is low. Moreover, many details of the strokes and the characters cannot be processed precisely.

The interplay between learning and simulation results in training simulation; interactive training simulation lets the learner interact with existing models [12]. The computerized simulation of the creation process of Chinese calligraphy can thus provide new means to teach and train learners of calligraphy. In this paper, a semiautomatic creation scheme for Chinese calligraphy and its application in calligraphy education are described. The paper is organized as follows. Section 2 introduces the decomposition process of Chinese calligraphy, including preprocessing, contour tracing, skeleton extraction and stroke extraction. Stroke reforming is described in Section 3. The experimental results are presented in Section 4. Section 5 illustrates how the proposed scheme is applied in calligraphy education. Finally, conclusions are drawn and future work is discussed in Section 6.
2 Decomposition Process of Chinese Calligraphy

Decomposition of Chinese calligraphy means analyzing a Chinese character to obtain its structural information and strokes. A number of operations are required in this process, including preprocessing, contour tracing, skeleton extraction and stroke extraction.

2.1 Preprocessing

A great deal of excellent Chinese calligraphy was carved on stones over the long history of China. The images of this calligraphy were captured by digital cameras and stored as image files. As a result, nearly all tablet images are scattered with spots and scratches of various sizes, so preprocessing is required to improve the subsequent processing. The preprocessing mainly consists of filtering, segmentation and binarization. Filtering helps to remove some random noise and smooth the image. Segmentation aims to locate and segment individual characters, which can be done manually or by segmentation algorithms. Binarization converts the segmented image into a binary image.
Figure 1 shows an example tablet image of the Chinese character "chun" (meaning "spring" in English) with much noise. The result after filtering, segmentation and binarization of this tablet image is shown in Figure 2.
Fig. 1. Tablet image of Chinese character ”chun”
Fig. 2. Result image after preprocessing
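As an illustration of the preprocessing chain (filtering followed by binarization), the sketch below uses a median filter and a global threshold. The threshold value, the use of the Pillow library and the file name are assumptions made for illustration, not the exact operations or parameters used by the authors.

```python
from PIL import Image, ImageFilter
import numpy as np

def preprocess_tablet_image(path, threshold=128):
    """Sketch of the preprocessing step: median filtering to suppress the
    spots and scratches of the tablet image, followed by global binarization.
    Segmentation of individual characters is assumed to be done beforehand."""
    gray = Image.open(path).convert("L")                       # grayscale tablet image
    smoothed = gray.filter(ImageFilter.MedianFilter(size=3))   # remove salt-and-pepper noise
    arr = np.asarray(smoothed)
    binary = (arr < threshold).astype(np.uint8)                # 1 = ink (dark), 0 = background
    return binary

# Example usage (hypothetical file name):
# binary = preprocess_tablet_image("chun_tablet.png")
```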
2.2 Contour Tracing

In order to obtain the contour of a Chinese character, contour tracing needs to be performed on its binary image. In this paper, the contour tracing method commonly used in digital image processing is used to obtain the set of coordinates of the contour points. The contour in the binary image of a Chinese character is always continuous and closed. Starting from a contour point that has not yet been visited and following the contour in a fixed direction (generally anti-clockwise for the outer ring and clockwise for inner rings), the contour tracing method finds the set of boundary points of the closed contour. By repeating this operation several times, the contour points of all parts of the character are obtained. The contour traced from the Chinese character "chun" in Figure 2 is shown in Figure 3.
Fig. 3. Traced contour of Chinese character ”chun”
2.3 Skeleton Extraction

Each Chinese character is a pattern composed of a few elemental strokes with specific relationships between them. Information about these relationships can be obtained from the skeleton of the character after a thinning operation is performed on the result image of the preprocessing stage. The improved algorithm in [13] is particularly suitable for extracting the skeleton of a Chinese character: it extracts a smoother skeleton that keeps the connectivity and symmetry of the original image while ensuring a one-pixel width. We select this algorithm for skeleton extraction of tablet characters. The first step removes redundant pixels by thinning the character while keeping the inner points, prominent parts and special points (such as crossing points and corner points) for connectivity. The next step deletes the burrs of the skeleton caused by the rough boundary of the calligraphy. Finally, redundant pixels are removed to ensure a single-pixel width. The skeleton extracted from the Chinese character "chun" in Figure 2 is shown in Figure 4.

2.4 Stroke Extraction

Strokes are the fundamental elements of Chinese characters. The following sections on stroke reforming and character synthesis are based on the extracted strokes; therefore, it is crucial to decompose the Chinese character into strokes. The stroke extraction algorithm in this paper combines and simplifies two extraction algorithms from [14] and [15]. In some cases human-computer interaction is needed owing to the difficulty of the extraction.
Fig. 4. Extracted skeleton of Chinese character ”chun”
The contour of a calligraphy character contains information about the basic outline of each stroke, such as the coordinates of the boundary points, the curvature, the arrival and output angles, and the intersection points of strokes. The basic steps of stroke extraction are finding the matched intersection points of strokes and then matching each pair of intersection points that lie on the same stroke; as a result, a single stroke can be extracted. The strokes extracted from the Chinese character "chun" in Figure 2 are shown in Figure 5.
Fig. 5. Extracted strokes of Chinese character ”chun”
3 Stroke Reforming

In order to create new Chinese calligraphy, strokes must be reformed based on existing calligraphy. The stroke reforming approach should keep the essential shapes of the original strokes while introducing proper variation. First of all, a few feature points need to be identified on the contour to represent the shape of the character. In general, these feature points are turning points with significant curvature located at the intersections between two strokes. However, these points alone are not enough to describe the contour completely; it is necessary to select some additional landmarks along the contour in order to represent the shape precisely. Each stroke can then be represented by its landmark points. For example, if there are n landmark points for a stroke and (xi, yi) is the coordinate of the i-th point, then a 2n-element vector X = (x1, ..., xn, y1, ..., yn)^T describes the stroke.
Fig. 6. One sample of tablet Chinese character "chun"
Fig. 7. Another sample of tablet Chinese character "chun"
Fig. 8. Extracted strokes of tablet Chinese character "chun"
Fig. 9. Reformed strokes of tablet Chinese character "chun"
Fig. 10. Overlapping of skeleton and reformed strokes of tablet Chinese character "chun"
Fig. 11. Results in the creation process of Chinese characters "chun hui da di"
Given a few tablet images of Chinese calligraphy as samples, a parameterized statistical model was proposed in [9] to reform the strokes through the adjustment of a few parameters. The model can be represented as Y = M(b), where Y is the target stroke to be generated and the parameter b is a vector that adjusts the stroke features. When the parameters are changed to specific values, a new stroke is generated. The training set is formed by the coordinates of the landmark points of the given samples. Furthermore, the training set is aligned using Generalized Procrustes Analysis (GPA) and the primary features of these samples are extracted by Principal Components Analysis (PCA). A few features of the training set are extracted and represented in the form of several eigenvectors. New strokes can be generated based on the extracted features.
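As a rough illustration of such a model (not the actual implementation of [9]), the sketch below assumes the landmark vectors have already been aligned, e.g. with GPA, extracts the principal components with an SVD, and generates a new stroke as the mean shape plus a weighted combination of eigenvectors controlled by the parameter vector b; all function and variable names are our own.

import numpy as np

def train_stroke_model(landmark_vectors):
    """Build the statistical stroke model from aligned landmark vectors.

    `landmark_vectors` is an (m, 2n) array; each row is one sample stroke
    X = (x1, ..., xn, y1, ..., yn), assumed to be aligned beforehand
    (e.g. with Generalized Procrustes Analysis)."""
    mean = landmark_vectors.mean(axis=0)
    centered = landmark_vectors - mean
    # Principal components (eigenvectors) of the training set via SVD.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular_values ** 2 / max(len(landmark_vectors) - 1, 1)
    return mean, components, eigenvalues

def reform_stroke(mean, components, b):
    """Generate a new stroke Y = M(b): the mean shape plus a weighted
    combination of the first len(b) eigenvectors."""
    return mean + components[:len(b)].T @ b

# Hypothetical usage: vary the first two model parameters of a stroke.
# mean, comps, _ = train_stroke_model(samples)
# new_stroke = reform_stroke(mean, comps, np.array([1.5, -0.5]))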
Two other samples of the tablet Chinese character "chun" are shown in figure 6 and figure 7. The strokes in figure 8 are extracted from the characters "chun" in figure 1, figure 6 and figure 7. The reformed strokes are shown in figure 9. The overlapping result of the skeleton and the reformed strokes is shown in figure 10.
4 Synthesis and Experiments
A lot of experiments have been conducted on many Chinese calligraphy characters in the "Li" style. As an example, given a few tablet images of the Chinese calligraphy "chun hui da di" (meaning "Spring comes back in nature" in English), the computer can create new Chinese calligraphy. First of all, preprocessing is performed on the original tablet images of Chinese calligraphy. Then the further processing is done as follows: (1) The skeleton is extracted using the thinning algorithm. (2) The contours are traced and the strokes are extracted. (3) The selected strokes are reformed properly. (4) New Chinese calligraphy is synthesized based on the reformed strokes and the skeleton. The experimental results are shown in figure 11. The first three rows are samples of original tablet images of Chinese calligraphy required by the stroke reforming. The sample in the third row is selected as an example. The result after preprocessing is shown in the fourth row. The skeleton and extracted strokes are shown in the fifth and sixth rows respectively. The final results in the last row are the Chinese calligraphy created by the computer. It is obvious that the created Chinese characters keep the "Li" style and structure. Moreover, the results look smoother since the strokes are reformed properly.
5 Application in Calligraphy Education
Traditional calligraphy education consists of three phases under the guidance of a teacher: "Mo", "Lin" and creation. "Mo" means that the learner puts a piece of semi-transparent paper on a copybook and writes all strokes of the Chinese characters with a hairy brush. A lot of practice is needed for the learner to master the skills of using the hairy brush and to write the calligraphy as close as possible to the calligraphy in the copybook. "Lin" means that the learner puts the copybook aside and writes the calligraphy on a piece of paper to a degree that the written calligraphy is similar to the calligraphy in the copybook. Creation means that the learner writes calligraphy without any reference, based on the knowledge and skills acquired in the former two phases. The computer is not involved in the whole process. Self-learning in traditional calligraphy education is very difficult since the learner does not know how to improve. The semi-automatic, interactive creation scheme proposed in this paper will be helpful to self-learners. First of all, the calligraphy in the copybook and the calligraphy written by the learner are digitized and stored in the computer. The computer can then decompose the characters into strokes, compare the calligraphy from the copybook with the calligraphy written by the learner, and teach the learner how to improve.
6 Conclusion
Research on the automatic creation of Chinese calligraphy is a challenging field at the intersection of several areas, including image processing, graphics, machine learning, art and imagery thinking. Its value can be found in many applications such as interactive calligraphy education and entertainment. Moreover, starting from the simulation of the creation process of Chinese calligraphy, investigating the simulation of human intelligence by computer may be promising. Based on the skeleton and stroke reforming, the approach proposed in this paper can create new Chinese calligraphy with the characteristics of a few different samples. The experiments show that the reforming effect of the strokes is reasonable and the created Chinese characters are smooth and aesthetic. However, human-computer interaction is necessary for some complicated Chinese characters to extract and reform strokes as well as to synthesize characters. In the future, a measurement model will be needed to evaluate the Chinese characters created by the computer, which may be the next problem to address.
Acknowledgments. This work was sponsored by the Program of Shanghai Subject Chief Scientist (07XD14203), Shanghai Basic Research Key Project (08JC1409100), the National Basic Research Program of China (2005CB321904) and the Open Project Program of the Key Laboratory of Complex Systems and Intelligence Science of the Chinese Academy of Sciences (20070108).
References 1. Dong, J.: Introduction to Computer Calligraphy (in Chinese). Science Press, China (2007) 2. Strassmann, S.: Hairy Brushes. In: SIGGRAPH 1986, vol. 20(4), pp. 225–232. ACM Press, USA (1986) 3. Girshick, R.B.: Simulating Chinese Brush Painting: The Parametric Hairy Brush. In: SIGGRAPH 2004. ACM Press, USA (2004) 4. Wong, H.T.F., Ip, H.H.S.: Virtual Brush: a Model-based Synthesis of Chinese Calligraphy. Computers Graphics 24, 99–113 (2000) 5. Chu, N.S.H., Tai, C.-L.: An Efficient Brush Model for Physically-based 3D Painting. In: 10th Pacific Conf. Computer Graphics (PG 2002), pp. 413–421. IEEE CS Press, Los Alamitos (2002) 6. Guo, Q.-l., Kunii, T.L.: Modeling the Diffuse Painting of ’Sumi-e’. In: Proc. IFIP WG5.10., Modeling in Computer Graphics, pp. 329–338. Springer, Berlin (1991) 7. Lee, J.: Diffusion Rendering of Black Ink Paintings Using New Paper and Ink Models. Computers Graphics 25(2), 295–308 (2001) 8. Lee, C.-n., Wu, B.: A Chinese Character Stroke Extraction Algorithm based on Contour Information. Pattern Recognition 31(6), 651–663 (1998) 9. Dong, J., Xu, M., Pan, Y.-h.: Statistic Model-Based Simulation on Calligraphy Creation (in Chinese). Chinese Journal of Computers 31(7), 1276–1282 (2008) 10. Dong, J., Xu, M., Zhang, X.-j., Gao, Y.-q., Pan, Y.-h.: The Creation Process of Chinese Calligraphy and Emulation of Imagery Thinking. IEEE Intelligent Systems 23(6), 56–62 (2008)
11. Xu, S.-h., Lau, F.C.M., Cheung, W.K., Pan, Y.-h.: Automatic Generation of Artistic Chinese Calligraphy. IEEE Intelligent Systems 20(3), 32–39 (2005) 12. Martens, A., Diener, H., Malo, S.: Game-Based Learning with Computers – Learning, Simulations, and Games. In: Pan, Z., Cheok, D.A.D., Müller, W., El Rhalibi, A. (eds.) Transactions on Edutainment I. LNCS, vol. 5080, pp. 172–190. Springer, Heidelberg (2008) 13. Tang, Y., Zhang, X.-z., Wang, Z.-x.: An Algorithm for Distilling the Skeletons of the Works of Chinese Calligraphy (in Chinese). Journal of Engineering Graphics 5, 98–104 (2006) 14. He, R., Yan, H.: Stroke Extraction as Pre-processing Step to Improve Thinning Results of Chinese Characters. Pattern Recognition Letters 21, 817–825 (2000) 15. Ma, X.-h., Pan, Z.-g., Zhang, F.-y.: The Stroke based Chinese Outline Font and its Application (in Chinese). Chinese Journal of Computers 19(3), 81–88 (1996)
Outline Font Generating from Images of Ancient Chinese Calligraphy Junsong Zhang1, Guohong Mao1, Hongwei Lin2, Jinhui Yu2, and Changle Zhou1 1 Lab of Mind, Art&Computation, Cognitive Science Department, School of Information Science and Technology, Xiamen University, Xiamen 361005, China {zhangjs,maogh,dozero}@xmu.edu.cn 2 State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China {hwlin,jhyu}@cad.zju.edu.cn
Abstract. Chinese calligraphy is an art unique to Asian cultures. This paper presents a novel method for generating outline font from historical documents of Chinese calligraphy. The method consists of detecting feature points from character boundaries and approximating contour segments. The feature-point detection is based on a statistical method that considers the characteristics of a calligrapher. A database of basic strokes and some overlapping stroke components of Chinese characters extracted from the calligrapher's works is constructed in advance, and the relation between the noise level of stroke contours and the standard deviation of the Gaussian kernel is retrieved from the database using linear regression. Thus, given an input character contour, the standard deviation for smoothing the noisy character contour can be calculated. Furthermore, a new method is employed to determine the feature points at that standard deviation. The feature points on a character contour subdivide the contour into segments. Each segment can be fitted by a parametric curve to obtain the outline font. Some experimental results and comparisons to existing methods are also presented in the paper. Keywords: Chinese Calligraphy, Historical Document, Outline Font, Feature Point Detection.
1 Introduction
Chinese calligraphy has a long history of more than four thousand years, and abundant historical documents of calligraphy are left and stored in libraries and museums (Fig. 1 shows two examples). The calligraphy documents stored as images are not easy to use in further applications. That is because each Chinese character is stored as an array of pixels, which can be seen as a bitmap font, and the scaling operation on the character images will bring significant distortion. In addition, recording images of calligraphy documents at the pixel level also requires a lot of storage space [1]. In contrast to the bitmap font, the outline font, such as the TrueType font [2], defined as a set of parametric curves, can be scaled to any size and otherwise transformed more easily than a bitmap font through certain numerical processing. It would therefore be desirable to convert the Chinese characters from bitmap font into outline font.
Fig. 1. Calligraphy tablet images from Zhenqing Yan (A.D.709 - 785)
However, due to the long storage time of the original paper documents, the uneven color of the document surfaces, as well as some noise generated by the scanning, the scanned document images of Chinese calligraphy are noisy. As a result, the character boundaries detected from the document images are usually rough, and the feature points are thus hard to detect accurately. Research on feature-point-detection from planar shapes has been conducted for a couple of decades, and many methods have been proposed [3]. These methods mainly fall into two categories: CSS (curvature scale space) based methods [4][5] and ROS (region of support) based methods [6][7]. The ROS based algorithms for feature-point-detection mainly depend on the decision of a region of support for every point on the curve, and the ROS should reflect the local property of the feature point. Teh and Chin [8] suggested that the detection of feature points relies primarily on the precise determination of the ROS. To support their argument, they applied three different significance measures (k-curvature, 1-curvature and k-cosine) to different test shapes, and showed that once the ROS is determined properly, different significance measures produce similar results. However, the ROS is difficult to determine accurately, especially when the contour is noisy, and most of the previous methods need a number of additional thresholds and parameters for determining the ROS. Hence, the traditional ROS methods will not be considered here because of our need for a robust method with as few parameters as possible. The scale space concept was introduced by Witkin [9]. It has been widely extended to many research fields. The idea of scale space is that objects in nature present us with different scales, or sizes. From the viewpoint of shape analysis, the CSS method is essentially a multi-scale organization of invariant geometric features of a planar contour. The features consist of curvature zero-crossings or local extrema obtained from the smoothed curvature function at multiple scales. Curvature function smoothing is achieved by convolving the curvature function with a Gaussian kernel of increasing standard deviation. Since multi-scale organization of the curvature function results in a natural representation of shape information at various levels of
details, where noise and insignificant features are filtered out at smaller scales and only the prominent shape features survive to larger scales, the CSS representation is robust to noise. However, it has been suggested that for the CSS based feature-point-detection algorithm, if too large a scale is selected, some feature points of fine features will be missed, and vice versa [8]. It is also computationally intensive due to the iterative convolutions, and some locations of the detected feature points on the curve are inaccurate because of the smoothing effect of the Gaussian kernel. In this paper, we propose a statistical deduction based algorithm for detecting feature points on the noisy character contours extracted from historical calligraphy images. The core idea is to obtain the relationship between the noise level of a character contour and the standard deviation of the Gaussian function by means of linear regression. Given the curvature sum of a new character contour, a standard deviation for smoothing the curvature function of the character contour can be calculated with the linear regression formula, and all local extrema computed from the smoothed curvature function are recorded as feature point candidates. Furthermore, the final feature points of the character contour are chosen by employing a novel method. These feature points are then used to divide the whole contour into segments, and each segment is fitted with a parametric curve according to its bending strength, resulting in the outline font of the corresponding input character image. The key advantages of our method are that it is faster than CSS and more accurate than ROS for feature-point-detection. The calligraphic characteristics of the original characters are retained and the noise is removed from the original character outline. The remainder of this paper is organized as follows. In Section 2, the method for character outline extraction and contour tracing is introduced. In Section 3, we propose a statistical deduction based feature-point-detection algorithm for character contours. In Section 4, a least squares based curve approximation method is presented to express the character contours in outline font form. Some experimental results are given in Section 5. Finally, Section 6 draws a brief conclusion.
2 Character Outline Extracting and Tracing
To extract the outline of characters from calligraphy document images, we choose the Canny operator [10] as the outline detector. Let I denote an input binary character image; after the character outline is extracted, the value of outline pixels is set to 0 and the value of other pixels is set to 1 to get a binary image I_c. We save all pixels with value 0 in a list L (l_1, l_2, ..., l_k), and record the pixels of each closed contour in l_k in clockwise direction according to the result of contour tracing. To trace a closed contour, we scan the pixels of I_c from bottom to top and left to right until a black pixel s is found; s is taken as the start point for contour tracing. Then we find the next black pixel p adjacent to s in the clockwise direction, and take p as the current start point for tracing the next pixel. The tracing process terminates when s is visited again. Most Chinese characters consist of multiple strokes, thus we may get several closed contours after the character outline is extracted. In the case that multiple closed contours need to be traced, we use a flag f for every contour point: if a pixel is traced, its flag f is set to 1, otherwise the flag f is set to 0. The use of the flag in our system ensures that all closed contours will be traced.
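As a rough sketch of this tracing step (our own illustration, not the authors' implementation), the code below assumes the Canny output has already been turned into a boolean array `outline` with one-pixel-wide contours. It scans the image bottom-to-top and left-to-right for unvisited start points and follows each closed contour with a Moore-neighbour walk, flagging visited pixels so that every closed contour is traced exactly once; the stopping criterion (returning to the start pixel) is a simplification.

import numpy as np

# 8-connected neighbours in clockwise order on screen, starting from "west".
NEIGHBOURS = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
              (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_contour(outline, start, visited):
    """Follow one closed contour from `start`, flagging visited pixels."""
    contour = [start]
    visited[start] = True
    prev_dir = 0                      # pretend we arrived from the west
    current = start
    while True:
        found = False
        # Search the 8 neighbours clockwise, beginning just after the
        # direction we came from (Moore-neighbour tracing).
        for k in range(8):
            d = (prev_dir + 1 + k) % 8
            dy, dx = NEIGHBOURS[d]
            ny, nx = current[0] + dy, current[1] + dx
            if 0 <= ny < outline.shape[0] and 0 <= nx < outline.shape[1] \
                    and outline[ny, nx]:
                prev_dir = (d + 4) % 8   # direction pointing back to `current`
                current = (ny, nx)
                found = True
                break
        if not found or current == start:
            return contour
        contour.append(current)
        visited[current] = True

def trace_all_contours(outline):
    """Scan bottom-to-top, left-to-right; trace every unvisited contour."""
    visited = np.zeros_like(outline, dtype=bool)
    contours = []
    for y in range(outline.shape[0] - 1, -1, -1):
        for x in range(outline.shape[1]):
            if outline[y, x] and not visited[y, x]:
                contours.append(trace_contour(outline, (y, x), visited))
    return contours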
3 Feature Points Determination
3.1 Constructing Statistical Samples
Although Chinese characters in the calligraphy documents are numerous, each character is in fact represented by a combination of basic strokes. A basic stroke is a single stroke but varies in shape. Some typical basic strokes are point, horizontal, vertical, left slant and right slant. A Chinese character may contain from 1 to about 30 basic strokes, in which some basic strokes may overlap each other to form different components. These basic strokes and overlapping stroke components result in closed contours corresponding to their outlines. On these closed contours, basic strokes have their own feature points due to their shapes, and overlapping stroke components have more feature points at the junctions of basic strokes. To construct the sample database for evaluating the relationship between the noise level of their outlines and the standard deviation of the Gaussian kernel, we should take both basic strokes and overlapping stroke components into account. Since calligraphers have written Chinese characters in different ways, various styles of Chinese calligraphy have formed. To process calligraphy images of different calligraphers, we first construct a statistical sample from their basic strokes and components respectively. In this paper, we choose 120 representative strokes and components from the masterpieces of the famous Chinese calligrapher Mr. Zhenqing Yan (A.D. 709-785) to construct the sample database.
3.2 Linear Regression to Determine Feature Point Candidates
After the sample database is constructed, we extract the outlines and calculate the absolute curvature for each point on the closed contour to obtain a curvature function with reference to the contour length. The curvature function is then convolved with a Gaussian kernel. The convolution smooths the curvature function, and the standard deviation of the Gaussian kernel is selected according to the noise level of the stroke contour by user interaction. Generally, the Gaussian function is defined as
g(s, \delta) = \frac{1}{\delta\sqrt{2\pi}} \, e^{-s^2/(2\delta^2)}    (1)
where δ is the standard deviation of Gaussian kernel, acting as the scale. The convolution at the scale between the curvature function and the Gaussian function is
C(l, \delta) = \int_{u=l-L/2}^{u=l+L/2} c(u) \, g(l-u, \delta) \, du    (2)
here, L is the total length of the curvature curve, c(u) is the absolute curvature at the contour point, and C(l, δ) is the convolved curvature curve at the scale. According to our experiments, the standard deviation δ selected for smoothing a given character contour is related to the curvature sum of the character contour. This is illustrated in Fig. 2: Fig. 2(a) is an original stroke contour with curvature sum 3.3456. To simulate the character contour with various levels of random noise, we
add different uniform random noise to Fig. 2(a) to derive Fig. 2(b)-(e) with variance V = 1, 2, 3, 4 respectively. A bigger variance added to the stroke contour leads to a stroke contour with more noise. This suggests that, for contours of the same character, a bigger curvature sum indicates a higher noise level of the stroke contour. To eliminate the influence of noise on the detection of feature points, a bigger standard deviation of the Gaussian kernel is needed to smooth the noisy stroke contour. Hence the key is to reveal the relation between the curvature sum and the standard deviation of the Gaussian kernel. For each contour in the sample contour database, we compute the curvature sum by adding the curvatures of all contour points on a single closed contour together, and select the optimal scale δ by user interaction. Taking the curvature sum as x-coordinates and the scale as y-coordinates, a discrete point set (x_i, y_i), i = 1, 2, ..., n, is obtained in the x-y plane. The relation between the curvature sum and the scale can be constructed using the linear regression model [11].
Fig. 2. A stroke contour with different variance of uniform random noise, from left to right: (a) original contour, (b)-(e) results of adding uniform random noise to (a) with variance V=1, V=2, V=3, and V=4 respectively
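A compact sketch of this candidate-detection step is given below. The use of NumPy/SciPy, the clamping of the predicted scale and all names are our assumptions rather than the paper's implementation; the code fits the regression on the sample database (curvature sum versus interactively chosen scale), predicts δ for a new contour, smooths the cyclic curvature function as in Eq. (2), and records the local maxima as feature point candidates.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def fit_scale_model(curvature_sums, chosen_scales):
    """Linear regression mapping curvature sum -> smoothing scale delta,
    trained on the sample database (scales picked by user interaction)."""
    slope, intercept = np.polyfit(curvature_sums, chosen_scales, deg=1)
    return slope, intercept

def candidate_feature_points(curvature, slope, intercept):
    """Smooth the (closed) curvature function and return candidate indices."""
    delta = slope * curvature.sum() + intercept
    delta = max(float(delta), 1e-3)   # guard against a non-positive predicted scale
    smoothed = gaussian_filter1d(curvature, sigma=delta, mode="wrap")
    # Local maxima of the smoothed curvature function (cyclic neighbours).
    left = np.roll(smoothed, 1)
    right = np.roll(smoothed, -1)
    return np.where((smoothed > left) & (smoothed >= right))[0]

# Hypothetical usage with a precomputed absolute-curvature function:
# slope, intercept = fit_scale_model(sample_sums, sample_scales)
# cands = candidate_feature_points(np.abs(contour_curvature), slope, intercept)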
Generally, a useful value for measuring the association between two variables is the correlation coefficient. The formula for computing the correlation coefficient is given by:

r^2 = \frac{\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}    (3)
Here, \bar{x} and \bar{y} denote the averages of the x-coordinates and y-coordinates of the point set, respectively. Thus, given a character contour, after calculating its curvature sum, the scale for smoothing the curvature function can be determined using formula (3). Then the points on the character contour which correspond to the local extrema of the smoothed curvature function are extracted as the feature point candidates.
3.3 Final Feature Points Detection
Feature point candidates derived from the linear regression contain both true feature points and some false feature points caused by noise. We should preserve the true
feature points and discard false ones. To this end, we develop a new method to determine the final feature points from character contours. For each feature point candidate pi, we compute an angle θ according to the region of support formed by its two neighbor feature point candidates pi-1 and pi+1. Note that if θ is bigger than 180 degrees, we replace it with 360-θ; this ensures that all values of θ fall into the range between 0 and 180 degrees. Thus, every feature point candidate pi has a ROS represented by the angle θ. These angles are sorted using the Bubble sort algorithm and represented in the 2D Cartesian system. Taking the serial numbers of the angles as the x-coordinates, and the angles A(i), i = 0, 1, ..., n, as the y-coordinates, the chord C connecting the start point A(0) and the end point A(n) is the principal axis of the data set. The line segment with the maximum distance M(A,C) from the point A(i) to the principal axis is the sub-principal axis, and the corresponding angle A(max) is selected as the angle threshold. Those candidate points whose θ is smaller than A(max) are preserved as final feature points, since a bigger θ means the current point lies on a smoother contour segment and should not be seen as a feature point. Our experiments show that the method produces satisfactory results for our task at hand. The reason is that the principal axis connects two trivial angles: choosing A(0) as the angle threshold would preserve all candidate feature points as final feature points, while choosing A(n) would leave no candidate feature points. The angle A(max) corresponding to the sub-principal axis is the point with the maximum distance to the two trivial angles, so it can be selected as the angle threshold.
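A minimal sketch of this selection rule is given below; it assumes the support angles have already been computed and folded into [0, 180] degrees, and the NumPy-based implementation and names are our own illustration rather than the authors' code.

import numpy as np

def select_final_feature_points(candidates, angles):
    """Keep candidates whose support angle lies below the A(max) threshold.

    `candidates` are contour indices; `angles` holds the support angle of
    each candidate, already folded into the range [0, 180] degrees."""
    if len(candidates) < 3:
        return list(candidates)
    a = np.sort(np.asarray(angles, dtype=float))   # A(0) <= ... <= A(n)
    n = len(a) - 1
    # Principal axis: chord from (0, A(0)) to (n, A(n)).
    p0 = np.array([0.0, a[0]])
    chord = np.array([float(n), a[-1]]) - p0
    chord /= np.linalg.norm(chord)
    # Distance of every point (i, A(i)) to the principal axis; the point
    # farthest away defines the sub-principal axis and hence A(max).
    pts = np.stack([np.arange(n + 1, dtype=float), a], axis=1) - p0
    dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])
    threshold = a[np.argmax(dist)]
    return [c for c, ang in zip(candidates, angles) if ang < threshold]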
4 Contour Segment Approximation
With the feature points determined above, a character contour is divided into contour segments. To convert the characters from bitmap font into outline font, we depict each contour segment with a parametric curve. The parametric curve can be a straight line or a quadric (cubic) Bezier curve, selected according to the bend strength of the contour segment. Our method for approximating the contour segment is similar to that of Sarfraz and Khan [12], but with some modifications. To measure the bend strength of a contour segment, the normalized accumulated chord-length parameterization method [13] is adopted. Assuming there are n points in the contour segment, with the first point S and the last point E, a chord C is formed by connecting S and E. To test the bend strength of the contour segment, an average distance d_ave is computed from all points of the contour segment to their corresponding points on the chord. If d_ave is smaller than a pre-specified error tolerance T1, then a straight line connecting S and E is used to fit the current contour segment; if d_ave is bigger than T1 but smaller than a second error tolerance T2, a quadric Bezier curve is used to approximate the contour segment; otherwise, a cubic Bezier curve is selected to fit the contour segment. To represent the contour segments using Bezier curves, the control points of the Bezier curve need to be determined. We use least squares fitting to determine the control points of the Bezier curves. In the following, we describe how to obtain the control points of a cubic Bezier curve.
Generally, a cubic Bezier is expressed as follows:
Q(t_i) = (1 - t_i)^3 P_0 + 3 t_i (1 - t_i)^2 P_1 + 3 t_i^2 (1 - t_i) P_2 + t_i^3 P_3, \quad 0 \le i \le n-1    (4)
where P_0, P_1, P_2, P_3 are the four control points of the cubic Bezier, and P_0 = S, P_3 = E. The least squares fit is obtained by choosing P_1 and P_2, which together with the two end points P_0 and P_3 are used as the control points to generate a cubic Bezier that fits the contour segment. The approximating results are shown in Fig. 4-Fig. 6, where the blue and black curves indicate that the contour segments are fitted with a quadric or a cubic Bezier.
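The sketch below illustrates both the bend-strength test and the least-squares fit with fixed end points. The tolerances T1 and T2 are hypothetical values, np.linalg.lstsq merely stands in for whatever least-squares solver the authors used, and the degree-2 case is handled with the same fixed-endpoint formulation; all names are our own.

import numpy as np
from math import comb

def bernstein_matrix(t, degree):
    """Bernstein basis functions B_{i,degree}(t) as columns of a matrix."""
    return np.stack([comb(degree, i) * t**i * (1 - t)**(degree - i)
                     for i in range(degree + 1)], axis=1)

def fit_bezier(pts, degree):
    """Least-squares Bezier of given degree with endpoints fixed to S and E."""
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]                                # chord-length parameterization
    B = bernstein_matrix(t, degree)
    p_start, p_end = pts[0], pts[-1]
    # Move the fixed end-point terms to the right-hand side and solve the
    # remaining linear least-squares problem for the inner control points.
    rhs = pts - np.outer(B[:, 0], p_start) - np.outer(B[:, -1], p_end)
    inner, *_ = np.linalg.lstsq(B[:, 1:-1], rhs, rcond=None)
    return np.vstack([p_start, inner, p_end])    # (degree+1, 2) control points

def average_chord_distance(pts):
    """d_ave: mean distance from the segment points to the chord S-E."""
    s, e = pts[0], pts[-1]
    dx, dy = e - s
    norm = np.hypot(dx, dy)
    if norm == 0:
        return 0.0
    rel = pts - s
    return float(np.mean(np.abs(dx * rel[:, 1] - dy * rel[:, 0])) / norm)

def approximate_segment(points, t1=0.8, t2=1.6):  # hypothetical tolerances (pixels)
    pts = np.asarray(points, dtype=float)
    d_ave = average_chord_distance(pts)
    if d_ave <= t1:
        return np.vstack([pts[0], pts[-1]])       # straight line
    return fit_bezier(pts, degree=2 if d_ave <= t2 else 3)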
5 Experimental Results
In this section, some of our test examples are presented to demonstrate the effectiveness of our method. All test calligraphy images were obtained by scanning published tablet images.
Fig. 3. Chinese character “Fang”: (a)-(c) RJ algorithm with k=0.02, 0.025, 0.030 respectively; (d)-(f) RW algorithm with k=0.025, 0.027, 0.030 respectively; (g) SD algorithm; (h) our method
The performance of our method for feature-point-detection is compared with that of the CSS based method by Sezgin and Davis (SD) [5], the classical corner detection algorithm based on ROS proposed by Rosenfeld and Johnston (RJ) [6], as well as the algorithm by Rosenfeld and Weszka (RW) [7]. In Figs. 3(a)-(c), the feature points are detected using the RJ algorithm with parameter k varying from 0.02, 0.025 to 0.030 respectively. Some redundant feature points can be found when k=0.02. However, true feature points disappear and false feature points still exist when k=0.025 and 0.030, respectively. The results shown in Figs. 3(d)-(f) with the RW algorithm are similar to those of the RJ method in terms of true and false feature points detected. The result of the CSS based algorithm [5] is shown in Fig. 3(g). In addition to a few redundant feature points, the SD algorithm takes far more time than our method because of the many convolutions involved, while our method needs only one. The comparison of the elapsed time between the SD algorithm and our method is listed in Table 1.

Table 1. A comparison of elapsed time (in seconds)

Fig      SD         Our method
Fig.4    104.738    0.138
Fig.5    25.554     0.058
Fig.6    126.305    0.154
Fig. 4. Chinese character “You”. Left: binary image; Middle: contour and feature points; Right: the fitting result.
Fig. 5. Chinese character “Qian”. Left: binary image; Middle: contour and feature points; Right: the fitting result.
In Fig. 4-Fig. 6, the left of each figure is the original binary image, the middle illustrates the detected outlines and feature points, and the right demonstrates the fitting result with parametric curves. Although the three images are noisy on the character's outline, especially the character in Fig. 6, our method is able to detect the feature points accurately, and the fitting result of the character outline looks satisfactory.
Fig. 6. Chinese character “Xie”. Left: binary image; Middle: contour and feature points; Right: the fitting result.
6 Conclusion
In this paper, a novel method is proposed for outline font generation from historical documents of Chinese calligraphy. The outline font is described with parametric curves approximating outline segments divided by feature points. A statistical deduction method is developed for feature-point-detection from images of calligraphy documents. The proposed method significantly reduces the computational time and is robust to noise compared with previous feature-point-detection methods. The generated outline font preserves the style of the original calligraphic characters well, which can facilitate further manipulation of the calligraphic characters, such as using the outline of calligraphic characters in advertising design. We also expect that the system can be used for computer-aided typeface design. Our currently implemented system works well for Kaishu (regular style). In addition to Kaishu, there are some other styles such as Xingshu (running style) and Caoshu (cursive style) in Chinese calligraphy. Boundaries of strokes in these styles are rougher than those in Kaishu, because calligraphers wrote them in a fast manner. How to accurately extract the character boundaries and feature points for them is a topic of future work.
Acknowledgments. This paper is an enhanced version of our conference papers [14][15], and we thank the anonymous reviewers for their very useful comments and suggestions. The work described in this paper was supported by the National Natural Science Foundation of China (No. 60903129) and the Natural Science Foundation of Fujian Province of China (No. F0910149).
References 1. Wright, T.: History and Technology of Computer Fonts. IEEE Annals of the History of Computing 20(2), 30–34 (1998) 2. Apple Computer Inc., The TrueType Font Format Specication, Version 1.0. (1990) 3. Marji, M.: On the Detection of Dominant Points on Digital Planar Curves, PhD Thesis, Wayne State University, Detroit, Michigan (2003) 4. Rattangsi, A., Chin, R.: Scale-Based Detection of Corners of Planar Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(4), 430–449 (1992) 5. Sezgin, T.M., Davis, R.: Scale-space Based Feature Point Detection for Digital Ink. In: Making Pen-Based Interaction Intelligent and Natural, pp. 145–151 (2004) 6. Rosenfeld, A., Johnston, E.: Angle Detection on Digital Curves. IEEE Transactions on Computers 22(9), 875–878 (1973) 7. Rosenfeld, A., Weszka, J.S.: An Improved Method of Angle Detection on Digital Curves. IEEE Transaction on Computers 24, 940–941 (1975) 8. Teh, C.H., Chin, R.T.: On the Detection of Dominant Points on Digital Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(8), 859–872 (1989) 9. Witkin, A.P.: Scale-space Filtering. In: Proc. 8th Int. Joint Conf. Art. Intell., pp. 1019– 1022 (1983) 10. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986) 11. Weisstein, E.W.: Least Squares Fitting, http://mathworld.wolfram.com/LeastSquaresFitting.html 12. Sarfraz, M., Khan, M.: An Automatic Algorithm for Approximating Boundary of Bitmap Characters. Future Generation Computer Systems 20(8), 1327–1336 (2004) 13. Farin, G.: Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide, 4th edn. Academic Press, New York (1997) 14. Zhang, J.S., Lin, H.W., Yu, J.H.: A Novel Method for Vectorizing Historical Documents of Chinese Calligraphy. In: IEEE International Conference on Computer-Aided Design and Computer Graphics, pp. 219–224. IEEE Press, Los Alamitos (2007) 15. Zhang, J.S., Yu, J.H., Lin, H.W.: Capturing Character Contours from Images of Ancient Chinese Calligraphy. In: IEEE 2nd Workshop on Digital Media and its Application in Museum & Heritage, pp. 36–41. IEEE Press, Los Alamitos (2007)
Tangible Interfaces to Digital Connections, Centralized versus Decentralized Matthijs Kwak, Gerrit Niezen, Bram van der Vlist, Jun Hu, and Loe Feijs Designed Intelligence Group, Department of Industrial Design Eindhoven University of Technology Den Dolech 2, 5612AZ, Eindhoven, The Netherlands
Abstract. In the era of distributed digital media, technology is moving to the background and interoperability between devices increases. The handles for users to explore, make and break connections between devices seem to disappear in overly complex menu structures displayed on small screens. Two prototypes have been developed that introduce a tangible approach towards exploring, making and breaking connections between devices in a home environment. Findings suggest that users are better able to project their mental model of how the system works on decentralized representations and that a tangible solution is not necessarily a better one. Keywords: Ontology, semantic connections, tangible user interface, internet of things.
1 Introduction
In the era of distributed digital media, especially in a home environment, devices are connected to one another to create preferred experiences. A home theatre system is one example of how multiple devices can create one joint experience when interoperating [5,6,3,7]. With the introduction of portable media players, possibilities and needs for content sharing are even bigger. Currently these devices are connected wirelessly or with all kinds of cables, and users are occupied with finding the right cables to connect devices and have to deal with cables that physically allow for connections that are not possible. Even more, some possible connections never get explored, simply because physical cables do not allow for it. Wireless technologies such as Bluetooth solve part of the problem, but introduce overly complex menu structures and devices without proper interfaces. A single task like sharing music from one device to another currently involves multiple steps on both devices, while one single high-level effort would be desirable. In ‘The Internet of Things’ [8] and ‘Shaping Things’ [14] a world is sketched in which each everyday object has a unique identity and is connected to the internet. In this world, technology has moved to the background and interoperability between devices has been achieved. Provided that these devices are able to communicate with each other and with the user, this could mean the end of
compatibility problems and the hassle of using cables, and that users will have fewer physical and visual handles to make sense of their environments and the devices therein. Design can play an important role in this sense-making with paradigms like Tangible User Interfaces [4], which hold that physical handles for digital information provide users with more freedom and control. The SOFIA project is a European research project that aims to “make ‘information’ in the physical world available for smart services - connecting the physical world with the information world” [13]. Within this project a “Semantic Connections” demonstrator was developed, named the Interaction Tile. This demonstrator allows users to tangibly explore, make and break connections between devices in a smart home environment [11,16,17]. A second demonstrator, named Interaction Tabs, was developed to explore alternative possibilities of TUIs. Where the Interaction Tile provides users with a centralized way of exploring, making and breaking connections, the Interaction Tabs provide users with a decentralized way to perform the same tasks. In order to see which demonstrator would be the easiest to use and allow for a better projection of the users’ mental model, a user experiment was set up to answer the following questions:
– Are the demonstrators a better alternative, compared to the conventional method?
– Will the users be able to work equally well with both demonstrators?
In the first question, “better” is in the sense that exploring, making and breaking connections are easier (more efficient) and more satisfactory (positive user experience). An important aspect is the mental model that the participants have and how it compares to the actual architecture of the system.
2 Background
2.1 SOFIA Project and the Interaction Tile
SOFIA (Smart Objects for Intelligent Applications) is a European research project addressing the challenge of Artemis sub-programme 3 on Smart Environments. The overall goal of this project is to connect the physical world with the information world, by enabling and maintaining interoperability between electronic systems and devices. Our contribution to the project is to develop smart applications for the smart home environment, and to develop novel ways of user interaction. For users to truly benefit from smart environments, it is necessary that users are able to make sense of such an environment. One way of facilitating this “sense making” is through design. Our contribution to the SOFIA project aims at developing theories and demonstrators, and investigating novel ways of user interaction with the smart environment, through interaction with smart objects in the space. To illustrate the concepts and ideas developed in the project, a demonstrator was developed. The demonstrator is a tile-like interactive object that allows for both exploration of a smart space in terms of connections, and manipulation of
Fig. 1. Interaction Tile in action: Dries visits Mark with his music player and wants to play some of his music on Mark’s sound system; they explore connection possibilities between the music player and the sound system and make the connection; they then explore whether the ambient lighting system can be connected as well, make that connection, and enjoy the music and lighting effects.
these connections and information/data streams through direct manipulation. This is done by making simple spatial arrangements. The Interaction Tile visualizes the various connections, by enabling users to explore which objects are connected to one another, and what can be connected to what. Colored LED lighting and light dynamics visualize the connections and connection possibilities between the devices, by means of putting devices close to one of the four sides of the tile. A user can then check whether there is a connection and if not, whether a connection is possible. By simply picking up the tile and shaking it, a user can make or break the connection between the devices present at the interaction tile. Fig. 1 shows a use case example of how the interaction tile can be used. The Interaction Tile supports the following interactions:
– viewing/exploring existing connections;
– viewing/exploring connection possibilities;
– making connections;
– breaking connections.
We want to enable users to explore and manipulate the connections within the smart space without having to bother with the lower-level complexity of the architecture. We envision this “user view” to be a simplified view (model) of the actual architecture of the smart space. Conceptually, the connections are carriers of information; in this case they carry music. Depending on the devices’ capabilities (e.g. audio/video input and/or output) and their compatibility (input to output, but no output to output), the Interaction Tile will show the connection possibilities. In our current demonstrators we do not distinguish between different types of data since we are only dealing with audio, but it will be inevitable in more complex scenarios. The Interaction Tile acts as an independent entity, inserting events and data into a triple store and querying it when it needs information. The different types of events and the connections between smart objects and their related properties are described in an ontology. The ontology with “is-a” relationships indicated is shown in fig. 2. Fig. 3 shows the architecture of the current setup.
Fig. 2. Ontology indicating “is-a” relationships
We implemented the demonstrator using the Jena Semantic Web framework, the Processing library for Java, and Python for S60. Every interaction with either the music players (smart phones) or the interaction tile results in an interaction event. A semantic reasoner (Pellet) is used to reason about these low-level events in order to infer higher-level results. When the user shakes the tile to establish a connection, two NFCEnterEvent events (generated by the RFID reader inside the interaction tile) by two different devices that are not currently connected will result in a new connectedTo relationship between the two devices. Because connectedTo is a symmetric relationship, the reasoner will automatically infer that a connection from device A to device B means that device B is also connected to device A. Since connectedTo is also an irreflexive property, it is not possible for a device to be connected to itself. A generatedBy relationship is also created between the event and the smart device that generated it, along with a timestamp and other event metadata.
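As a rough illustration of this event handling (not the project’s actual Jena/Pellet implementation), the sketch below models the triple store as a plain set of (subject, predicate, object) tuples and applies the connection rule described above; the device names and helper functions are hypothetical, and the symmetric and irreflexive behaviour of connectedTo is enforced by hand rather than inferred by a reasoner.

import time

# A toy triple store: a set of (subject, predicate, object) tuples.
triples = set()

def add_nfc_enter_event(device, store):
    """Record an NFCEnterEvent generated by `device` (cf. Fig. 2)."""
    event = f"event-{device}-{time.time()}"
    store.add((event, "rdf:type", "NFCEnterEvent"))
    store.add((event, "generatedBy", device))
    return event

def connected(a, b, store):
    return (a, "connectedTo", b) in store

def shake(device_a, device_b, store):
    """Toggle the connection between two devices placed at the tile."""
    if device_a == device_b:                 # connectedTo is irreflexive
        return
    if connected(device_a, device_b, store):
        store.discard((device_a, "connectedTo", device_b))
        store.discard((device_b, "connectedTo", device_a))
    else:
        # connectedTo is symmetric, so assert both directions explicitly
        # (a reasoner such as Pellet would infer the inverse triple itself).
        store.add((device_a, "connectedTo", device_b))
        store.add((device_b, "connectedTo", device_a))

# Example: connect a music player to the sound system.
add_nfc_enter_event("nokia-n95", triples)
add_nfc_enter_event("sound-system", triples)
shake("nokia-n95", "sound-system", triples)
print(connected("sound-system", "nokia-n95", triples))  # True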
Fig. 3. Overview of the demonstrator: a SIB (Windows XP with Jena) and KPs for the surround sound system (Windows XP with Processing), two music players (Python for S60 on a Nokia N95 and a Nokia 5800 XpressMusic), the Arduino-based ambient lighting system, and the Arduino-based Interaction Tile (Processing/Python with RFID), communicating via TCP sockets over WiFi, serial over USB and serial over Bluetooth.
Fig. 4. Two versions of the demonstrator: (a) the Interaction Tile; (b) the Interaction Tabs
2.2 Interaction Tabs: Decentralized
A second demonstrator was developed to explore other tangible solutions. The Interaction Tabs demonstrator was implemented using the same set-up and software, but replacing the Interaction Tile (fig. 4(a)) with the Interaction Tabs (fig. 4(b)). First the Interaction tile was analyzed using the Frogger Framework (fig. 5) [18]. The Frogger framework is a design framework that allows for both analyzing and synthesizing interactions. Six relations (couplings) between action and reaction are described: Time. The product’s reaction and the user’s action coincide in time. Location. The reaction of the product and the action of the user occur in the same location.
Fig. 5. Frogger Framework [18]
Direction. The direction or movement of the product’s reaction (up/down, clockwise, right/left and towards/away) is coupled to the direction or the movement of the user’s action. Dynamics. The dynamics of reaction (position, speed, acceleration, force) is coupled to the dynamics of the action (position, speed, acceleration force). Modality. The sensory modalities of the product’s reaction are in harmony with the sensory modalities of the user’s action. Expression. The expression of the reaction is a reflection of the expression of the action. Furthermore, Wensveen distinguishes between three types of feedback and feedforward; functional, augmented and inherent [18]. Feedback is “the return of information about the result of a process or activity” [2]. Functional feedback is “the information generated by the system when performing its function”. Augmented feedback is information generated by an additional source, not directly related to the system and its function. Inherent feedback was defined by Laurillard [9] as “information provided as a natural consequence of making an action. It is feedback arising from the movement itself.” Feedforward is the information provided to the user before any action has taken place. Inherent feedforward communicates what kind of action is possible and how one is able to carry this action out. When an additional source communicates what kind of action is possible it is considered augmented feedforward. Functional feedforward communicates the more general purpose of a product. There are many improvements one can consider for the Interaction Tile (fig. 6) when putting it in the Interaction Frogger framework. We decided, though, to stay as close to the original design as possible; for research purposes it is best to change as little as possible in order to be able to clearly identify what exactly causes change in user behavior (if users’ behavior actually changes).
Fig. 6. Interaction Tile in the Frogger Framework
Fig. 7. Interaction Tabs in Frogger Framework
By removing the center tile and moving its functionality into the cubes that represent the devices being connected/disconnected, the demonstrator would become simpler, as this would allow for a more direct manipulation of the connections. The (digital) states of the connections would be physically represented: cubes being aligned means that the devices they represent are connected, and not being aligned means they are disconnected. Inspired by Siftables [10], the cubes were transformed into tabs, because tabs have a clear top and bottom. This does still afford stacking, but hopefully users would understand that the tabs were to be aligned (fig. 4(b)). An LED at each side gives the feedback:
Red. No connection possible. This occurs when no relation is possible between the two devices of which the tabs are aligned. Green. This occurs when a relation exists between two devices of which the tabs are aligned. To make a connection, the tabs that represent these devices have to be aligned. To break the connection, the alignment has to be broken. As a result of removing the center tile, the Interaction Tabs will no longer allow for the exploration of existing connections and connection possibilities without immediately manipulating the connections. Moving towards having a more physical approach might influence the scalability and have some other practical implications; however, this is beyond the scope of this paper as we try to focus on the interaction itself. We expect these differences to also have different influences on the user’s mental model, in the way users conceptualize connections (including properties like: persistence, transitivity and directionality) and differences in how they imagine devices to be connected, e.g. devices connected in a networked fashion versus connecting devices peer-to-peer. When we also analyze the Interaction Tabs using the Interaction Frogger framework (fig. 7), there are two ways of comparing the two demonstrators: First we consider the Interaction Tile and Interaction Tabs as being part of the same demonstrator set-up as shown in Fig. 3, serving as a device to manipulate the connections between the various devices in the set-up. The changes might improve the interaction with regard to: Direction. With the center tile removed, the direction of making and breaking connections (although done remotely) corresponds better. Modality. With the shaking interaction removed, the modality of making and breaking connections corresponds better. Secondly, we look at the interaction devices themselves as if they were standalone products. This reveals more improvements: Information about time, location, direction and modality are augmented and inherent when the center tile and shaking interaction are removed. For the Interaction Tile only location is inherent, and time, direction and modality are augmented (fig. 8 and fig. 9).
3 Experiment
In order to answer the questions raised in the beginning of this paper, an experiment was conducted.
3.1 Participants
12 participants were invited to the experiment, of whom 3 were female and 9 were male. The participants were between 21 and 26 years old. All but one had a BSc. in Industrial Design. One also had an MSc. in Industrial Design.
Fig. 8. Interaction Tile in the Frogger Framework (stand-alone)
Fig. 9. Interaction Tabs in Frogger Framework (stand-alone)
3.2 Apparatus
The following equipment was used:
– Dell laptop (Windows XP) with Wi-Fi, Bluetooth antenna, audio out and two USB ports.
– Nokia N95 mobile phone with Python installed, running a script to be able to play a sample and communicate with the laptop.
– Nokia 5800 XpressMusic mobile phone with Python installed, running a script to be able to play a sample and communicate with the laptop.
– Ambient Light lamp: a Bluetooth Arduino-based lamp that renders the music in colored lighting using RGB LEDs, with code running to be able to communicate with the laptop.
– Samsung NV8 digital camera mounted on a tripod to record the experiment.
– Philips speaker set with two satellite speakers and a subwoofer, connected to the Dell laptop.
– Netgear WPN824 wireless router.
– Interaction Tile and Interaction Tabs, including software in Java and Python.
A controlled setting was used to conduct the tests. The study took place in the ‘Contextlab’ at Eindhoven University of Technology. The lab is furnished to look like a living room, which is the context in which the demonstrators would normally be used.
3.3 Measures
We gathered data about the usability of the demonstrators in comparison to the conventional method of connecting devices, Bluetooth pairing. Here usability is divided into three aspects: efficiency, effectiveness and satisfaction. The setup of the test was exploratory, but includes two proven methods to gain insight into the participants’ mental models and to have the participants score the usability, respectively the ‘Teach-Back protocol’ [15] and the ‘System Usability Scale’ [1]. The action cycle by Norman [12] was also used to gain insight into the participants’ mental models. In addition to these methods, we also collected data about task completion time, errors, recovery from errors and participants’ satisfaction with using the method. A between-subjects design was used.
3.4 Procedure
Participants explored, made and broke connections between two mobile phones (a Nokia N95 and a Nokia XpressMusic), a sound system and an Ambient Light lamp. This was done using the Interaction Tile, Interaction Tabs and Bluetooth pairing. Every session was recorded and notes were made by the moderator. In this study, each participant worked through four phases of tasks starting with one of three methods (Interaction Tile, Interaction Tabs and Bluetooth pairing). Bluetooth pairing was tested as a comparative conventional method to measure the usability of the demonstrators. Briefing. Participants received a brief explanation (5 min.) before the test, outside the ‘Contextlab’. They were guided through the task path by the moderator. After the explanation, participants filled in a pre-test questionnaire and signed an informed consent form. The pre-test questionnaire included questions about: age, gender, occupation, self-report of familiarity with interactive systems (computers and mobile phones) and the participant’s experience with usability studies and focus groups.
Tasks. After the briefing, the participants worked on the actual tasks for about 30 minutes (including intermediary discussions). The task path for each method (Interaction Tile, Interaction Tabs and Bluetooth pairing) is as follows. First, users were introduced to the method and given three task descriptions; for each description they were asked to connect the devices or configure the demonstrator to perform the tasks (9 minutes). Second, users were given a task description and asked to fill in an Action Cycle diagram (6 minutes). Third, users were presented with three scenarios; for each scenario they were asked to explain which connections there were (9 minutes). Fourth, users were asked to explain what the method was they had used and how it worked, using the teach-back protocol (6 minutes). The order of tasks was random but the same for each participant.
Debriefing. After the main tasks were performed there was a post-test questionnaire (5 minutes), where participants filled in the SUS questionnaire to rate the satisfaction of using the method. The session was concluded with a post-test discussion (5 minutes), where the moderator followed up on any particular problem that came up for the participant.
3.5 Moderator Role
The moderator sat in the room with the participant while conducting the session. The moderator introduced the session, conducted a short background interview, and then introduced tasks as appropriate. Because this study is somewhat exploratory, the moderator sometimes asked unscripted follow-up questions to clarify participants’ behavior and expectations. The moderator also took notes and recorded the participants’ behavior and comments. The session was digitally recorded on video using a Samsung NV8 digital camera.
4 Results
Unfortunately the system was not stable enough to accurately measure the performance data that was intended to be measured. The stability problem also influenced the grades given by participants in the SUS questionnaire; therefore the SUS scores were not reliable. However, we did gain certain insights from the observations and other measurements.
4.1 Action Cycle Diagram
The participants clearly had problems with filling in the Action Cycle Diagram. Only a few descriptions correspond to the predefined description. This can be explained by the fact that people do not consciously think about the seven steps as defined by Norman [12] during everyday activity. It is also not uncommon to go through several cycles before a goal is reached and not all of these cycles
have to include all seven steps. This would require participants to fill in several diagrams or include several cycles in one diagram. Because this issue did not surface during the pilots or the first test, it would have been incorrect to change the procedure. All participants followed roughly the same steps in achieving their goal. All but one participant forgot to mention the breaking of existing connections in ‘Action specifications’. The participants using the Interaction Tabs and Bluetooth noticed this during the execution (before they thought they had achieved the goal) and went through another iteration immediately. Of the participants using the Interaction Tile, all but one noticed this after the execution (after they thought they had achieved the goal). These participants went through another iteration at a later stage but were also able to achieve their goal.
4.2 Teach-Back Protocol
While it is possible to draw conclusions concerning the actual mental models of participants, the protocol was mainly used to see if there were notable differences between the methods. Although there were some differences between the participants individually, amongst the methods the drawings and explanations were roughly the same. None of the participants went into details about what happened in the background, but instead focused on the matters 'at hand'. Three participants (2 for the Interaction Tile and 1 for Bluetooth pairing) mentioned extending the current system with more devices (more mobile phones and a TV). One participant (Interaction Tile) was able to conclude that the connected devices were networked; the rest explained the connections in a hierarchical way. In one of the examples given in [15] the researchers were able to conclude that participants tend to draw little when the system is transparent. If it is less transparent they are likely to make more detailed drawings to better support their story. In this test the level of detail amongst the methods was roughly the same.
4.3 Observations and Post-Test Discussion
None of the participants that worked with the Interaction Tabs had trouble working with that method. During the post-test discussion they only wondered what was happening in the background. This was not because they had been unable to perform certain tasks, but because they suspected more was going on than what was visible to the user. None of the participants working with Bluetooth pairing had trouble working with that method. They all mentioned that they were familiar with this way of connecting devices but had never experienced Bluetooth working this well. The only real trouble for the participants working with the Interaction Tile was the initial experience with that method. It was not clear what the relation was between the center tile and the cubes and all four interpreted the pulsing
green LED as a 'working connection'. One participant initially thought the LEDs were lasers that could 'read' the cubes when placed on top. Another participant thought it was only necessary to align the 'main' device to the center tile and align the other devices to the 'main' device. During observations and post-test discussions it became clear that all but one participant could not infer from the method that the connected devices were networked. The tasks given and the methods at hand led them to conclude that the connections were hierarchical, and participants mainly followed one of two modes of arranging connections: Linear (from one device to the next) - this was seen with the Interaction Tile and Interaction Tabs. Centralized (from one device outwards) - this was seen with the Interaction Tabs and Bluetooth pairing. Some participants sporadically arranged connections with the Interaction Tile in a way that indicated they took it for a network, but they explained verbally that they expected the system to make a hierarchy out of their arrangement. Some participants also explicitly mentioned that certain connections should not be possible while in fact they were.
5 Discussion and Conclusion
The most interesting results came from the observations and post-test discussions with the participants. The fact that all but one participant thought and worked in hierarchies is interesting. The Interaction Tile was designed to convey a different way of thinking, but instead participants projected their hierarchical way of thinking onto the method. By making connections between no more than two devices at a time they did not use the full capacity of the system, took longer to perform the tasks and were slightly annoyed by the 'extra' work. Also, for those who thought in centralized hierarchies (one device in the center, the others around it), there was no way of projecting this thought onto the Interaction Tile. This is where the power of the Interaction Tabs showed, because it allows more ways of thinking (hierarchical, ontological, linear, and centralized). The participants found meaning in the arrangement of the tabs and the location of the tabs in relation to each other. For the system this does not matter; a connection is a connection and if devices are connected, they are networked. This leads to the conclusion that the Interaction Tabs are a better fit for this scenario. Because of the setbacks, it is not possible to say whether the Interaction Tile and Interaction Tabs are better than Bluetooth pairing, although it appears that the Interaction Tabs are. It also appeared that participants were better able to perform the tasks with Bluetooth than with the Interaction Tile, but this can be attributed to the fact that they had experience with Bluetooth pairing and connecting devices using a GUI. For further research it would be interesting to see whether the hierarchical thinking of people can be generalized to scenarios other than the ones used in this user experiment. This could include other or more devices and media, or even
completely different contexts. If it can be generalized, an interesting question would be whether solutions like the demonstrators should allow for hierarchical thinking while working with ontologies, or not. Although the described experiment was useful and showed interesting results, it clearly has some limitations. For such an experiment to be successful, more participants are required, preferably without a design background. Six participants for each method is limited, four even more so. It is clearly not possible to collect reliable quantitative data with this number of participants. Added to that, not all the methods used in the experiment worked out as expected. While the 'speak-out-loud' step of the Action Cycle diagram is useful for gaining insight into what participants think when performing tasks, the other steps often seemed unclear to them. The results indicate that it may also be possible to elicit the mental model of the participants using the Teach-back protocol. It was useful for this test to see that all methods equally provided the participants with information, but the full potential of the protocol was not utilized. If this user experiment were to be repeated at a later stage, the advice would be to have at least three people present during the tests: one to manage the software and hardware, one to guide the participant through the test and one to take detailed notes. For a more qualitative approach, the fourth step of the Action Cycle diagram (think-out-loud) could be considered for each task. For a more quantitative approach, the Action Cycle diagram could be removed from the test completely, as well as the Teach-back protocol. This would allow more tasks to be performed, which would result in more data to analyze. A more elaborate usability questionnaire could be considered, although one has to take into account that lengthy questionnaires might annoy participants. This is especially to be considered when questionnaires are combined with performing tasks; this could lead to participants not paying enough attention when answering the questions.
Acknowledgement SOFIA is funded by the European Artemis programme under the subprogramme SP3 Smart environments and scalable digital service. We would also like to thank the participants involved in the user experiment.
References 1. Brooke, J.: Sus-a quick and dirty usability scale. Usability Evaluation in Industry, pp. 189–194 (1996) 2. Dictionary: American heritage dictionary. The American Heritage Dictionary of the English Language (2000) 3. Feijs, L., Hu, J.: Component-wise mapping of media-needs to a distributed presentation environment. In: The 28th Annual International Computer Software and Applications Conference (COMPSAC 2004), pp. 250–257. IEEE Computer Society Press, Hong Kong (2004)
4. Fitzmaurice, G., Ishii, H., Buxton, W.: Bricks: laying the foundations for graspable user interfaces, pp. 442–449. ACM Press/Addison-Wesley Publishing Co. 5. Hu, J.: Design of a Distributed Architecture for Enriching Media Experience in Home Theaters. Phd thesis, Department of Industrial Design, Eindhoven University of Technology (2006) 6. Hu, J., Feijs, L.: An adaptive architecture for presenting interactive media onto distributed interfaces. In: Hamza, M. (ed.) The 21st IASTED International Conference on Applied Informatics (AI 2003), pp. 899–904. ACTA Press, Innsbruck (2003) 7. Janse, M., van der Stok, P., Hu, J.: Distributing multimedia elements to multiple networked devices. In: User Experience Design for Pervasive Computing, Pervasive 2005, Munich, Germany (2005), http://www.fluidum.org/events/experience05 8. van Krannenburg, R.: The internet of things: A critique of ambient technology and the all-seeing network of rfid. Institute of Network Cultures (2008) 9. Laurillard, D.: Rethinking university teaching: A framework for the effective use of educational technology. Routledge, New York (1993) 10. Merrill, D., Kalanithi, J., Maes, P.: Siftables: towards sensor network user interfaces. In: TEI 2007, pp. 75–78. ACM, New York (2007) 11. Niezen, G., van der Vlist, B.J.J., Hu, J., Feijs, L.M.G.: From events to goals: Supporting semantic interaction in smart environments. In: 2010 IEEE Symposium on Computers and Communications (ISCC), Riccione, Italy, pp. 1029–1034 (2010) 12. Norman, D.: The design of everyday things. Basic Books, New York (2002) 13. SOFIA: Project website, http://www.sofia-project.eu/ 14. Sterling, B.: Shaping things (2005) 15. van der Veer, G., Del Carmen Puerta Melguizo, M.: Mental models. In: The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications, pp. 52–80 (2003) 16. van der Vlist, B., Niezen, G., Hu, J., Feijs, L.: Design semantics of connections in a smart home environment. In: Design and Semantics of Form and Movement (DeSForM 2010), Lucerne, Switzerland (2010) (accepted) 17. van der Vlist, B.J.J., Niezen, G., Hu, J., Feijs, L.M.G.: Semantic connections: Exploring and manipulating connections in smart spaces. In: 2010 IEEE Symposium on Computers and Communications (ISCC), Riccione, Italy, pp. 1–4 (2010) 18. Wensveen, S., Djajadiningrat, J., Overbeeke, C.: Interaction frogger: a design framework to couple action and function through feedback and feedforward. In: DIS 2004, pp. 177–184. ACM, New York (2004)
Research and Implementation of the Virtual Exhibit System of Historical Sites Based on Multi-touch Interactive Technology Yi Lin and Yue Liu School of Optics and Electronics, Beijing Institute of Technology 100081, Beijing, China
[email protected],
[email protected]
Abstract. With the help of multi-touch interactive technology, we have built a virtual exhibit system for the Mogao Grottoes in Dunhuang, China, to present the carved murals, ancient scriptures and the historical culture of Dunhuang. We analyze the theory of multi-touch and evaluate the advantages of the virtual reality interaction engine Virtools. A program is designed that combines multi-touch input with the behaviour interaction modules of Virtools to achieve integrated multi-touch interaction. The virtual objects are created and artistically processed in 3ds Max, after which interaction functions are implemented for them in Virtools. By recording the changing locations of the touch points and gestures, images can be moved, scaled and rotated with the fingers. Furthermore, we simulate the human field of view to provide a panoramic tour of the Mogao Grottoes. The proposed system can bring the user into a virtual world that is not only lifelike but also fascinating. Keywords: Multi-touch, Virtools, Interactive Technology, Virtual Reality.
1 Introduction
1.1 Human-Computer Interaction and Multi-touch
Human-computer interaction (HCI) is the study of interaction between people (users) and computers. Multi-touch refers to the ability to simultaneously register three or more distinct positions of input touches [6]. There are many multi-touch technologies, such as infrared, resistive, capacitive and shape recognition. The proposed system uses the shape recognition type: a camera captures image changes on the front or back side of the contact surface, and the platform recognizes the touch points from the captured image [4].
1.2 Content and Significance of the Article
1) The background and significance of the research
People are placing higher demands on new display devices and on natural human-computer interaction [7]. The rapid development of virtual reality and multi-channel user interfaces, which pursue multi-dimensional information spaces and interaction based on natural interaction styles, is becoming the trend in future human-computer interaction technology [12]. Operating directly on the screen with the fingers is quite different from using a mouse and keyboard, and it makes interaction more enjoyable. By combining Virtools, which supports operations on 3D objects in a scene, with multi-touch control, multi-touch interaction gains a promising outlook. Because of the unavoidable light irradiation, temperature and humidity during opening hours, the murals are constantly being weathered, oxidized and corroded, and this precious heritage faces a growing threat of extinction. By realizing a virtual tour through multi-touch, we can both protect the heritage and meet the needs of tourists who wish to explore the historical sites. In addition, researchers can investigate the heritage virtually, and by updating the latest research information they can help the whole world understand and share the historical wonders of Dunhuang.
2) Content of the study
Using digital interaction and virtual reality technology, we present the Dunhuang monuments, architecture and culture in an engaging way. Through finger touches or different gestures, viewers can scale and rotate the pictures, read the ancient books, or look around the internal structure of the grottoes in 360 degrees. Another important feature of the system is that it removes the restriction that only one person can operate the exhibit at a time while others have to wait. Compared with touring the real site, a virtual touring system based on multi-touch interaction therefore has considerable practical significance for visitors.
2 Theory of Multi-touch
2.1 Implementation Principle of the Touch Table
The system uses LLP (laser light plane) technology to build an improved touch table. LLP works as follows: LEDs emit infrared light, which is totally reflected inside the reflection layer; when a finger touches the surface of that layer, the total-reflection condition in that region is broken [3]. The touch location can then be obtained from the IR image acquisition devices and a tracking and registration algorithm [1]. The touch technology is illustrated in the schematic diagram below. First, a row of infrared light-emitting diodes is installed along the vertical and horizontal edges of the desktop frame, with light detectors placed along the opposite vertical and horizontal edges; the detectors record which rays are interrupted by a touch. Two interrupted crossing beams determine the horizontal and vertical coordinates of the touch on the screen. Because the rows of LEDs are dense, a touch may interrupt two horizontal rays and two vertical rays; in that case we record the average position between the interrupted rays.
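As an illustration of this beam-interruption principle, the following sketch (our own, not part of the original implementation) estimates a touch position from the sets of interrupted horizontal and vertical beams by averaging the indices of the interrupted rays; the beam spacing is an assumed value.

```python
def locate_touch(interrupted_cols, interrupted_rows, spacing_mm=5.0):
    """Estimate a touch position on an IR-grid frame.

    interrupted_cols / interrupted_rows are boolean lists, one entry per
    LED/detector pair along the horizontal and vertical frame edges.
    A finger typically interrupts two or more adjacent beams, so the
    position is taken as the average index of the interrupted beams.
    """
    cols = [i for i, hit in enumerate(interrupted_cols) if hit]
    rows = [i for i, hit in enumerate(interrupted_rows) if hit]
    if not cols or not rows:
        return None  # no touch detected
    x = sum(cols) / len(cols) * spacing_mm
    y = sum(rows) / len(rows) * spacing_mm
    return x, y

# Two adjacent vertical beams and two horizontal beams interrupted
cols = [False] * 40; cols[12] = cols[13] = True
rows = [False] * 30; rows[7] = rows[8] = True
print(locate_touch(cols, rows))  # -> (62.5, 37.5)
```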
Fig. 1. Implementation principle of touch system based on LLP
Machine-vision-based tracking of multiple light points obtains the touch locations by processing the infrared image [2]. Events are processed when a finger presses, drags or lifts. The process is shown in Figure 2.
Fig. 2. Algorithm processes of Multi-touch tracking recognition
2.2 High-Precision Automatic Calibration Technology
1) Camera calibration
We apply Zhengyou Zhang's widely used two-step calibration method with a planar checkerboard template. By shooting the template plane from different angles, we obtain the correspondence between points on the template plane and their projections on the image plane. Then, using the properties of the rotation matrix, we form constraint equations on the camera's intrinsic parameters. A homography is computed for each image, from which the intrinsic and extrinsic parameters of the camera are obtained [5].
2) Screen calibration
Screen calibration determines how coordinates in the camera image relate to feature points on the screen [11]. Using mathematical models, we achieve 2D registration from camera image points to the projection screen. This high-precision auto-calibration is based on OpenCV image processing. The general process is as follows: the background and an entire row (or column) of index points are drawn automatically on the screen; the camera alternately captures and processes the background image and the marker-point image; the positions of the entire row (or column) of marker points are extracted from the image; and the points are output in a sequence that corresponds to each marker.
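For reference, a minimal OpenCV sketch of Zhang-style checkerboard calibration is shown below; the board size, image folder and termination criteria are placeholders and do not reflect the exact parameters used by the authors.

```python
import cv2
import numpy as np
import glob

pattern = (9, 6)  # inner corners of the checkerboard (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # template shots from different angles
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsic matrix, distortion coefficients and per-view extrinsics
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("intrinsics:\n", K)
```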
2.3 Identification of Interactive Gestures
There are four main interactions throughout the interactive process: click, drag, zoom and rotate [13]. The system determines the exact location at which a finger touches the display surface and selects the virtual scene object the finger points at; this object becomes the operating target [9]. A touch is considered a click if a finger is pressed and raised within a short time. If the finger starts to move after the press, the virtual object moves correspondingly. If two or more fingers touch the same object at the same time, finger tracking and analysis starts: when the fingers move outward the object is zoomed in, and when they move inward it is zoomed out. If the fingers move counterclockwise around a centre point, the operation is interpreted as a rotation to the left; in the opposite direction it is a rotation to the right.
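These rules can be expressed compactly. The sketch below is our own interpretation, with arbitrary thresholds; it classifies a two-finger movement as zoom or rotation from the change in distance and angle between the fingers (the left/right sign convention depends on the screen coordinate system).

```python
import math

def classify_two_finger(p0_prev, p1_prev, p0_now, p1_now,
                        dist_thresh=10.0, angle_thresh=0.1):
    """Classify a two-finger gesture between two frames.

    Outward movement -> zoom in, inward movement -> zoom out,
    a change of the angle between the fingers -> rotate left/right.
    Thresholds are illustrative values in pixels / radians.
    """
    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    def angle(a, b):
        return math.atan2(b[1] - a[1], b[0] - a[0])

    d_dist = dist(p0_now, p1_now) - dist(p0_prev, p1_prev)
    d_ang = angle(p0_now, p1_now) - angle(p0_prev, p1_prev)

    if abs(d_dist) > dist_thresh:
        return "zoom in" if d_dist > 0 else "zoom out"
    if abs(d_ang) > angle_thresh:
        return "rotate left" if d_ang > 0 else "rotate right"
    return "drag"

print(classify_two_finger((100, 100), (200, 100), (80, 100), (220, 100)))  # zoom in
```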
Fig. 3. Types of hand interaction
2.4 Information of Interactive Points
The multi-point interactive system is realized by establishing a multi-dimensional array in which each row stores the information of one point. By default the system supports at most 100 points, which means the screen can be operated with up to 100 simultaneous touches. For a single point there are only two states, press and bounce; for multi-point operation a third state, move, is added.
3 Application of the Virtools 3D Engine in the Multi-touch Interactive System
3.1 Virtools 3D Engine Introduction
Virtools Dev is integrated software developed by the French company Virtools. By dragging Building Blocks (behaviour interaction modules) onto the proper object or character and determining the processing order of the blocks in a flowchart, the interaction script design is visualized and can gradually be edited into a complete, interactive virtual world [8]. The software's features are as follows. It provides many virtual reality interaction functions, for example movement, collisions, gravity, event triggers, description, CG rendering and movement preparation; each function contains a number of related feature sets.
It supports many kinds of software and hardware and does not require additional programming to remain compatible with them. Virtools provides ready-to-use plug-in interfaces, can connect to databases, supports multiple texture formats and CG rendering, and offers an open SDK so that more data libraries can be used directly for programming; users can adapt the program to their own window interface.
3.2 Hardware and Software Interfaces and Applications
Touch information reaches Virtools through the "MultiTouchX" interaction module, where it can be used in the next processing step. This module is not originally built into Virtools; it was developed using the Virtools SDK in combination with the system's hardware. When the touch screen is triggered, the module returns the location coordinates of the point. The coordinate system is the same as that of the computer screen: the top-left corner is the origin, the X axis is the horizontal direction and the Y axis the vertical direction. The status of each point is classified as bounce, press or move, represented by 0, 1 and 2 respectively, and each point is given a label so that the points can be distinguished. Finally we obtain a set of numbers as shown in Figure 4 below:
Fig. 4. The contents returned by the interface
The content returned is an array with four columns. The values in the first two columns represent the X and Y coordinates of a point, the third column indicates the status of the point, and the fourth column is the serial number of the point. All four columns are initialized to -1 when there is no touch. The array has 100 rows by default, so in theory it can record the state of 100 points on the screen, which means the system supports multi-point interaction.
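A hedged sketch of how such a 100x4 array (x, y, state, id; unused rows filled with -1) might be consumed outside Virtools is shown below; the state codes 0/1/2 follow the bounce/press/move convention described above, while the function and variable names are ours.

```python
STATE_BOUNCE, STATE_PRESS, STATE_MOVE = 0, 1, 2

def active_points(data_array):
    """Yield (point_id, x, y, state) for every row that holds a real touch.

    data_array is a list of 100 rows of [x, y, state, point_id];
    rows with all values set to -1 represent unused slots.
    """
    for x, y, state, pid in data_array:
        if pid == -1:          # slot not in use
            continue
        yield pid, x, y, state

frame = [[-1, -1, -1, -1]] * 100
frame[0] = [312, 455, STATE_PRESS, 0]   # finger just pressed
frame[1] = [600, 120, STATE_MOVE, 1]    # second finger dragging
for pid, x, y, state in active_points(frame):
    print(pid, x, y, state)
```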
4 The Implementation of the Virtual Exhibit System with Multi-touch Interaction
4.1 System Framework
The proposed system is composed of three parts: the human, the hardware and the software. The optical sensor imaging devices register the impact when users touch the projection surface, the input is converted into results, and the results are then processed by the system. The framework is shown in Figure 5.
Fig. 5. System Framework Flow Chart
4.2 System Function Modules
The proposed system mainly consists of three modules: picture display, panoramic scenario simulation and ancient scripture reading. The three modules are presented through a 2D user interface, and the user enters the corresponding module by finger touch. The interface of an interactive system must be designed around its main functions [10].
1) Picture display
Visitors are free to click on the scrolling pictures to appreciate them. Images can be enlarged, shrunk or rotated in different ways through touch. The system automatically displays the textual description of an image once the picture is selected.
Fig. 6. Interface initially and after click/move/rotate/scale events
2) Panoramic scenario simulation
This part is a representation of the Dunhuang caves. Visitors can rotate 360 degrees to tour the panoramic scene with moving finger-touch gestures.
Fig. 7. Panoramic scenario presentation
3) Ancient scripture reading
Visitors can select different scriptures by touch to read the text they contain. When the user touches a scroll on the shelves, the scripture pops up, comes to rest in the lower right corner of the screen and then slowly unrolls, with the text appearing inside the scroll.
4.3 Creating the 3D Scene of the Monuments
1) Scene modeling
Following the real scene of the ancient grottoes, three Buddha statues representing the past, the present and the future were placed in the main hall. Sunlight is simulated with parallel light, and light reflected from the ground and overhead light with spotlights; we set appropriate light attenuation values and add auxiliary lighting to dark areas.
Fig. 8. Model of two main scenes
Visitors use the lab scene to read the ancient scriptures. It is composed of three old bookshelves, a lamp and a Dunhuang mural as the background wall: the central lamp illuminates the three shelves, the fan-shaped mural stands behind them and the rolled scrolls extend at the very front. The models of the two main scenes are shown in Figure 8.
2) Scene model texturing
The Mogao Grottoes are known to the world for their variety of beautiful paintings. The murals inside essentially cover the walls up to the ceiling rather than being scattered intermittently. We therefore have to work on the whole to maintain the continuity of colour while texturing; when different maps are pasted onto the same wall the result looks discontinuous. The inconsistent maps are imported into Photoshop for colour matching, and by adjusting the colour balance, brightness/contrast and hue a satisfactory result is obtained. Combining the image processing functions of Photoshop with 3ds Max, we achieved better results: the colour tone on each face of the grotto is consistent and none of the wall murals looks abrupt, which conveys the real charm of the Dunhuang murals. The use of bump mapping adds a sense of historical age to the textures.
Fig. 9. Effects of Model Texture
5 Technical Implementation of the System Function Modules
5.1 Picture Display Module
This module is an important module of the system; it is the main part that achieves multi-point interaction and it supports simultaneous viewing of the Dunhuang murals by multiple users. The following steps introduce the program flow and the technical details. During the interaction it is necessary to record the information of the previous frame, including the touch points of that frame, the positions of the manipulated objects, and so on. Therefore the multi-dimensional array must be backed up: a second array with elements of the same type as the original is created and used to store the information of the previous frame. We named the original array "DataArray" and the backup array "MemoryArray". We also define two objects named "Now" and "Last" of the search type; they are used for
storing the correspondence of captured objects between the previous frame and the current frame. An object named "picked", of the info type, is used to capture the information of the manipulated object. The Virtools script design flow chart is as follows: "MultiTouchX" is the hardware/software interface; "determine" is the judgment cell that chooses which scenario to execute; "Scene" is the cell that enters the different scenarios through different button clicks; "Reaction" is the main multi-point interaction VSL. The preceding "index = i" and "Copy Value" operations maintain consistency between the values of the interface and the DataArray parameters of the "Reaction" cell, and the following "display of text description" and "Reset" complete the picture display function.
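The backup-array bookkeeping can be pictured as follows. This is a simplified, non-Virtools sketch in which DataArray/MemoryArray become two Python lists and the "picked" correspondence becomes a dictionary from point id to the object it is manipulating; the names mirror the description above but the code itself is ours.

```python
def update_frame(data_array, memory_array, picked, pick_object_at):
    """One interaction step of the picture-display module (simplified).

    data_array   -- current frame rows [x, y, state, id] from the touch interface
    memory_array -- copy of the previous frame, same layout
    picked       -- dict: point id -> object currently being dragged
    pick_object_at -- callback (x, y) -> object under that position, or None
                      (the object is assumed to expose a translate(dx, dy) method)
    """
    for (x, y, state, pid), (px, py, _, ppid) in zip(data_array, memory_array):
        if pid == -1:
            continue
        if state == 1 and pid not in picked:            # press: capture an object
            obj = pick_object_at(x, y)
            if obj is not None:
                picked[pid] = obj
        elif state == 2 and pid in picked and pid == ppid:   # move: apply the delta
            picked[pid].translate(x - px, y - py)
        elif state == 0:                                 # bounce: release
            picked.pop(pid, None)
    # back up the current frame for the next iteration
    memory_array[:] = [row[:] for row in data_array]
```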
Fig. 10. Script flow chart of Multi-touch interact part
5.2 Panoramic Scenario Simulation
The panoramic scenario simulation module imitates the perspective view of a human observer. By paddling a finger to the left, right, up or down, the scene can be examined in these four directions. This part mainly operates on the camera: a camera is set up to simulate the direction of the visitor's attention and, according to the changes of the finger paddling, the camera is rotated in the corresponding direction. The technical details are the same as the method used in the picture display module.
5.3 Ancient Scripture Reading
1) Using the unrolling animation
Virtools supports adding multiple pictures to one texture and setting which content the texture displays. We produce the scroll's unrolling animation in 3ds Max and then use the "Movie Player" building block in Virtools to play the frame animation.
2) Dynamically loading an image sequence
The idea is as follows: the input parameter is a string holding the initial path, a relative path that contains the folder and the name of the first picture. The string operation function "int Find(char iCar, int iStart=0)" is then called during processing. With these steps the dynamic loading of the image sequence is completed.
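The sequence-loading idea can be sketched outside Virtools as follows; the file-naming pattern and folder are hypothetical, and the regular-expression search merely stands in for the Find() string operation mentioned above.

```python
import os
import re

def load_sequence_paths(first_frame_path):
    """Return all frames that share the naming pattern of the first frame.

    Example (hypothetical naming): "scrolls/scripture_001.png" matches
    scripture_002.png, scripture_003.png, ... in the same folder.
    """
    folder, name = os.path.split(first_frame_path)
    match = re.match(r"(.+?)(\d+)(\.\w+)$", name)
    if not match:
        return [first_frame_path]
    prefix, _, ext = match.groups()
    frames = [f for f in os.listdir(folder)
              if f.startswith(prefix) and f.endswith(ext)]
    return [os.path.join(folder, f) for f in sorted(frames)]
```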
3) Display and progressive reveal of ancient Chinese characters
Virtools can only show Chinese characters through the "3D Frame" or "Text Player" interaction modules. We deal with this problem by turning the text into a transparent texture map. By setting UV parameters in 3ds Max we can control the proportion of the map that is displayed. Since Virtools can only set UV parameters on objects of the grid type, we assign the text map to a grid object and then program a function that controls the change of the UV parameters. In this way the text is presented line by line.
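A minimal sketch of this UV trick: by growing the V extent of the texture mapped onto the grid object, more lines of the transparent text map become visible over time. The function below only computes the UV rectangle (assuming V increases downwards, a common texture convention); applying it to a mesh is left to the engine.

```python
def text_reveal_uv(lines_total, lines_visible):
    """Return a (u0, v0, u1, v1) rectangle exposing the first `lines_visible`
    lines of a text texture that contains `lines_total` lines stacked vertically.

    Growing lines_visible from 0 to lines_total reveals the text line by line.
    """
    lines_visible = max(0, min(lines_visible, lines_total))
    v1 = lines_visible / lines_total
    return (0.0, 0.0, 1.0, v1)

for n in range(0, 5):
    print(n, text_reveal_uv(4, n))
```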
6 Conclusions and Prospects
In the system proposed in this paper, a touch table is used as the hardware carrier. Combining multi-touch technology with the behaviour interaction software Virtools, and in contrast to the previously limited form of appreciating monuments through video, we offer visitors a new and engaging way to exhibit places of interest virtually. Through a natural interaction pattern and a friendly interface, a user can come to understand the historical monuments of Dunhuang in many ways. By touching the screen, a visitor can enjoy the different attractions of this valuable historical content; even scenes or treasures that no longer exist in reality can be seen again. By overcoming the weakness that only a single user can operate at a time, we realize an operation in which multiple users can manipulate the exhibit simultaneously. The proposed system is edutainment, integrating cognitive science, ergonomics and psychology. If the system is placed in an exhibition hall devoted to Dunhuang culture, a completely fresh type of education can be presented to the user, and the system has good prospects in the field of virtual museum exhibition in the near future.
Acknowledgments. This work is supported by the National High Technology Research and Development Program of China (863 Program), Grant Nos. 2008AA01Z303 and 2009AA01Z337, and the Innovation Team Development Program of the Chinese Ministry of Education, Grant No. IRT0606.
References 1. Chen, D.: Large-size Multi-touch Technology Based on Optical Sensing. Electronic Engineering & Product World 9 (2009) 2. Zhang, D.-m., Zhang, G.-f., Dai, S.: Research on Vision-Based Multi-user Gesture Recognition. Human-Computer Interaction, Journal of System Simulation S1 (2008) 3. Lu, R.-x., Zhou, C.-J., Li, J.-m.: An infrared touch screen and its multi-touch detecting method. China, patent application No. 200710031082.6 (March 26, 2008) 4. Microsoft Corp. Microsoft surface [EB/OL] (February 5, 2005) 5. Fu, Y., Zhang, F., Dai, G.: Two-Handed User Interface: Review and Research. Journal of Computer Research and Development 4 (2005) 6. Hollands, R.: The Virtual Reality Home—brewer’s Handbook. John Wiley and Sons, NY (1996)
7. Zhen, Z.: Touchscreen and Multi-point Touch Technology. Electronic Engineering & Product World 4 (2008) 8. Wang, D.-x., Zhang, M.-j., Xiong, Z.-h.: Survey on multi-touch research. Application Research of Computers 7 (2009) 9. Ren, H.B., Zhu, Y.X., Xu, G.Y., Lin, X.Y., Zhang, X.P.: Spatio-Temporal appearance modeling and recognition of continuous dynamic hand gestures. Chinese Journal of Computers 23(8), 824–828 (2000) 10. Xu, J.: Research of the Key Technology of Virtual Environment Generation Based on Images, PhD thesis, Nanjing University of Science (2008) 11. Trumello, E., Santangelo, A., Gentile, A., Gaglio, S.: A Multimodal Guide for Virtual 3D Models of Cultural Heritage Artifacts. In: International Conference on Complex, Intelligent and Software Intensive Systems (2008) 12. Dong, S.: Progress and Challenge of Human-Computer Interaction. Journal of Computer Aided Design & Computer Graphics 1 (2004) 13. Elias, J., Westerman, W.C.: Multi-touch gesture directory. United States, 20070177803 (2007)
A Highly Automated Method for Facial Expression Synthesis Nikolaos Ersotelos and Feng Dong Department of Information Systems & Computing, Brunel University, UB8 3PH
[email protected]
Abstract. This paper proposes a highly automated approach to realistic facial expression synthesis that allows for enhanced performance in speed and quality while minimizing user intervention. It presents a largely automated method for facial feature detection that allows users to perform their desired facial expression synthesis with very limited labour input. Moreover, it presents a novel approach for normalizing the illumination settings between the source and target images, thereby allowing the algorithm to work accurately even under different lighting conditions. The results obtained with the proposed techniques, together with our conclusions, are presented at the end of the paper. Keywords: Computer Graphics, Image Processing, Facial Expression Synthesis.
1 Introduction
The synthesis of realistic facial expressions has long been a challenging area for computer graphics scientists. In the last three decades several approaches have offered different construction methods in order to obtain natural graphic results. However, despite this progress, existing techniques require costly resources and heavy user intervention and training, and the outcomes are not yet completely realistic. This paper therefore aims to achieve an automated synthesis of realistic facial expressions at low cost. According to facial painting canons, the difficulty of creating a realistic expression lies in the illumination settings that give fine details, such as creases and wrinkles; facial expression synthesis based only on a geometric deformation process lacks such fine details. By using an Expression Ratio Image (ERI) to capture the difference in the illumination settings during the expression, and then transferring it to a target image, Liu et al. [9] presented an algorithm for capturing and transferring a real facial expression onto a geometrically deformed facial expression. However, in order to obtain the required accuracy, significant user training and processing time is required. Moreover, the algorithm only works on the assumption that the illumination changes only when the facial expression changes and that the source and target images are subject to similar lighting conditions. If one of these conditions does not hold, the ERI approach cannot achieve high quality. This paper presents a novel technique of highly automated facial expression synthesis that requires very limited user interaction and that provides accurate results under different lighting conditions, along the following lines:
• Firstly, we have established a facial expression database library from which the user may choose an expression that can be automatically transferred onto the imported target picture. The importance of having such a library is that it eliminates the processing time needed to place coordinate dots around the facial characteristics manually, or even automatically, since each picture in the library is stored with the dots already in place.
• Secondly, instead of defining all the facial features (ears, hair, neck, eyes, eyebrows, nose, mouth, cheeks, etc.), the system extracts the eyes, eyebrows and mouth areas from the images and, after the deformation process has taken place, replaces them in their original positions. This reduces the processing time by simplifying the geometrical deformation, because it affects only small areas of the image, thereby avoiding possible distortions.
• Thirdly, we present a highly automated algorithm for the detection and definition of specified facial features that requires only minimal interaction with the target image, should any be needed at all. Our system is able to calculate the size, shape and position of the facial features in relation to their edges, surrounding them with dots used for the geometrical deformation and colour ratio processes.
• Fourthly, for the algorithm to work accurately even under different lighting conditions, we provide a novel approach for normalizing the illumination settings between the source and the target images. A threshold applied to the ERI is suggested for transferring a specific amount of illumination data to the target image.
• Finally, in cases of geometrical or illumination distortion, we offer solutions that require only simple user interaction.
The remainder of this paper is organized as follows: Section Two will consist of a survey of some of the most important approaches in facial modelling and animation. Section Three will be an analysis of the general methodology. Section Four will contain the novel algorithm for an automatic and accurate depiction of a 2D facial picture. Section Five will present the techniques of transferring facial details for expression synthesis, and Section Six will show some experimental results. We will conclude our paper by discussing the advantages and limitations of the technique, and we will offer some proposals for the direction of future research.
2 Previous Work Early work on computer facial modelling and animation dates back to the 1970s, when the first 3D facial animation was created by Parke [10]. This was followed by Badler et al, [1] who introduced a 3D face based on a mesh of triangles, which, when morphed, could carry new expressions. The disadvantage, however, was that the user had to be very accurate, since large deformations could completely change the shape of the face. Later, Waters [12] divided the mesh into facial areas, which could be changed individually, thereby producing new expressions without interfering with other parts of the face.
Since the 1980s several other approaches have produced accurate 3D and 2D facial models and expressions based on facial images, anthropometric libraries and even video data. As mentioned above, the basic method used for constructing a 3D head is a triangular mesh containing dots connected by common edges, while another innovative method – a laser scanner called Cyberware [3] used by Lee et al [9] to accurately scan a 3D cylindrical object – produces a 3D model with dynamic skin consisting six layers of triangular meshes, each forming a layer of pseudo muscle, which allows for the synthesis of new facial expressions by pushing, or pulling, or by moving the elastic angles of the triangles. The disadvantage of this process, however, is that the user has to be specially trained because of the complex manner in which the meshes are inter-connected. DeCarlo et al [4] generated a static geometrical 3D facial surface variational model by synthesising the fundamental elements of anthropometric measurements held in statistical data libraries and ordered according to race, gender, age and specific characteristics. Yet another approach for modelling a 3D head, also based on anthropometric measurements, has been presented by Kahler et al [7] who defined its features, position and shape by placing several landmark dots capable of being moved by the application of anthropometric data, which they used to calculate and synthesize the growth of the face. Blanz et al [2] also used manually placed landmarks to describe the facial features of a 2D facial image. This system, which requires a set of 3D models, automatically separates the 2D face from the image – excluding the hair and neck – and fits it to a 3D morphable model. By optimizing all the parameters – such as 3D orientation, position, camera focal length and the direction and intensity of illumination – new facial expressions can be produced, which are then pasted back onto the 2D image. Zhang et al [13] automatically synthesised a new expression manually by introducing system geometrical points that delineate features on a 2D image by dividing the face into fourteen sub-regions, which are necessary for the synthesis of new expressions. This system infers the feature points of the expression, derived from a subset of the tracked points, by using an example facial expression based approach whereby new expressions are generated by geometrical deformation. Another approach for creating realistic facial expressions was presented by Sifakis et al, [11] who used a 3D head consisting of thirty thousand surface triangles. This analytical model of a head, which consists of eight-hundred and fifty thousand threshholds and thirty-two muscles, is controlled by muscle activations and the degrees of freedom allowed by the kinematic bones. The model is marked with coloured landmarks, each identifying different muscles, which may be activated to generate new expressions, or, even, animations. Expression mapping [9] is a technique for transferring facial expression from one image to another. Generally this approach is based on geometric deformation and the transfer of illumination settings. The materials necessary are two pictures of the same person – the first, neutral, and the second with an expression – and one picture from another person with a neutral expression. The characteristics, such as eyes, mouth, nose, eyebrows, hair, ears and shape of head, are identified manually by dots and the landmarks around the characteristics are connected by triangles. 
By calculating the points position difference from the source images, the target image is geometrically
deformed. An image warping process then equalizes the source image according to the deformed target image. The two source images are then divided to give the Expression Ratio Image (ERI), which is the capture process of the illumination changes – creases and wrinkles – between the two images. The ERI is then multiplied pixel by pixel with the target deformed image in order to give the appropriate illumination settings for the new expression. This method works on the assumption that the illumination change happens when the facial expression changes and the source-target images has similar lighting conditions. The landmarks have to be placed manually on all images simultaneously in order to describe all facial features; therefore, the user has to be very accurate so as to avoid distortion. Approximately three hundred dots are needed for every facial expression and several filters are necessary in order to reduce the hard colour values and the noise from the resulting target picture. An algorithm synthesising facial expression based on music interaction was presented by Chuan-Kai Yang et al [14]. As in this paper the system requires one facial picture to be imported as a target image; it then necessitates user interaction to identify the facial features of the image. This can synthesize newly animated facial expressions when it is used continuously with a morphing process and the introduction of Midi (Musical Instrument Digital Interface) music files. Tommer Leyvand et al [15] created an algorithm that was intended to enhance the aesthetic appeal of a human faces in frontal photographs. The key component of this beautification engine was the use of datasets of male and female faces selected following a process of attractiveness ratings, collected from human ratters. This semiautomatic process requires a frontal image as an input whereby the user identifies the facial features by using landmarks. Secondly, with the use of a planar graph and landmarks coordinates, the most proper position and shape of the facial features are redefined. Then, by using a warping process, the system deforms the existing facial features as closely as possible to the proposed coordinates obtained from the beautification data engine. Bernd Bickel et al [16] presented a new methodology for establishing real-time animation of facial expressions based on a multi-scale decomposition of the facial geometry into large scale motion and fine-scale details. This algorithm requires a linear deformation model whereby the facial features are inscribed within a mesh, based on triangles, connected to approximately forty landmarks, which are manually placed on the target model and on the source image. Another approach designed for the analysis and synthesis of 3D wrinkles and poses was introduced by Golovinskiy et al [17]. Using a 3D scanner and a custom-built face scanning dome, they produced highly detailed 3D face geometry; thereafter they subdivided the surface to separate the affected area from the rest of the facial mesh in order to add highly detailed elements, such as wrinkles and poses. The facial lighting details in this system derive from a database containing a cross-section of facial characteristics categorized by age and gender.
3 The Automatic Facial Expression Synthesis Process This approach is based on two main elements: “geometrical deformation” and “illumination settings transfer” (Fig.1). In geometrical deformation, the system synthesizes the
new expression by warping specific facial features, and, in illumination transfer, it gives realistic lighting settings based on the real lighting settings of the source image. The main requirement for facial expression synthesis is the importation of a neutral face onto the system. It is important that a complete facial picture is fitted in the frame, in order provide for an accurate definition of the mouth, eyes and eyebrows. After the facial expression has been selected from the library, the geometrical deformation process starts with the target and source images being equalized in size; then the target image is re-scaled back to its original size. When the geometrical deformation process begins, the newly scaled target image is transformed into a greyscale format (each pixel carries only intensity information). This is necessary to enable the later edge-detection process, when the system will only have to decide between the different values of black and white pixels and not between thousand of different colours and lighting conditions. The system then extracts two specific areas from the target image – the mouth and the eyes-eyebrows. The sizes of the rectangles to be
Fig. 1. Process Diagram
used are already defined in the documentation of the source images. After the automatic detection of the specified facial characteristics, dots are placed around them in order to define their size and positions; the coordinates of the dots for the same facial characteristics are then loaded from the source image. Finally, a geometrical deformation is used to calculate the difference between the dots coordinates of the source image and add that difference to the dots coordinates of the target image. This results in the creation of a new expression, without, however, the application of proper lighting conditions. At this point the “illumination settings transfer” process starts; by using the geometrical deformation process again, the system will equalize the source images with the new synthesized facial expression of the target image. After the equalization process of the source images with the target image is complete, the division of the source images will extract only the illumination settings, which afterwards will be added by a multiplication process, pixel by pixel, to the target image. 3.1 Splitting the Face into Areas A facial expression derives from the activation of one or more facial muscles, which are movements that indicate the emotional state of the individual to observer. Those muscles have been divided into two areas for the purposes of this paper. The first area contains those elements that must be geometrically deformed in order to produce a new expression. The second area contains those elements – wrinkles and creases – that, under proper lighting conditions, can upgrade a geometrically deformed expression into a realistic one. The only two areas of muscles that require delineating with dots and triangles are the mouth and the eyes-eyebrows areas; it is not necessary to identify any other facial features in this way. This system, therefore, requires fewer dots and triangles, and this also implies the elimination of distortion during the geometrical deformation of the target image. These areas are now extracted from all the images and pasted on layers, which are rectangles covering, in the case of the mouth, the facial area that is vertically defined by the nose and the chin, and horizontally defined by the edge of the face.
Fig. 2. First step: the mouth area is extracted from the source images and the target image and placed on the rectangle; according to the difference in dot positions between the source images, the mouth of the target image is geometrically deformed. Second step: the deformed result is copied and pasted back on top of the original target image.
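The deformation step illustrated in Fig. 2 amounts to transferring landmark displacements from the source pair to the target. A minimal numpy sketch (our own notation, not the authors' implementation) is:

```python
import numpy as np

def transfer_expression(src_neutral, src_expr, tgt_neutral):
    """Move the target landmarks by the displacement observed on the source.

    All arguments are (N, 2) arrays of corresponding dot coordinates:
    source neutral, source with expression, and target neutral.
    The actual system then warps the image according to these new positions.
    """
    return tgt_neutral + (src_expr - src_neutral)

src_a  = np.array([[10.0, 20.0], [30.0, 20.0]])   # mouth corners, neutral source
src_a2 = np.array([[ 8.0, 18.0], [32.0, 18.0]])   # mouth corners, smiling source
tgt_b  = np.array([[12.0, 25.0], [28.0, 25.0]])   # same landmarks on the target
print(transfer_expression(src_a, src_a2, tgt_b))  # -> [[10. 23.] [30. 23.]]
```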
The size of the rectangle has been defined in advance and stored in the library, and it is loaded when an expression is selected. The rectangle is big enough to include any size of the target image's mouth (Fig. 2). The same approach applies to the eyes and eyebrows layer.
3.2 The Facial Expression Database
For a faster process, and for user convenience, a database library has been created that contains several pairs of source images with their point position coordinates documented. Each pair corresponds to a specific facial expression and consists of one picture with a neutral expression and one with a specific expression. More specifically, three types of data set have been created for each facial expression, corresponding to the data point coordinates for the mouth area, for the eyes-eyebrows area, and for both of the above-mentioned areas together for the wrinkles process. The library also contains sets of seven source images for video animation projects, together with their documentation. More analytically, it contains groups of seven images taken for animation purposes: the first image has a neutral expression, and from the second until the last image the group contains pictures of the same person with gradually changing expressions. The library is created to eliminate the need to place dots on the source images manually, or even to use the automatic detection process of the facial characteristics to determine them. All six primary expressions for communicating emotions that Ekman et al. [5] identified (anger, disgust, fear, happiness, sadness, and surprise) can be found in the facial expression database library, together with the option of more expressions that may be added by the user.
4 Automatic Detection Process Instead of placing the dots manually around the mouth, eyes and eyebrows in the target image, this paper will present an innovative approach for the detection of the facial characteristics and the automatic placement of dots around them. This system transforms the colour image to a greyscale image, then, by using an edge detection process, it will identify, by using dots, the eyes, eyebrows, mouth shapes. Finally, the position and coordinates of the dots will be copied and placed on the original colour target image to activate the geometrical deformation and lighting transfer processes. 4.1 Edge Detection An edge detection process is used in order to eliminate the skin-color details and to reveal the required facial characteristics. The edges in an image contain pixel areas with strong values of contrasts and an edge detection process reduces the amount of data by filtering out useless information, while preserving its important structural properties. Green [6] has presented an algorithm for an automatic edge detection process. His Sobel Edge Detector uses masks to move over the image, manipulating a square of
pixels at a time and, for non-facial pictures, his approach is fast and accurate. We have also considered increasing the size of the masks in order to get rid of noise. In the 3x3 mask in example A (Fig. 3), the fourth pixel is black whereas the second one is white, so the system will identify the observed pixel as an edge. Consequently we have increased the size of the mask to 5x5 pixels. In example B the mask is bigger, so the system recognizes that the black pixel is noise and therefore retains the originally observed pixel colour value. The size of the mask, we found, is dependent on the size of the picture. For instance, if the picture in use is 150x150 px it is appropriate for the mask to be 3x3 px. If the mask is bigger it misses important pixels that describe the facial characteristics, and if it is smaller it classifies noise pixels as edges; we therefore found pictures measuring 500x500 px, with a 5x5 mask, to be the most suitable.
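As an illustration of this step, the OpenCV sketch below applies a Sobel operator with a 5x5 kernel (the size suggested above for roughly 500x500 px portraits) and thresholds the gradient magnitude; the threshold value and file names are placeholders rather than the authors' parameters.

```python
import cv2
import numpy as np

def sobel_edges(gray, ksize=5, threshold=80):
    """Return a binary edge map of a greyscale face image.

    ksize follows the observation that a 5x5 mask suits images of
    about 500x500 px; the magnitude threshold is arbitrary.
    """
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=ksize)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=ksize)
    mag = cv2.magnitude(gx, gy)
    mag = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return (mag > threshold).astype(np.uint8) * 255

gray = cv2.imread("target_face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
edges = sobel_edges(gray)
cv2.imwrite("edges.png", edges)
```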
Fig. 3. Accuracy between mask A and mask B
4.2 Noise Reduction
After the rectangle shape is defined, the process of noise reduction begins. A new mask is used to identify the "almost white" pixels. A colour threshold defines how much non-white colour is allowed inside the mask rectangle for an area to be considered "almost white". If the area covered by the mask is characterised as "almost white", then the pixel in the centre of the mask is changed to 255 (100% white) (Fig. 4). This process is important for removing those pixels that do not identify the shape and position of the facial characteristics and are thus just noise. More specifically, a square of the same size as the mask rectangle is defined around the observation pixel, so that the observation pixel is an equal distance from each of the square's corners. In order to decide whether the observation pixel is noise or belongs to the contours of the facial characteristics, the system calculates the whiteness ratio, defined as the average normalized colour value of the mask pixels, where each pixel's normalized colour value is MPCol/255.
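A possible numpy rendering of this noise-reduction mask is shown below; it uses the 0.7 whiteness threshold adopted in this section, but the windowing details are our simplification and the implementation is not the authors' code.

```python
import numpy as np

def suppress_noise(edge_img, mask_size=5, threshold=0.7):
    """Whiten isolated dark pixels in a greyscale edge image.

    For each pixel, the average normalised whiteness of the surrounding
    mask (value / 255) is computed; if it exceeds the threshold the area
    is treated as "almost white" and the centre pixel is set to 255.
    """
    out = edge_img.copy().astype(np.uint8)
    half = mask_size // 2
    padded = np.pad(edge_img.astype(np.float32), half, mode="edge")
    h, w = edge_img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + mask_size, x:x + mask_size]
            if window.mean() / 255.0 > threshold:
                out[y, x] = 255
    return out
```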
Fig. 4. The green colored square is the noise reduction mask. The red dot is the observation pixel.
Therefore, if the average normalized colour value of the mask pixels is 1, all the mask pixels are totally white; if it is 0, they are totally black. The threshold has been defined as 0.7, so the system considers the observation point to be part of the facial characteristics when the whiteness value is less than 0.7.
4.3 Dots Connection (Mesh) – Geometrical Deformation
The geometrical deformation process does not require any user interaction, because there is no need to synthesize a facial expression from scratch; all that needs to be done is to transfer an original expression from the source images onto the target image. In a similar way to Liu's [9] process, the algorithm captures the movement of the facial features from source image 1 to source image 2, and this movement is then added onto the target image. This explains why both source image 1 and the target image must have neutral expressions. The detected dots of the target image's facial features, as well as the dots defined for the source images (from the library), are connected to create a mesh consisting of triangles. From the positions of the dots in source image 1 and the corresponding dots in source image 2, the algorithm calculates the difference and adds that difference to the corresponding dots of the target image by means of a geometrical deformation process.
4.4 Geometrical Distortion Elimination Process
If geometrical distortion of the target image occurs, it generally affects the face shape; for instance, in the case of mouth deformation, the chin. In order to avoid this, the user can copy the correctly deformed area and place it on top of the neutral expression
target image. More specifically, if the person in the source image has a large mouth, then the rectangle for that area (chapter 3.1) must be big enough to cover it. A larger rectangle may also cover the chin, or parts of the facial limits, of the target image. Unfortunately, this might produce a distortion on the target image following the geometrical deformation process. To eliminate this particular distortion, the user may place as many dots as are preferred on the target image, avoiding the distorted parts, in order to specify the correctly deformed area. Afterwards, a new window will appear showing the target image without any deformation. The copied area will then be placed on top of the corresponding place on the new window; this does not necessarily have to be a rectangle – since the user defines its shape by placing the dots.
5 Wrinkles Transfer Approach
In order to calculate the wrinkles of a facial expression and transfer them onto the target image, Liu et al. [9] presented a method that has been analyzed and improved on in this paper in order to obtain better results under numerous illumination settings. In Liu's algorithm, the system aligns the source images with the target image through a warping process. It then divides the warped source images in order to calculate the ratio image, and multiplies the result with the deformed target image to transfer all the illumination settings from the source images to the geometrically deformed target image. The disadvantages of such a process concern the amount of wrinkle data, and the quality of the illumination settings, that will be transferred onto the target image. If the source images are of poor quality, they will directly affect the target image, produce lighting distortions, create hard colours in the wrinkle areas, or simply look artificial.
5.1 Source Images Equalization with the Target Image
To eliminate the above disadvantages, Liu's process has been changed and split into four steps. The first step is the equalization of the source images with the target image. Both of the source images must be changed according to the new facial expression of the target image through a geometrical deformation process, in order for the facial characteristics of both images to match perfectly with the facial characteristics of the target image. Where there is a difference, a distortion appears in the lighting settings of the final result. As stated above, it is not important to deform facial characteristics that are not involved in the facial expression. Therefore the hair, neck, ears and shoulders are avoided in the geometrical deformation process. However, even if the rectangle is bigger than the face of the target image, the opposite solution will be used to eliminate any distortion. The size and position of the rectangle are stored with the source images. The areas it covers are the mouth, eyes and eyebrows, as in Figure 9. The dots that define the facial characteristics of the source images are stored, and they are loaded automatically when the process begins. The dots that describe the facial characteristics of the target image are copied from the previous process (Section 3) and are pasted into the new rectangle. Afterwards, the geometrical process begins and is repeated automatically several times until the dot positions on the source images are aligned with the dot positions on the target image.
5.2 Colour Normalization
According to Liu's process [9], it is important that the source images are of a similar quality to the target image. Otherwise the differences will cause distortion in the final result. If, for instance, the faces of the source and target images have different skin colours, then that difference will also be transferred through the illumination settings. Moreover, if the source images are black and white, the distortion will be greater. All the lighting settings will be mixed with a grey shadow because the illumination settings are part of the facial skin. It must be assumed, therefore, that the picture quality, or the colour of the facial skin, matters.
Fig. 5. Skin colours and illumination settings are combined before the final result is achieved (the source images A, A' are from Liu et al. [9])
For that purpose, instead of dividing the source images one by the other, each of them is first combined, pixel by pixel, with the target image, not only to keep the wrinkle values of the source images, but also to combine, or normalize, them with the illumination and skin colour settings of the target image. The results are two new source images – ABd and A'Bd, where Bd is the deformed target image (Fig. 5). However, in order not to lose a percentage of the source images' illumination settings, the first pair, ABd, uses 50% of each of its images, and the second pair, A'Bd, which has the desired expression, uses 70% from the source image and 30% from the target image. These different percentages collect all the source image illumination details together with the skin colour settings of the target image.
5.3 Ratio Image Threshold
Following the synthesis of the two new images, the system divides one by the other (ABd/A'Bd) in order to extract their illumination settings, which are described as a 'ratio image'. A threshold has been introduced so that the user can transfer a specific amount of data from the ratio image for a realistic final result on the target image. More analytically, with the use of that threshold, the user has the option to transfer a specific percentage of the data. The ratio image is like a transparent layer that is added onto the target image through a multiplication process – Ratio Image x Bd = Bd'. This threshold helps avoid details that would produce lighting distortion on the target image, such as beauty spots, scars, etc., derived from the source images. If the rectangle that contains the facial features for the 'illumination transfer approach' is bigger than the area of the eyebrows, eyes, nose and mouth, and it contains hair, or ears, or scars, this will produce a distortion because all the head characteristics
Fig. 6. a) B is the original dot placed by the user. B’ is the internal dot of the circle that describes the shortest distance between A and C. b) The blue color diagram describes the orthogonal vectors that are used to gradually change the color of the pixels area.
of the source images have not been equalized with the corresponding characteristics of the target image; the distorted illumination settings will therefore be transferred to the final result. To correct this, simple user intervention is provided. The user can identify the non-distorted area by simply placing any number of dots around the area that contains the proper lighting settings. Afterwards, the system will automatically extract that area and paste it back onto the original target image, thereby avoiding the distorted parts. When the user places the dots on the face, a small circle of 20 pixels diameter is drawn with the specified dot at its center. According to the neighboring dots A and C, the system will calculate a new dot, B', which describes the shortest path from A to C via the circle. B' will be defined as the inner dot (Fig. 6 a). This process is repeated for all the dots placed by the user. The purpose of the internal dots is to achieve a gradual change of color, which takes place in an (approximately) 10 pixel wide area between the outer and the inner dots, from the color of the original target image to the color of the newly synthesized area. Therefore, two borders, ten pixels apart, are created around the extracted facial area. The outer border consists of the line obtained by connecting the original dots placed by the user, and the inner border of the line connecting the new internal dots created in the above-mentioned procedure. Orthogonal vectors are used to move from one pixel to another, inwards from the outer border lines, in order to gradually change the pixels in the area (Fig. 6 b).
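Putting Sections 5.2 and 5.3 together, the transfer pipeline can be sketched as below. This is a hedged reconstruction, not the authors' code: the function name, the clamp that stands in for the ratio-image threshold, and the assumption of float images in [0, 1] are ours; the 50%/50% and 70%/30% blend weights, the ABd/A'Bd division and the Ratio Image x Bd = Bd' step follow the text.

```python
import numpy as np

def transfer_illumination(A, A_prime, Bd, ratio_limit=2.0):
    """Hedged sketch of the wrinkle/illumination transfer of Sections 5.2-5.3.

    A          : warped neutral source image (float array in [0, 1]).
    A_prime    : warped expressive source image (same shape).
    Bd         : geometrically deformed target image (same shape).
    ratio_limit: clamp on the ratio image, standing in for the threshold that
                 filters out beauty spots, scars, etc. (value is illustrative).
    """
    eps = 1e-6
    # Colour normalisation (Sec. 5.2): combine each source with the target.
    ABd  = 0.5 * A + 0.5 * Bd            # 50% / 50% for the neutral pair
    ApBd = 0.7 * A_prime + 0.3 * Bd      # 70% / 30% for the expressive pair
    # Ratio image (Sec. 5.3): divide the two normalised images (ABd / A'Bd) ...
    ratio = ABd / np.maximum(ApBd, eps)
    ratio = np.clip(ratio, 0.0, ratio_limit)   # threshold on transferred detail
    # ... and multiply onto the deformed target: Ratio Image x Bd = Bd'.
    Bd_prime = np.clip(ratio * Bd, 0.0, 1.0)
    return Bd_prime
```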
6 Results
The new approach has been applied to automatically detect the facial characteristics of the target image, separate them into two areas, deform them and then transfer the appropriate illumination data from the source images to the final image. The results are presented in this section. The source images have been chosen because their facial expressions and illumination settings vary; they are presented in Figure 7. The images are grouped in pairs of neutral and non-neutral facial expressions.
Fig. 7. Examples of source images from the library database that are used in order to synthesize new facial expressions: 1st example (neutral and smiling expression source images), 2nd example (neutral and sad expression source images), 3rd example (neutral and surprised expression source images)
Figures 8 and 9 show the target images and the resultant images after the deformation process and the illumination setting transfer, using as source images the pair in Fig. 7, 1st example. It can be observed that the wrinkles around the mouth contribute to a realistic result, providing a good level of physical detail. Deformation has been applied only in the areas of the mouth and eyes. As can be seen, the logic of splitting a facial image into areas does not deteriorate the naturalness of the facial expression. Even though no deformation was applied to the area of the nose by the user, it is deformed according to the geometrical deformation of the mouth. Moreover, the illumination settings threshold enables the transfer of a suitable proportion of illumination data that does not produce any distortion, but only natural results.
Fig. 8. Based on the source images of Figure 7, 1st example: (a) target image with a neutral expression, (b) after the geometrical deformation, (c) the deformed image with a smiling expression and the corresponding wrinkles
Fig. 9. More results based on Figure 7, 1st example: (a) the target image with a neutral expression, (b) the deformed image with the smiling expression and the corresponding wrinkles
Figures 10 and 11 present the target neutral and the deformed expression of the process using as source images the pair in Figure 7, 2nd example. The wrinkles of the facial expression of the source image have been transferred to the deformed image, by capturing the illumination settings. The difficulty in this example is the resultant wrinkles in the forehead. However the result illustrates that highly detailed graphics can be achieved, even though the face has been split into areas.
Fig. 10. (a) Target image with a neutral expression, (b) automatic edge detection, (c) after the geometrical deformation, (d) the deformed image with a sad expression and the corresponding wrinkles between the eyes and the mouth
Fig. 11. More results based on Figure 7, 2nd example: (a) the target image with a neutral expression, (b) after the geometrical deformation, (c) the deformed image with a sad expression and the corresponding wrinkles
In Figure 12, the target image has been deformed according to the source image’s surprised expression (Figure 7, 3rd example). An interesting feature in this figure is the raising eyebrow (surprised) expression and the resultant wrinkles in the forehead. It is very important to note that the source image’s model must have a similar age to the target image model. If the target image describes a young person, and the source
image an old one, then the algorithm will transfer an amount of wrinkles that describes the facial characteristics of the older age, thereby creating an unwanted distortion in the final target image result.
Fig. 12. Results based on Figure 7, 3rd example: (a) the target image with a neutral expression, (b) after the geometrical deformation, (c) the deformed image with a surprised expression and the corresponding wrinkles
According to the approach of Liu et al. [9], the execution time for a facial expression synthesis is approximately thirty minutes. More analytically, the user has to manually place dots that cover all the facial features on the source and target images. Moreover, the user must be very accurate. If a dot is placed incorrectly, then distortion will certainly occur during the geometrical deformation. In our approach, the process is primarily automatic. Only in the case of geometrical or illumination distortion does the user intervene, through a limited process of interaction. More specifically, the geometrical deformation is based on two specific facial areas. Therefore there is no need to place dots around all of the facial features, such as the hair, ears, nose, etc. Moreover, by using an expression library database, the system can automatically load all the dot coordinates of the source images. The number of dots required is therefore reduced from three hundred (Liu's approach) to sixty-nine, and the execution time has been reduced from thirty minutes to less than two minutes, depending on the processing power of the system. The target neutral expression images in Figures 8, 9, 11, 12 and 13 belong to the FEI Face Database [20].
6.1 Source Images Equalization with the Target Image
Another function that the system can provide is the synthesis of six facial expressions simultaneously for video editing animation purposes. As with the individual facial expression synthesis process, the system contains a set of source animated pictures, with the relevant documentation, stored in the library. When the process is selected by the user, the automatic edge detection of the target image's facial features begins.
Where the user intervenes by manually correcting the detected dots, the system stores the dot positions in a temporary folder so that they can be re-used in the same positions for the remainder of the images. All the synthesized target images are produced taking as inputs the neutral expressions of both the source and the target images, along with the subsequent source image expression (Fig. 13). After the sequence of target facial animation images has been synthesized, a video editing software package can be used to render the images in a video format. By using a "fade in, fade out" effect, very realistic facial video animation can be achieved.
Fig. 13. (a) Source images, (b) the geometrically deformed target images with the corresponding facial expressions and illumination settings
7 Conclusion – Future Work
This paper presents a novel technique to produce natural-looking facial expressions in a simple, accurate and highly automated way. It features a highly automated method for facial feature detection, as well as a wrinkle transfer method that allows for synthetic wrinkles under different illumination settings between the target and source images. The process has been accelerated by separating parts of the face and extracting two facial areas for the expression synthesis. Moreover, a "facial expressions" library is integrated into the process. Other improvements are the significant reduction in the number of dots required to build the whole process and the minimization of the necessary intervention from the user, by deploying an edge detection process for identifying the facial features. Future work could include the creation of a 3D model generated from the synthesized 2D facial expression. For this purpose, a library of 200 3D heads [18, 19] based on different anthropometric measurements could be used, the heads being categorized by race, age, gender and the sizes of their facial characteristics.
The final deformed image from such a system would contain information about the position and shape of the facial characteristics, defined by landmarks and triangles, together with data about the desired illumination settings. This would allow the system to search the library with an efficient algorithm in order to identify the 3D head that most accurately matches the target face; the image could then be adjusted accordingly on the 3D model. The same geometrical deformation that had been used on the 2D images would also have to be applied to the 3D model in order for the new expression to be fitted on the 3D head without distortion. The advantage of such a process is that it would enable the user to take images of the face from different angles. The results presented in this paper have been accurately created, in high graphical detail, using a personal computer. Potentially, such applications could be used by computer game designers, or movie animators, to allow them to generate various facial animations quickly for their characters.
References 1. Badler, N., Platt, S.: Animating facial expressions. ACM SIGGRAPH Comput. Graph. 15(3), 245–252 (1981) 2. Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 187– 194 (1999) 3. Cyberware Laboratory, 3D Scanner with Color Digitizer, Inc., Monterey, California, 4020/RGB (1990) 4. DeCarlo, D., Metaxas, D., Stone, M.: An anthropometric face model using variational techniques. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 67–74. ACM SIGGRAPH, New York (1998) 5. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978) 6. Green, B.: Edge Detection Tutorial, http://www.pages.drexel.edu/~weg22/edge.html 7. Kähler, K., Haber, J., Yamauchi, H., Seidel, H.-P.: Head shop: generating animated head models with anatomical structure. In: Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 55–63. EUROGRAPHICS, San Antonio (2002) 8. Lee, Y., Terzopoulos, D., Waters, K.: Realistic modeling for facial animation. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pp. 55–62. ACM SIGGRAPH, New York (1995) 9. Liu, Z., Shan, Y., Zhang, Z.: Expressive expression mapping with ratio images. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 271–276. ACM SIGGRAPH, New York (2001) 10. Parke, F.: Computer generated animation of faces. In: Proceedings of the ACM Annual Conference, pp. 451–457. ACM, Boston (1972) 11. Sifakis, E., Neverov, I., Fedkiw, R.: Automatic determination of facial muscle activations from sparse motion capture marker data. ACM Trans. Graph. 24(3), 417–425 (2005) 12. Waters, K.: A muscle model for animating three-dimensional facial expression. Comput. Graph. 22(4), 17–24 (1987)
13. Zhang, Q., Liu, Z., Guo, B., Shum, H.: Geometry-driven photorealistic facial expression synthesis. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 177–186. Eurographics, San Diego (2003) 14. Yang, C.K., Chiang, W.T.: An interactive facial expression generation system. In: Multimedia Tools and Applications, pp. 41–60. Kluwer Academic, Hingham (2008) 15. Leyvand, T., Cohen-Or, D., Dror, D., Lischinski, D.: Data-driven enhancement of facial attractiveness. In: Proceedings of the ACM SIGGRAPH. Article No. 38. ACM, New York (2008) 16. Bickel, B., Lang, M., Botsch, M., Otaduy, M.A., Gross, M.: Pose-Space Animation and Transfer of Facial Details. In: Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 57–66. ACM, New York (2008) 17. Golovinskiy, A., Matusik, A., Pfister, H., Rusinkiewicz, S., Funkhouser, T.: A statistical model for synthesis of detailed facial geometry. In: ACM Transactions on Graphics (TOG), pp. 1025–1034. ACM, New York (2006) 18. 3D_RMA: 3D Database, http://www.sic.rma.ac.be/~beumier/DB/3d_rma.html 19. The extended M2VTS Database, http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb 20. FEI Face Database, http://www.fei.edu.br/~cet/facedatabase.html
Sketch Based 3D Character Deformation Mo Li and Golam Ashraf National University of Singapore, Computing 1, Computing Drive 13, 117417 Singapore, Republic of Singapore
[email protected],
[email protected]
Abstract. Most 3D character editing tools are complex and non-intuitive. It takes a lot of skill and labor for artists to create even a draft 3D humanoid model. This paper proposes an intuitive 2D sketch-driven drafting tool that allows users to quickly shape and proportion existing detailed 3D models. We leverage our existing vector shape representation to describe character body-part segments as affine-transformed circle-triangle-square shape blends. This is done both for the input 2D doodle and for the point clouds extracted from the 3D library mesh. The simplified body-part vector shapes describe the relative deformation between the source (3D library mesh) and the target (2D frontal sketch). The actual deformation is done using automatically set up Free Form Deformation cages. To perform body-part shape analysis, we first segment the mesh with Baran and Popovic's algorithm for automatic fitting of an input skeleton to a given 3D mesh, followed by our existing 2D shape vector fitting process. There are several promising character design applications of this paper, e.g. accelerated personality pre-visualization in movie production houses, intuitive customization of avatars in games and interactive media, and procedural character generation. Keywords: Deformation, sketch interface, vector art.
1 Introduction
While designing a humanoid character, artists typically use shape, size, pose and proportion as the first design layer to express role, physicality and personality traits of a character. The establishment of these traits in character design is one of the most important factors in the process of successful storytelling in any animated feature. Recent advancement in digital multimedia technologies has triggered widespread creation of aesthetic digital character art in the form of videos and images with textual labels or descriptions. But the process of creating humanoid characters with aesthetics matching the desired art style, role, physicality or personality traits still requires tedious labor and can only be done by experienced artists. A rapid visualization tool can be quite valuable in facilitating the character design brainstorming process. From several shape-proportion guides in art and psychology literature [1, 2, 3], we find that typically artists use primitive shaped body parts, skeletons and motion arcs to draft characters. Promising creations are then layered with more details like color, attire, facial expression, and accessories. We take inspiration from this workflow to drive
3D character deformation with a sketch-like interface. The input doodle is constructed as a sum of coarsely sketched body part shapes. Each body part is estimated by the system as a combination of circle-triangle-square primitives. This also motivates us to de-construct existing 3D meshes into similar body-part primitive vectors, and thus implement consistent deformation of 3D characters in response to the sketches. Since shape-vector targets can also be procedurally generated, character-attribute centric deformations can be done for online generation of virtual characters. In this paper, we describe the relevant details that allow shape vector deconstruction of 3D meshes, and shape fitting of input strokes, as well as automatic construction of Free Form Deformation (FFD) lattices to implement the deformation. Though we use FFD to implement the deformation, we could theoretically use any other method like skeletal or wire deformers. We choose FFD here because it is a model-free space transforming method, as opposed to parametric methods like Linear Blend Skinning. Our system works under no assumption on the geometry topology and resolution of the given character model. However, we assume a generic humanoid structure, where all models and drawings have a similar number of body parts and semantic linkage between different body parts. The proposed system can be summarized as follows:
a) The 2D input character doodle is processed to extract the vector shape information of each body part.
b) The 3D library character mesh is segmented into different body parts with an existing skeleton fitting algorithm, and then each set of body part vertices is projected and fitted into vector shapes.
c) FFD lattices are set up around the body parts of the 3D character model according to the vectors extracted in step b). These lattices are then deformed according to the vectors extracted from the 2D character drawing in step a), which will in turn deform the 3D model.
We first present a literature review of relevant techniques. Next we include a brief description of our existing supporting algorithms on shape representation, fitting and parameterization, for completeness. We then present details on 3D mesh body part segmentation and fitting, automatic FFD lattice construction, and deformation. Lastly, we present results, accompanied with analysis of potentials and limitations.
2 Related Work Shape Signature: Shape representation is a well-studied field because of its tremendous importance in pattern recognition and computer vision [4,5,6,7,8]. These methods can be classified according to several criteria. The first classification is based on the use of shape boundary points as opposed to the interior of the shape. Another classification can be made according to whether the result is numeric or non-numeric. The scalar transform techniques map the image into an attribute vector description, while the space-domain techniques transform the input image into an alternative spatial domain representation. The third classification can be made on the basis of whether a transformation is information preserving or information losing. There is also an approach called mathematical morphology that is a geometrical based approach for image analysis [4].
It provides a potential tool for extracting geometrical structures and representing shapes in many applications. Inspired by all these developments and from the fact that primitive shapes like circle, triangle and squares play a central role in human perception we developed the shape descriptor with a scaled/rotated/blended combination of these three primitive shapes [9]. Our descriptor can approximate any convex shape with a mixture of these three primitives. Every arbitrary shape is represented as a vector of height, width, rotation, centroid-position and three weight values for circle, triangle, and rectangle. Deformation: Laplacian deformation allows user-specified tweaks to one or a few points on the deformable surface, to be smoothly propagated to the vicinity. The tweaks are treated as hard constraints and the aim is to find an optimal deformation to satisfy them [10,11]. Igarashi et al [11] first proposed an interactive system that lets user deform a two-dimensional shape using a variant of constrained Laplacian deformation. Laplacian deformation is good for quickly resizing a given part with a few vertex-edits, but it is still fairly tedious to control the body part shape. In Free Form Deformation methods [12], the displacement of a cage control-point influences the entire space inside the lattice. However, specifying mesh deformations this way is both cumbersome and counterintuitive. Griessmair and Purgathofer [13] extended this technique to employ a trivariate B-spline basis. Though these methods are simple, efficient and popular in use, they suffer from the drawback of a restrictive original volume shape. Parallelepiped volumes rarely bear any visual correlation to the objects they deform and typically have a globally uniform lattice point structure that is larger than is required for the deformations to which they are applied. EFFD [14] is an improvement as it allows user-specified base-shapes, but manual lattice creation and deformation are still cumbersome [15]. MacCracken and Joy [16] use a volume equivalent of the Catmull-Clark subdivision scheme for surfaces to iteratively define a volume of space based on a control point structure of arbitrary topology. This is a significant step in increasing the admissible set of control lattice shapes. The technique is powerful and its only real shortcoming is the potential continuity problems of the mapping function (a combination of subdivision and interpolation) of points within the volume. The approach also suffers from the same discontinuity problems as Catmull-Clark surfaces at extraordinary vertices [17]. Sketching: Schmidt et al [18] explain the importance of the scaffolding technique in their review of sketching and inking techniques used by artists. In this method, artists construct characters from basic blocks representing different body parts. Our paper addresses this need for rapid abstraction of these basic blocks from rough strokes. Thorne et al [19] proposed the concept of sketching for character animation, but do not include shape modeling. Orzan et al [20] propose "Diffusion Curve" primitives for the creation of soft color-gradients from input strokes, along with an image analysis method to automatically extract Diffusion Curves from photographs. Schmidt et al [21] propose “ShapeShop”, a 3D sketch authoring system generating implicit surfaces, with non-linear editing via a construction history tree. Although these curvebased methods are intuitive, they require a fair amount of detailing. 
Thus they are inappropriate for rapid drafting. Our primitive blocks are a lossy abstraction of detailed convex shapes, and thus are easier to represent, construct and perceive. GUI: Exposing mathematical parameters for indirect manipulation via a GUI interface has two major disadvantages. Firstly, there is no intuitive connection between these parameters and the user-desired manipulation. Secondly, deformations defined
using the handles of a specific representation cannot be trivially applied to other shape representations or even different instances of the same shape representation [22]. Integrated bone and cage deformation systems avoid potential artifacts that may arise in case of independent localized cages [19]. Our work focuses on creating 3D models of humanoid characters from a rough 2D sketch input from the user. It is trying to solve the character-drafting problem in the same spirit as Sykora et al [23, 24, 25], Gingold et al [26], and Fiore et al [27]. However, none of these works factor in the role of psychology in primitive shape scaffolding of characters. We derive inspiration from the use of primitive shapes outlined in art books [1,2,3] as well as shape perception literature [28]. Since primitive shapes like circle, triangle and rectangles play a central role in human perception, our underlying shape abstraction is closer to artists’ creative intentions. We have recently proved this computationally through data mining techniques on perception feedback collected implicitly through online character puzzle games [9,29,30].
3 Supporting Algorithms
We briefly describe our prior work on shape representation [9] and parameterization [31] for completeness, as we will develop on it to implement scaffold drawing driven FFD deformation of character meshes.
3.1 Vector Shape Representation
As shown in Fig. 1, we store each of the three normalized primitive shapes as a set of eight quadratic Bezier curves. The solid points represent segment boundaries and the ragged blotches represent mid-segment control points. Note how a null segment (1-2) had to be created for the apex of the triangle. The reason why our piecewise curve segments work so well is that we were able to carefully identify the corresponding segments for the diverse topologies of circle, triangle and square. As a result, even under simple linear interpolation, we do not notice any tears or inconsistent shapes. The normalized shapes can be affine transformed to any location, scale and rotation. Finally, the shape weights are applied to blend the corresponding Bezier control points, to yield an in-between shape. Note that start-end-mid control points of only corresponding segments are interpolated, as shown in Eqns. 1 and 2.

$$p'_j = \sum_{i=1}^{3} w_i \, p_{i,j} \qquad (1)$$

$$m'_j = \sum_{i=1}^{3} w_i \, m_{i,j} \qquad (2)$$

where $j \in \{1,2,3,4,5,6,7,8\}$ and $\sum_{i=1}^{3} w_i = 1$.
In the above equations, p′j and m′j represent the j-th blended segment boundary and midpoints respectively, while pi,j, and mi,j represent the corresponding control points
in the i-th primitive shape (circle, triangle, square), and $w_i$ is the weight contribution from the i-th primitive shape.
Fig. 1. Consistent interpolation of circle, triangle, and square [9]
Results of some blend operations are shown in Fig. 2. The cross hairs under the shapes indicate the shape weights. With this background information about our primitive representation, we are now ready to describe vector fitting of stroked body-part line drawings. We assume that the input shapes are roughly symmetric about their medial axis, and generally convex.
Fig. 2. Blended shapes after consistent interpolation (shape weights indicated by cursor positions) [9]
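A minimal sketch of the blend in Eqns. (1)-(2): corresponding Bezier control points of the circle, triangle and square templates are combined with the shape weights. The array layout and the function name are illustrative assumptions, not the exact representation used in [9].

```python
import numpy as np

def blend_primitives(p, m, weights):
    """Blend corresponding control points as in Eqns. (1)-(2).

    p, m    : arrays of shape (3, 8, 2) holding, for circle/triangle/square,
              the 8 segment boundary points p[i, j] and mid-segment control
              points m[i, j].
    weights : (w_circle, w_triangle, w_square), non-negative, summing to 1.
    Returns the blended boundary points p' and mid points m', shape (8, 2).
    """
    w = np.asarray(weights, dtype=float).reshape(3, 1, 1)
    assert np.isclose(w.sum(), 1.0), "shape weights must sum to 1"
    p_blend = (w * p).sum(axis=0)   # Eqn. (1)
    m_blend = (w * m).sum(axis=0)   # Eqn. (2)
    return p_blend, m_blend
```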
3.2 Vector Fitting A closed input stroke can be treated as a set of connected points, where the first and last points are fairly close to each other. We first resample the stroke at fixed angular intervals about the centroid of the input points. This helps avoid any bias due to variances in stylus pressure and stroke timing. A standard projection variance maximization algorithm, commonly employed to compute Oriented Bounding Boxes, is used to find the medial axis. In this algorithm, a ray is cast through the centroid, then all the boundary points are projected onto the ray, and the variance of the projected point distances from the centroid is noted. The ray that produces maximum variance is estimated to be the medial axis. Once the medial axis is noted, the axial-length and
lateral-breadth of the shape can be easily calculated. We then perform a normalization affine transform to align the input shape to the Y-axis and scale it into a unit square. This simplifies shape error checking while ensuring rotation/translation/scale invariance during the fitting process. Lastly, we compute the best primitive shape combination, by minimizing boundary distance errors between our template shape combinations and the input points. In practice, this is a simple 2-level for-loop, incrementing shape weights by a fixed small value, and measuring the accumulated shape error. The shape error is calculated by accumulating slice-width errors over 40 lateral segments (along the medial axis). We have achieved decent fitting results for most cases. However, there are some cases where shapes computed with boundary distance errors do not match with human perception. We are currently working to improve the qualitative results through a perception regression model. 3.3 Space Parameterization As shown in Fig. 3, we use a tuple {s,t} for parameterizing the cage and correctly positioning corresponding lattice points in the source (mesh) and target (sketch) FFD lattices for each body part. Parameter t is a floating point number whose integral part holds the Bezier segment number of the curve and parameter s is the measurement of distance along the line joining the center of a cage and the point on the Beziersegment-curve. Each pixel in Cartesian coordinates {x,y} can be easily converted into polar shape coordinates {s,t} and vice versa.
Fig. 3. Polar Coordinate Parameterization of a Cage [31]
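A hedged sketch of mapping the polar shape coordinates {s, t} of Sec. 3.3 back to Cartesian coordinates. Only the role of t (integer part selects the Bezier segment, fractional part is the curve parameter) and of the cage centre follow the text; the assumption that s is normalized so that s = 1 lies on the cage boundary, as well as the argument layout, are ours.

```python
import numpy as np

def cage_point_from_st(s, t, center, seg_pts, seg_mids):
    """Map {s, t} cage coordinates to Cartesian {x, y}.

    center   : (2,) cage centre.
    seg_pts  : (9, 2) segment boundary points (first point repeated at the end).
    seg_mids : (8, 2) mid-segment control points of the 8 quadratic Beziers.
    """
    seg = int(np.floor(t)) % 8          # integer part: Bezier segment index
    u = t - np.floor(t)                 # fractional part: Bezier parameter
    p0, p1 = seg_pts[seg], seg_pts[seg + 1]
    c = seg_mids[seg]
    # Quadratic Bezier evaluation of the cage boundary at parameter u.
    boundary = (1 - u) ** 2 * p0 + 2 * (1 - u) * u * c + u ** 2 * p1
    # s scales the point along the line joining the centre and the boundary.
    return center + s * (boundary - center)
```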
4 Vector Segmentation of 3D Mesh Similar to the 2D drawings, the 3D character model needs to be analyzed for body part shape vector extraction. The details of the process are as follows: a) For each vertex on the mesh, its body part membership information is computed, which specifies the body part this vertex belongs to. To accomplish this, a
standard humanoid skeleton is created to fit the humanoid mesh using Baran and Popovic's automatic skeleton fitting algorithm [32]. Their skinning algorithm returns a set of influence weights and active bone indices for every vertex. We use this information to partition the vertices into body part sets, using influence weight thresholds and identity of the most influential bone. Since the segmentation algorithm uses many iterative calculations to search for the optimal skeleton (matching the input skeleton structure), this step is performed offline on all the 3D meshes in the library, to allow for efficient deformation during the sketching process. b) All the vertices that belong to the same body part are then grouped and projected onto the XY plane as both the source 2D character drawings and the 3D character model are posed in the front profile. The convex hull for each of these groups is then computed, giving the exact contour for that body part in the front profile. To extract the shape vectors of the convex hulls, 2D points are sampled at regular intervals along hull outline, and then fed into our vector fitting routine (see Sec. 3.2) to generate the corresponding shape vectors. Note that accurate shape fitting of the 3D mesh body parts is only useful for arbitrary topology FFD [14, 16] implementations. For basic parallelepiped FFD implementations [12], doing an Oriented Bounding Box computation of the above projected convex-hull points is sufficient.
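Steps (a) and (b) can be sketched as follows, assuming the skinning weights are already available from the automatic rigging step [32]; the weight-threshold value and the use of SciPy's ConvexHull are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.spatial import ConvexHull

def body_part_outlines(vertices, skin_weights, weight_threshold=0.3):
    """Group mesh vertices by their most influential bone and return the
    front-profile (XY) convex hull outline of every group.

    vertices     : (n, 3) mesh vertex positions.
    skin_weights : (n, n_bones) influence weights from automatic rigging;
                   rows sum to 1.
    """
    dominant = skin_weights.argmax(axis=1)
    strong = skin_weights[np.arange(len(vertices)), dominant] >= weight_threshold
    outlines = {}
    for bone in np.unique(dominant):
        idx = np.where((dominant == bone) & strong)[0]
        if len(idx) < 3:
            continue
        xy = vertices[idx][:, :2]           # project onto the XY plane (front profile)
        hull = ConvexHull(xy)
        outlines[bone] = xy[hull.vertices]  # ordered outline, ready for vector fitting
    return outlines
```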
5 Character Mesh Deformation Character deformation is achieved by first segmenting the source (sketched body parts) and target (3D mesh) figures, and then, generating shape vectors from them. Since the sketch consists of a set of body part outlines, the segmentation is simply the process of auto-identifying the body parts using a set of heuristics similar to [19]; e.g. head appears top-most, under which appears neck and/or torso, etc. The overall sketch driven mesh deformation idea is illustrated in Fig. 4. Using the template of a skinny girl image, a rough sketch of body parts (omitted in Figure 4a for clarity) is fed into the system, which then deforms a pre-segmented mesh from the 3D library. The process starts with 2D character drawings being processed (Fig. 4a), to extract shape vectors for each individual body part (Fig. 4b), using the vector fitting technique in Sec. 3.2. Its corresponding 2D FFD lattice deformer setup (Fig. 4c), serves as the deformation target. Similarly, automatic body part shape analysis is performed on the source character model (Fig. 4d), to produce a set of vectors (Fig. 4e) corresponding to those of the 2D drawings. A set of FFD lattice deformers (Fig. 4f) can then be constructed from these vectors, which completes the process by deforming the source model (from Fig. 4d to Fig. 4g). Lattice deformation proceeds in a standard manner, as described by Sederberg and Parry [12], once the source and target lattices are set up from the sketch and 3D undeformed model, respectively. In practice, arbitrary topology FFD [14, 16] yields better results than the original parallelepiped deformer base configuration in [12]. The rest of this section explains the automatic full body FFD lattice system construction from step (e) to (f) in Fig. 4. Given a shape vector, v, a lattice, l, needs to be created such that its shape and affine transformation match that of the body part front profile. This ensures the subsequent lattice deformation is accurate. The following steps illustrate the details of the algorithm in the case of a 5×5×2 3D lattice deformer.
Fig. 4. Deformation pipeline. a) 2D input drawing/sketch. b) Shape vectors of 2D drawing. c) Full body lattice construction for 2D drawing. d) Source 3D character model. e) Shape vectors of the 3D model body parts. f) Full body lattice construction for 3D model. g) Final deformed 3D model.
However, it should be noted that the same algorithm also applies to any lattice subdivision configuration, though this particular configuration proves to be capable of producing satisfactory deformation results without adding much complexity to the real-time deformation calculations. The steps are as follows:
a) A unit sized 5×5×2 3D lattice deformer, L, is created at the world origin. Before any global affine transformation is applied to match L to its corresponding body part in terms of rotation, scaling/size and position, L is deformed into a linear combination of the three primitive shapes according to the weights indicated in the body part vector, V. As shown in Fig. 5, for every boundary lattice control point Pi along the outline of L (in clockwise direction), three position values are calculated: i) square with shape weights (1, 0, 0); ii) triangle with shape weights (0, 1, 0); iii) circle with shape weights (0, 0, 1). Denoted by Si, Ti, and Ci, these values represent the corresponding positions of lattice control point Pi on the respective primitive shape. The final interpolated position Pi is then given by a linear combination of the position values: $P_i = S_i v_s + T_i v_t + C_i v_c$, where $(v_s, v_t, v_c)$ denote the shape weights of vector shape V. Based on the number of sub-divisions, we can easily assign regular {s, t} intervals (see Sec. 3.3) to the lattice points, and accurately extract Cartesian coordinates for both source and target FFD cages.
b) The positions of the internal lattice control points, Pmn, are computed by interpolating the positions of the boundary control points, Pi. We traverse through these points in top-to-bottom and left-to-right order. Each internal lattice point is computed as a distance-weighted sum of the two boundary lattice points, Pa and Pb, on the same lattice row m, as shown in Fig. 5.
Fig. 5. Computing lattice control points from shape vectors
c) With the shape defined, L is now scaled along X and Y-axis, rotated and finally translated according to V to complete the construction of a lattice deformer, L. In order to prevent unwanted distortion to the geometry of the source model, L is set to influence the mesh geometry only after the entire construction process is completed. d) Finally, the depth of L is set to be a fixed value, which should exceed the girth of the model along the Z direction. Since our system concentrates on the front profile of the prototyping process, the exact value of this parameter is not that significant.
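A small sketch of steps (a) and (b): boundary control points are blended from their square/triangle/circle positions with the shape weights, and internal points are interpolated along each lattice row. The simple linear interpolation for internal points is our simplification of the distance-weighted sum described above; names and array shapes are illustrative.

```python
import numpy as np

def place_boundary_points(S, T, C, shape_weights):
    """Position the boundary control points of a lattice from a shape vector.

    S, T, C : (k, 2) arrays with the positions each boundary control point
              would take on the pure square, triangle and circle lattices.
    shape_weights : (v_s, v_t, v_c) from the body-part shape vector V.
    Implements P_i = S_i * v_s + T_i * v_t + C_i * v_c for all boundary points.
    """
    v_s, v_t, v_c = shape_weights
    return S * v_s + T * v_t + C * v_c

def fill_internal_row(P_a, P_b, n_internal):
    """Place the internal control points of one lattice row between its two
    boundary points P_a and P_b (plain linear interpolation; the paper's
    exact distance weighting may differ)."""
    ts = np.linspace(0.0, 1.0, n_internal + 2)[1:-1]
    return np.array([(1 - t) * P_a + t * P_b for t in ts])
```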
6 Results and Analysis Fig. 6 illustrates completely automatic results of body-part shape analysis on a 2D input sketch (body parts traced over an existing “skinny-girl” stock image), as well as two different (muscular and fat) 3D humanoid meshes. As can be seen, the deformed models inherit the dominant shape traits from the corresponding body parts of the input character drawing while still preserving the smoothness at the joined area between body parts. The deformation input was sketched within 20 seconds, and the FFD mesh deformation result was achieved within 1-2 seconds. The un-optimized 3D mesh segmentation code, Pinocchio [32], takes a few minutes on lightweight meshes (1-100K triangles), so we do this as an offline step. There are a few limitations to our current work. Firstly, we notice that the degree of compliance with the source sketch shapes varies with different models. This is expected, as we implement the shape transfer as a relative shape deformation operation, rather than a hard boundary constrained optimization problem. Secondly, foreshortening of the input drawing is inevitable in the general case as our body part shape analysis is currently limited to the front profile only. Lastly, some vertex collapsing artifacts are produced for vertices in overlapping FFD influence regions.
Fig. 6. Mesh deformation results. a) Input 2D sketch with body part vector analysis. b) Source 3D character model with body part vector shape analysis. c) Final deformed 3D model.
As shown in Fig. 7, limited control over overlapping lattice deformers at joint areas like shoulders tend to create geometry artifacts such as shrinking. Such problems can be addressed by setting up better procedural fall-off of influence, as well as controlled smoothing of influence between neighboring FFD lattices.
Fig. 7. Deformation artifacts at joints with overlapping lattices
7 Conclusion and Future Work
We have successfully demonstrated a system that allows artists to intuitively reshape an existing detailed 3D character model using 2D character sketch inputs, in just a few seconds. This enables them to quickly visualize characters in the 3D space, without spending much effort on modeling/texturing/deformation. We believe this can help significantly in the brainstorming of new characters, as well as in the procedural re-purposing of existing 3D meshes. We have illustrated decent quality results for deforming two character models with significantly different builds. Our approach focuses on intuitiveness and automation, which makes it suitable as a quick 3D character visualization tool. Improvements currently under development include: use of our novel parametric deformation [31] to replace FFD deformation, for better volume preservation; multi-view and pose-aware adjustments to foreshortened body parts; and support for multi-stroked silhouette inputs (instead of body-part scaffold drawings) to cater to more experienced artists.
Acknowledgement This research is funded by MDA GAMBIT fund (WBS: R-252-000-357-490), sponsored by Media Development Authority of Singapore.
References 1. Beiman, N.: Prepare to Board! Creating Story and Characters for Animated feature. Focal Press (2007) ISBN: 978-0240808208 2. Camara, S.: All About Techniques in Drawing for Animation Production, 1st edn. Barron’s Education Series, Inc. (2006) ISBN: 978-0764159190 3. Hart, C.: Cartoon Cool: How to Draw New Retro-Style Characters 4. Bebis, G., Georgiopoulos, M., Da Vitoria Lobo, N.: Using self-organizing maps to learn geometric hash functions for model-based object recognition. IEEE Transactions on Neural Networks 9(3), 560–570 (1998) 5. Ballard, D.H.: Generalizing the hough transform to detect arbitary shapes. Pattern Recognition 13, 111–122 (1981) 6. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape context. IEEE Transactions on Visualization and Computer Graphics 24(4) (2002) 7. Loncaric, S.: A survey of shape analysis techniques. Pattern Recognition 31, 983–1001 (1998) 8. Pavlidis, T.: A review of algorithms for shape analysis. Comput. Graphics Image Process. 7(2), 243–258 (1978) 9. Islam, M. T., Nahiduzzaman, K.M., Peng, W.Y., Ashraf, G.: Learning from Humanoid Cartoon Designs. In: Perner, P. (ed.) ICDM 2010. LNCS, vol. 6171, pp. 606–616. Springer, Heidelberg (2010) 10. Botsch, M., Pauly, M., Wicke, M., Gross, M.H.: Adaptive space deformations based on rigid cells. Computer Graphics Forum 26(3), 339–347 (2007)
11. Igarashi, T., Moscovich, T., Hughes, J.F.: As-rigid-aspossible shape manipulation. ACM Trans. Graphics 24(3), 1134–1141 (2005) 12. Sederberg, T.W., Parry, S.R.: Free-form deformation of solid geometric models. Comput. Graph. 20(4), 151–160 13. Griessmair, J., Purgathofer, W.: Deformation of solids with trivariate B-splines. Eurographics 89, 137–148 14. Coquillart, S.: Extended free-form deformation: A sculpturing tool for 3D geometric modeling. Comput. Graph. 24(4), 187–196 15. Gain, J., Bechmann, D.: A survey of spatial deformation from a user-centered perspective. ACM Transactions on Graphics (TOG) 27(4), 1–21 (2008) 16. MacCracken, R., Joy, K.: Free-form deformations with lattices of arbitrary topology. In: SIGGRAPH 1996 Conference Proceedings, pp. 181–188 (1996) 17. Singh, K., Kokkevis, E.: Skinning Characters using Surface Oriented Free-Form Deformations. In: Graphics Interface 2000, pp. 35–42 (2000) 18. Schmidt, R., Isenberg, T., Jepp, P., Singh, K., Wyvill, B.: Sketching, Scaffolding, and Inking: A Visual History for Interactive. 3D Modeling (2007) 19. Thorne, M., Burke, D., van de Panne, M.: Motion Doodles: An Interface for Sketching Character. ACM Siggraph, 424–431 (2004) 20. Orzan, A., Bousseau, A., Winnemöller, H., Barla, P., Thollot, J., Salesin, D.: Diffusion Curves: A Vector Representation for Smooth-Shaded Images. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2008) 27 (2008) 21. Schmidt, R., Wyvill, B., Sousa, M.C., Jorge, J.A.: ShapeShop: Sketch-Based Solid Modeling with BlobTrees. In: 2nd Eurographics Workshop on Sketch-Based Interfaces and Modeling, pp. 53–62 (2005) 22. Angelidis, A., Singh, K.: Space deformations and their application to shape modeling. In: ACM SIGGRAPH 2006 Courses, Boston, Massachusetts, July 30-August 03 (2006) 23. Sýkora, D., Buriánek, J., Zára, J.: Sketching cartoons by example. In: Proceedings of Eurographics Workshop on Sketch-Based Interfaces and Modeling, pp. 27–34 (2005) 24. Sýkora, D., Dingliana, J., Collins, S.: As rigid-as-possible image registration for handdrawn cartoon animations. In: Proceedings of International Symposium on Nonphotorealistic Animation and Rendering, pp. 25–33 (2009) 25. Sýkora, D., Sedláček, D., Jinchao, S., Dingliana, J., Collins, S.: Adding depth to cartoons using sparse depth inequalities. Computer Graphics Forum 29, 2 (2010) 26. Gingold, Y., Igarashi, T., Zorin, D.: Structured annotations for 2D-to-3D modeling. ACM Transactions on Graphics (TOG) 28(5), 148 (2009) 27. Fiore, F.D., Reeth, F.V., Patterson, J., Willis, P.: Highly stylised animation. The Visual Computer 24(2), 105–123 (2008) 28. Garrett, L.: Visual design: A Problem-Solving Approach, ISBN: 978-0882753324 29. Ashraf, G., Why, Y.P., Islam, M.T.: Mining human shapes perception with role playing games. In: 3rd Annual International Conference on Computer Games, Multimedia and Allied Technology, Singapore, pp. 58–64 (2010) 30. Islam, M.T., Why, Y.P., Ashraf, G.: Learning Shape-Proportion Relationships from Labeled Humanoid Cartoons. In: 6th International Conference on Digital Content, Multimedia Technology and its Applications, Seoul, pp. 416–420 (2010) 31. Ashraf, G., Nahiduzzaman, K.M., Hai, L.N.K., Li, M.: Drafting 2D Characters with Primitive Scaffolds. In: Second International Conference on Creative Content Technologies, Computation World, Lisbon (November 2010) (in Press) 32. Baran, I., Popović, J.: Automatic rigging and animation of 3D characters. In: ACM SIGGRAPH, San Diego, California, August 05-09 (2007)
Mean Laplace–Beltrami Operator for Quadrilateral Meshes
Yunhui Xiong1,2, Guiqing Li1, and Guoqiang Han1
1 School of Computer Science & Engineering, South China Univ. of Tech., Guangzhou, China
2 College of Science, South China Univ. of Tech., Guangzhou, China
{yhxiong,ligq,csgqhan}@scut.edu.cn
Abstract. This paper proposes a discrete approximation of Laplace-Beltrami operator for quadrilateral meshes which we name as mean Laplace-Beltrami operator (MLBO). Given vertex p and its quadrilateral 1-neighborhood N(p), the MLBO of p is defined as the average of the LBOs defined on all triangulations of N(p) and ultimately expressed as a linear combination of 1-neighborhood vertices. The operator is quite simple and numerically convergent. Its weights are symmetric, and easily modified to positive. Several examples are presented to show its applications. Keywords: Laplace–Beltrami operator, Quadrilateral meshes, Mean curvature, Mesh smoothing.
1 Introduction
The Laplace–Beltrami operator (LBO) plays a key role in computer graphics and image processing [1]. A great number of approaches have been contributed in the literature to estimate the LBO at vertices of triangular meshes [1][2]. On the other hand, there are many applications favoring quadrilateral meshes, such as parameterization used in tensor-product B-spline reconstruction, texture atlasing, and simulations based on finite element analysis. This makes it useful to directly evaluate the LBO at vertices of quadrilateral meshes. Let $M \subset \mathbb{R}^3$ be a 2-manifold surface and f a smooth function over M. Given a point $p = (x, y, z) \in M$, its mean curvature is defined as [2]

$$H(p) = -\lim_{\operatorname{diam}(A) \to 0} k\,\frac{\nabla A(p)}{A(p)}, \qquad (1)$$
where A(p) is the area of a small region around p on the surface, diam(A(p)) the diameter of the region, ∇ A(p) is the gradient of A(p) with respect to a specific parameterization around p, and k a constant which is employed to cancel the scaling caused by different parameterization and is generally set to 1/2 [3], 2[4] or others [2]. It is well known that the LBO and the mean curvature of p have the following relation [4]
$$\Delta_M(p) = 2H(p). \qquad (2)$$
More generally, the LBO for f, denoted by $\Delta_M f(p)$, has the following form

$$\Delta_M(f(p)) = \frac{\partial^2 f(\xi,\zeta)}{\partial \xi^2} + \frac{\partial^2 f(\xi,\zeta)}{\partial \zeta^2} = 2H(f(p)),$$

where $(\xi, \zeta)$ are the local parametric coordinates near p. The discretization approaches of the LBO usually start from (1) and (2), and use different tricks to estimate the area around p. For example, Desbrun et al. [3] employed the area of the 1-neighborhood of p to evaluate (1), while Meyer et al. [5] exploited the Voronoi region of p to do this. Xu and coauthors applied the approach to quadrilateral meshes and used a bilinear interpolating surface to approximate the quads incident to p [4][6]. Their approach involves the integration of a complicated differential geometric quantity and cannot provide a closed form for the combination coefficients. Instead, a 1-point or 4-point integral formula is employed to give a numerical solution. We present a generic approach to extend a discretization of the LBO for triangular meshes to quadrilateral meshes in this paper. For a given quadrilateral 1-neighborhood of vertex p, it evaluates the discrete LBO of p on two extreme triangulations of the neighborhood of p. As the derived LBO is actually the average of discrete LBOs of all possible triangulations of the quadrilateral 1-neighborhood of p, we name it the mean Laplace-Beltrami operator (MLBO). Applying the approach in terms of the discrete operator by Desbrun et al. [3], we explore a symmetric and positive MLBO for quadrilateral meshes. The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 describes our MLBO. Section 4 discusses experimental results. Section 5 draws conclusions.
2 Related Works We have mentioned in Section 1 that (1) and (2) are widely used to estimate the discrete LBO on polygonal meshes. In this section, we summarize some important results for triangular and quadrilateral meshes, respectively. 2.1 LBOs over Triangular Meshes Let M be a triangular mesh, f a smooth function on M, and p a vertex on M. The LBO of f at p can usually be approximated by the weighted average of the difference between function values at p and its 1-ring neighbors,
$$\Delta_M f(p) \approx \sum_{j \in N(p)} w_j \bigl(f(p_j) - f(p)\bigr),$$
where N(p) stands for the index set of the adjacent vertices of p, and $w_j$ is the weight of $p_j$. A trivial scheme with $w_j = 1/|N(p)|$ views the contribution of each adjacent vertex as the same, where $|N(p)|$ is the valence of p. Fujiwara [7] introduced $w_j = 1/\|p - p_j\|$ to emphasize the fact that closer neighbors have more influence on the operator. Desbrun et al. derived the following weights by approximating A(p) in (1) with the area of the 1-neighborhood of p (see Fig. 1(a)), denoted as $A_M(p)$:
$$w_j = \frac{1}{2A_M(p)}\,(\cot\alpha_j + \cot\beta_j),$$
where $\alpha_j$ and $\beta_j$ are the two angles opposite edge $(p, p_j)$, as shown in Fig. 1(a). Meyer et al. [5] further improved the result by replacing the area with that of a so-called Voronoi region (denoted by $A_V(p)$, the gray area in Fig. 1(a)) and obtained $w_j = (\cot\alpha_j + \cot\beta_j)/(2A_V(p))$. Both approaches share the same normalized version
$$w_j = \frac{\cot\alpha_j + \cot\beta_j}{\sum_{k \in N(p)} (\cot\alpha_k + \cot\beta_k)}.$$

Fig. 1. (a) Illustration of the 1-neighborhood of p, angles $\alpha_j$, $\beta_j$, the area $A_M(p)$ of the whole region and the area $A_V(p)$ of the Voronoi region (gray); (b) quadrilateral 1-neighborhood of p, and labeling of its adjacent vertices
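For reference, a direct transcription of the Desbrun et al. weights (the $A_M(p)$ variant above) into code might look as follows. The closed-fan assumption and the function interface are ours, not part of [3].

```python
import numpy as np

def cotangent_lbo(p, neighbors, f, f_p):
    """Discrete LBO of a scalar function f at vertex p on a closed triangle fan.

    p         : (3,) position of the centre vertex.
    neighbors : (m, 3) positions of its 1-ring vertices p_0 ... p_{m-1}, ordered
                so that consecutive entries span a triangle with p.
    f         : (m,) function values at the neighbours; f_p is the value at p.
    """
    def cot(a, b):
        # cotangent of the angle between vectors a and b
        return np.dot(a, b) / np.linalg.norm(np.cross(a, b))

    m = len(neighbors)
    area = 0.0
    lap = 0.0
    for j in range(m):
        pj, pj_prev, pj_next = neighbors[j], neighbors[j - 1], neighbors[(j + 1) % m]
        # alpha_j and beta_j are the angles opposite edge (p, p_j).
        alpha = cot(p - pj_prev, pj - pj_prev)
        beta = cot(p - pj_next, pj - pj_next)
        lap += (alpha + beta) * (f[j] - f_p)
        # accumulate the 1-neighborhood area A_M(p)
        area += 0.5 * np.linalg.norm(np.cross(pj - p, pj_next - p))
    return lap / (2.0 * area)
```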
2.2 LBO over Quadrilateral Meshes
Though many discrete LBOs have been explored for triangular meshes, less literature addresses the case of quadrilateral meshes. Analogously, let M be a quadrilateral mesh with n vertices, and f a smooth function on the mesh. For a given vertex p of valence m on M, we enumerate its adjacent faces by $Q_j = (p, p_j, p'_j, p_{j+1})$ for j = 0, 1, 2, ..., m-1, where $p_j$ and $p'_j$ are edge-adjacent vertices and opposite vertices, respectively (see Fig. 1(b)). Noticing that the four vertices of $Q_j$ are generally not coplanar, Zhang et al. employed a bilinear function S(u,v) ($(u,v) \in [0,1]^2$) interpolating the four corners of $Q_j$ as an approximation of the underlying surface. In this case, the area of $Q_j$ can be calculated by
A_j = ∫_0^1 ∫_0^1 √( ‖S_u‖² ‖S_v‖² − ⟨S_u, S_v⟩² ) du dv.   (3)
The area of the 1-neighborhood is obtained by summing the areas of these quads, A(p) = ∑_{j=0}^{m−1} A_j,
whose gradient is evaluated as the integral of the gradient of the square root in (3), while all integrals are computed numerically using a 4-point integration formula. The LBO is then obtained by combining (1) and (2). Liu et al. [4] employed a 1-point integration formula to simplify the computation, and further analyzed the convergence of their LBO.
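The following sketch is not taken from [4] or [6]; the bilinear corner ordering is an assumption. It only illustrates how the area integral (3) of one bilinear quad can be approximated with a small tensor-product Gauss rule, in the spirit of the 4-point formula mentioned above.

```python
# Sketch: numerical evaluation of the quad-area integral (3) for one bilinear patch
# S(u,v) = (1-u)(1-v)p + u(1-v)p_j + uv p'_j + (1-u)v p_{j+1}  (assumed ordering).
import numpy as np

def bilinear_quad_area(p, pj, pjp, pj1, n_gauss=2):
    """Approximate the area of the bilinear patch through the four 3D corners."""
    nodes, weights = np.polynomial.legendre.leggauss(n_gauss)
    u_nodes = 0.5 * (nodes + 1.0)      # map Gauss nodes from [-1, 1] to [0, 1]
    u_weights = 0.5 * weights
    area = 0.0
    for u, wu in zip(u_nodes, u_weights):
        for v, wv in zip(u_nodes, u_weights):
            # Partial derivatives of the bilinear patch at (u, v).
            Su = (1 - v) * (pj - p) + v * (pjp - pj1)
            Sv = (1 - u) * (pj1 - p) + u * (pjp - pj)
            integrand = np.sqrt(np.dot(Su, Su) * np.dot(Sv, Sv) - np.dot(Su, Sv) ** 2)
            area += wu * wv * integrand
    return area
```

With n_gauss=2 this uses four quadrature points per quad, which corresponds to the 4-point rule referred to above.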
2.3 Properties of the LBO Positive weights. Positive weights (w_j ≥ 0) are preferable in applications such as mesh editing [8] and harmonic function generation [9]. We further explain this in Section 4 using some examples. Symmetry. Given an edge (p_i, p_j), let w_ij and w_ji be the weight of p_j in Δ_M f(p_i)
and the weight of p_i in Δ_M f(p_j). The operator is symmetric provided that w_ij = w_ji holds. Recently, a great number of spectral methods have been proposed in geometry processing and analysis [10], such as compression, segmentation, sequencing, smoothing, watermarking, reconstruction, and remeshing of mesh models. LBOs with symmetric weights are fundamental in these applications, since a symmetric Laplace matrix has real eigenvalues and the computation of its spectrum is simple. Convergence. A discrete LBO is convergent if it approximates the exact mean curvature normal as the mesh approaches the underlying surface. This property is desirable for numerical simulations. Xu [2] presented a full investigation of this topic for triangular meshes.
3 MLBO for Quadrilateral Meshes Our goal is to establish a strategy to extend an arbitrary LBO for triangular meshes to quadrilateral meshes, and then derive a discrete LBO for quadrilateral meshes satisfying the properties described in Subsection 2.3. Considering that the LBOs for triangular meshes introduced by Desbrun et al. and Meyer et al. meet our requirements, we will use their LBO (see Section 2.1) to show how our method works.
3.1 Definition of MLBO
Using the notations in Section 2.2, we consider the definition of the MLBO at vertex p. We first distinguish two triangulations for each quad adjacent to vertex p (Fig. 2(a)). The triangulation obtained by connecting p and its opposite vertex is called the primal triangulation of the quad (Fig. 2(b)), while the other triangulation is the dual one (Fig. 2(c)). Denote by {T_{j1} ≡ Δp p_j p'_j, T_{j2} ≡ Δp p'_j p_{j+1}} the primal triangulation of Q_j and by {T_{j3} ≡ Δp p_j p_{j+1}, T_{j4} ≡ Δp_j p'_j p_{j+1}} the dual triangulation of Q_j.
Fig. 2. (a) Quad Q_j, its primal triangulation (b) and dual triangulation (c)
Fig. 3. All the 2^m triangulations of the 1-neighborhood of p; (a) is the primal triangulation and (p) is the dual triangulation
We obtain the primal triangulation (Fig. 3(a)) of the 1-neighborhood of p if all adjacent quads are primally triangulated, and the dual triangulation (Fig. 3(p)) if all adjacent quads are dually triangulated. Obviously, there are 2^m triangulations in total for the 1-ring neighborhood of p, as shown in Fig. 3. Each of them induces an LBO. This leads to the following definition.
Definition 1. MLBO of p is the average of the discrete LBOs defined on all triangulations of the quadrilateral 1-neighborhood of vertex p. 3.2 MLBO from the LBO of Desbrun et al.
Notice that among the 2^m triangulations, half contain the primal triangulation of Q_j and half contain the dual triangulation of Q_j. Hence, the average area of all triangulations with p as common vertex can be computed as

A(p) = (1/2^m) ∑_{1≤j≤m} 2^{m−1} [ A(T_{j1}) + A(T_{j2}) + A(T_{j3}) ] = (1/2) ∑_{1≤j≤m} [ A(T_{j1}) + A(T_{j2}) + A(T_{j3}) ].   (4)
Applying the gradient operator to (4) yields

∇A(p) = ∇ (1/2) ∑_{j=0}^{m−1} [ A(T_{j1}) + A(T_{j2}) + A(T_{j3}) ] = (1/2) ∑_{j=0}^{m−1} [ ∇A(T_{j1}) + ∇A(T_{j2}) + ∇A(T_{j3}) ].   (5)
According to [3], we have

∇A(T_{j1}) = [ cot α_{j3} (p_j − p) + cot α_{j2} (p'_j − p) ] / 2,
∇A(T_{j2}) = [ cot α_{j5} (p'_j − p) + cot α_{j4} (p_{j+1} − p) ] / 2,
∇A(T_{j3}) = [ cot α_{j6} (p_j − p) + cot α_{j1} (p_{j+1} − p) ] / 2,

where α_{j1} = ∠p p_j p_{j+1}, α_{j2} = ∠p p_j p'_j, α_{j3} = ∠p p'_j p_j, α_{j4} = ∠p p'_j p_{j+1}, α_{j5} = ∠p p_{j+1} p'_j, and α_{j6} = ∠p p_{j+1} p_j. Therefore (5) can be rewritten as

∇A(p) = ∑_{j=0}^{m−1} w_j (p_j − p) + ∑_{j=0}^{m−1} w'_j (p'_j − p),   (6)
where

w_j = [ cot α_{j3} + cot α_{j6} + cot α_{(j−1)1} + cot α_{(j−1)4} ] / 4,   w'_j = [ cot α_{j2} + cot α_{j5} ] / 4.
Setting k=3/2 and noticing (1), (2) and (5), we finally obtain a new discretization of the LBO on the quadrilateral mesh M.
Δ_M(p) = 2H(p) = (3 / A(p)) ( ∑_{j=0}^{m−1} w_j (p_j − p) + ∑_{j=0}^{m−1} w'_j (p'_j − p) ).   (7)
As (4) is the average area of all triangulations of the 1-neighborhood of p and (5) the corresponding gradient, we claim that (7) is the MLBO of p.

Remark 1. From (4), one can show that the MLBO in (7) can also be derived as the average of the two triangular LBOs defined by the primal and dual triangulations of the 1-neighborhood of p (see Fig. 3(a) and (p)). For a general function f with M as the parametric domain, we have

Δ_M f(p) ≈ (3 / A(p)) ( ∑_{j=0}^{m−1} w_j ( f(p_j) − f(p) ) + ∑_{j=0}^{m−1} w'_j ( f(p'_j) − f(p) ) ).   (8)
Based on a similar strategy, it is not difficult to extend the approach to arbitrary polygonal meshes, especially tri/quad ones. 3.3 Positive Weights

The weights in (8) may be negative if there are obtuse angles in the triangulations. Similar to [11], we overcome this artifact by simply halving the angles, namely

w_j = (1/4) [ cot(α_{j6}/2) + cot(α_{j3}/2) + cot(α_{(j−1)1}/2) + cot(α_{(j−1)4}/2) ],
w'_j = (1/4) [ cot(α_{j2}/2) + cot(α_{j5}/2) ],   j = 0, 1, …, m−1.
In some cases, a normalized version of (8) is desired
Δ_M f(p)_Norm = ( 1 / ∑_{0≤j≤m−1} (w_j + w'_j) ) ∑_{j=0}^{m−1} [ w_j ( f(p_j) − f(p) ) + w'_j ( f(p'_j) − f(p) ) ].   (9)
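As an illustration, the sketch below evaluates (8) at a single vertex. The neighborhood data structure (an ordered list of (p_j, p'_j) pairs) and the helper names are our own assumptions, not notation from the paper; 3D NumPy points are assumed.

```python
# Sketch: MLBO of f at vertex p via (8), with the averaged area from (4).
import numpy as np

def cot_at(a, b, c):
    """Cotangent of the angle at vertex a in triangle (a, b, c)."""
    u, v = b - a, c - a
    return np.dot(u, v) / max(np.linalg.norm(np.cross(u, v)), 1e-12)

def tri_area(a, b, c):
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a))

def mlbo(p, ring, f_p, f_ring):
    """ring: list of (p_j, p'_j) ordered around p; p_{j+1} is ring[(j+1) % m][0].
    f_ring: list of (f(p_j), f(p'_j)) in the same order; f_p is f(p)."""
    m = len(ring)
    total, area = 0.0, 0.0
    for j in range(m):
        pj, pjp = ring[j]
        pj1 = ring[(j + 1) % m][0]
        pjm, pjmp = ring[(j - 1) % m]
        # Cotangents of the angles entering w_j and w'_j.
        a_j2, a_j3 = cot_at(pj, p, pjp), cot_at(pjp, p, pj)
        a_j5, a_j6 = cot_at(pj1, p, pjp), cot_at(pj1, p, pj)
        a_jm1 = cot_at(pjm, p, pj)    # alpha_{(j-1)1}, angle at p_{j-1}
        a_jm4 = cot_at(pjmp, p, pj)   # alpha_{(j-1)4}, angle at p'_{j-1}
        w_j = (a_j3 + a_j6 + a_jm1 + a_jm4) / 4.0
        wp_j = (a_j2 + a_j5) / 4.0
        total += w_j * (f_ring[j][0] - f_p) + wp_j * (f_ring[j][1] - f_p)
        # Averaged 1-neighborhood area (4): half of the three triangles incident to p.
        area += 0.5 * (tri_area(p, pj, pjp) + tri_area(p, pjp, pj1) + tri_area(p, pj, pj1))
    return 3.0 * total / max(area, 1e-12)
```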
3.4 Laplace Matrices
Suppose that mesh M has n vertices p_1, p_2, …, p_n. Denote X = (p_1, p_2, …, p_n)^T, f(X) = ( f(p_1), f(p_2), …, f(p_n) )^T, and Δ_M f(X) = ( Δ_M f(p_1), Δ_M f(p_2), …, Δ_M f(p_n) )^T. We then have

Δ_M f(X) = −L · f(X),   (10)
where n × n matrix L is the so-called LBO matrix whose entries are calculated as follows
L(i, j) = 0 if i ≠ j and j ∉ N(p_i);  L(i, j) = −w_ij if i ≠ j and j ∈ N(p_i);  L(i, i) = ∑_{j∈N(p_i)} w_ij,

in which w_ij is the weight of the term f(p_i) − f(p_j) in Δ_M f(p_i), the Laplace–Beltrami operator of f at p_i, defined by (8) or (9). From the definition of w_j and w'_j in (8) or (9), it is easy to show that w_ij = w_ji, which means that L is symmetric.
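A minimal sketch of assembling L from precomputed symmetric weights follows; the layout of the `weights` dictionary is an assumption for illustration only and is not an interface from the paper.

```python
# Sketch: assemble the symmetric Laplace matrix of (10) as a sparse matrix.
import numpy as np
from scipy import sparse

def laplace_matrix(n, weights):
    """weights: dict mapping each undirected vertex pair (i, j) to w_ij (w_ij = w_ji)."""
    rows, cols, vals = [], [], []
    diag = np.zeros(n)
    for (i, j), w in weights.items():
        rows += [i, j]
        cols += [j, i]
        vals += [-w, -w]          # off-diagonal entries -w_ij = -w_ji
        diag[i] += w              # diagonal accumulates the row sums
        diag[j] += w
    rows += list(range(n))
    cols += list(range(n))
    vals += diag.tolist()
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
```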
4 Experimental Results We firstly give some examples to show the numerical behaviors of the MLBO and then compare it with the previous approach [4]. We also demonstrate applications of the MLBO to mesh smoothing and field generation. In all pseudo color figures of this section, red color stands for the maximal value of the corresponding scalar field, blue color indicates the minimal one, and green color the middle value. The effectiveness of our method is verified by applying the MLBO to estimate the curvature normals of quadrilateral meshes sampled from parametric surfaces. Following Xu [2], we also select the four surfaces below to perform the examination:
f_1: z(x, y) = e^{x+y},  f_2: z(x, y) = e^{xy},  f_3: z(x, y) = sin( π(x² − y²)/2 ),  f_4: z(x, y) = sin(3x) sin(3y) e^{−(x²+y²)}.

Let S_h = { q_ij = (i·h, j·h), i, j = −1/h, …, −2, −1, 0, 1, 2, …, 1/h } be a uniform sampling set of the square domain [−1, 1]², where h is the sampling step length. It induces a quadrilateral mesh for each of the above surfaces. Fig. 4(a)–(d) depicts the accurate mean curvature maps of the four surfaces with h = 0.05.
Fig. 4. From left to right: mean curvature mappings for quadrilateral meshes defined by f1~f4 on which the mean curvature of vertices are accurate
Numerical convergence. We test the convergence of the MLBO using a sequence of sampled meshes with respect to h = 1/20, 1/40, 1/80, 1/160, 1/320.
Fig. 5 demonstrates that the curvature error decreases as h approaches zero (red in the figure). As a comparison, we also draw the error of Liu et al.'s algorithm [4] (blue in the figure). Numerical results show that the two approaches are comparable in approximating the mean curvature of the surfaces in this setting. Fig. 6 illustrates mean curvature maps for more complex models (Fig. 6(a), (d)) using Liu et al.'s method (Fig. 6(b), (e)) and our operator (Fig. 6(c), (f)). Quadrilateral mesh smoothing. Fig. 7 presents two examples of applying the MLBO to quadrilateral mesh smoothing. Here we choose the following surface diffusion flow as the smoother [12]:
∂p/∂t = Δ_M(p) ≈ ( p(t + Δt) − p(t) ) / Δt,   (11)

where Δt stands for the smoothing step length. Eq. (11) implies the following iterative smoothing formula: p(t + Δt) = p(t) + Δ_M(p) Δt.
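A possible implementation of this explicit iteration, assuming a hypothetical `mlbo_all` routine that returns the MLBO vector Δ_M(p) for every vertex, is sketched below.

```python
# Sketch: explicit Euler steps of the diffusion flow p <- p + dt * Delta_M(p).
def smooth(vertices, mlbo_all, dt=1e-3, iterations=10):
    """vertices: (n, 3) NumPy array; mlbo_all(p) is assumed to return an (n, 3) array."""
    p = vertices.copy()
    for _ in range(iterations):
        p = p + dt * mlbo_all(p)   # one smoothing step of (11)
    return p
```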
Fig. 5. Using the MLBO (red) and the LBO of Liu et al. (blue) to approximate the curvature of the quadrilateral meshes with respect to surfaces f1–f4 ((a)–(d)). In all cases, the vertical coordinates stand for the maximal curvature error and the horizontal coordinates correspond to the five values h1 = 1/20, h2 = 1/40, h3 = 1/80, h4 = 1/160 and h5 = 1/320.
Harmonic fields. Many applications in the field of computer graphics involve the computation of harmonic fields on a manifold surface [9]. Let C_min and C_max be the sets of all minimal and maximal points on the surface, respectively. Introduce the n×n square matrix A and the n-dimensional vector B such that
Fig. 6. Mean curvature distribution on two models: (a) Isis model; (b) and (c) are the mean curvature distribution computed by the LBO of Liu et al. and the MLBO respectively; (d) The torus model; (e) and (f) are the mean curvature distribution computed by the LBO of Liu et al. and the MLBO
Fig. 7. Quadrilateral mesh smoothing: (a) and (b) are respectively the noisy model and the smoothing result after 10 iterations; (c) and (d) exhibit another example
A_ij = 1 if i = j and p_i ∈ C_min ∪ C_max;  A_ij = 0 if i ≠ j and p_i ∈ C_min ∪ C_max;  A_ij = L(i, j) otherwise,

and B_i = 1 if p_i ∈ C_max;  B_i = −1 if p_i ∈ C_min;  B_i = 0 otherwise.
The following linear system
Af ( X ) = B
(12)
defines a scalar field f(X). Fig. 8 illustrates some harmonic fields generated from (12), with the matrix L defined by (8), for the ball-joint and torus models (Fig. 8(b)), on which a minimal point (blue) and a maximal point (red) are placed, respectively. The fields not only exhibit a good structure but also keep the number of extreme points unchanged. As there are many negative coefficients, the LBO of Liu et al. [4] yields harmonic fields with bad
structure, as shown in Fig. 8(c). Though one can obtain a smoother field by replacing the weights of the LBO in [4] with their absolute values, the number of extremal points increases (see Fig. 8(d)). This is a shortcoming for applications that require the number of extremal points to stay unchanged.
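The constrained system (12) can be set up and solved, for example, as in the following SciPy-based sketch; the constraint index lists are assumptions used for illustration.

```python
# Sketch: build A and B from the Laplace matrix L and solve A f(X) = B, as in (12).
import numpy as np
from scipy.sparse.linalg import spsolve

def harmonic_field(L, c_min, c_max):
    """L: sparse Laplace matrix from (10); c_min / c_max: lists of constrained vertex ids."""
    n = L.shape[0]
    A = L.tolil(copy=True)
    B = np.zeros(n)
    for i in c_min + c_max:
        A.rows[i], A.data[i] = [i], [1.0]   # identity row for constrained vertices
        B[i] = 1.0 if i in c_max else -1.0
    return spsolve(A.tocsr(), B)            # the scalar field f(X)
```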
Fig. 8. Harmonic field creation: (a) Original quadrilateral meshes with a minimal (shown in blue) point and a maximal (shown in red) point; the harmonic fields established using our method (b), the LBO of Liu et al. (c), and the LBO of Liu et al. with all weights replaced with absolute values (d)
Eigenvectors of LBO matrices. Let λ be an eigenvalue of the Laplace matrix L (see (10)) and V be the corresponding eigenvector, namely
LV = λV .
(13)
Fig. 9. Illustration of the first 10 eigenfunctions of an LBO matrix: (a) based on the MLBO and (b) based on the LBO of Liu et al.
This indicates that each vertex of the mesh can be assigned a scalar value (the i-th entry of V is mapped onto the i-th vertex of the mesh). It then uniquely defines a piecewise function on the mesh by linearly interpolating these scalar values. The function is called an eigenfunction, which has been widely used in quadrangulation [13], mesh deformation [14] and mesh analysis [10] in the setting of triangular meshes. Fig. 9 demonstrates the first 10 eigenfunctions of the LBO matrices defined by (8). We note that, though both the weights of the MLBO and the weights of Liu et al.'s LBO are symmetric (w_ij = w_ji), some eigenfunctions shown in Fig. 9(a) seem smoother than those shown in Fig. 9(b). For example, Fig. 9(a5) and Fig. 9(a10) are smoother than Fig. 9(b5) and Fig. 9(b10).
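For illustration, the first eigenpairs of the sparse symmetric matrix L can be computed as sketched below; this is one possible way to do it, not the authors' implementation.

```python
# Sketch: smallest eigenvalues/eigenfunctions of the symmetric Laplace matrix (13).
from scipy.sparse.linalg import eigsh

def first_eigenfunctions(L, k=10):
    # 'SM' asks for the smallest-magnitude eigenvalues; adequate for a sketch,
    # shift-invert strategies are usually faster for large meshes.
    vals, vecs = eigsh(L, k=k, which='SM')
    return vals, vecs   # vecs[:, i] is the i-th eigenfunction sampled at the vertices
```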
5 Conclusions An MLBO is proposed for estimating the LBO on quadrilateral meshes, and the convergence of the MLBO is numerically verified using some examples. The weights of the MLBO are symmetric, simple to compute, and easily converted from negative to positive ones. Examples show that the MLBO exhibits good behavior in quadrilateral mesh smoothing and harmonic function generation. As future work, it would be significant to analyze the convergence of the MLBO theoretically. It is also interesting to apply the MLBO to applications such as editing, recovering, spectral analysis, and solving geometric partial differential equations on quadrilateral meshes. Acknowledgments. The work described in the paper is supported by the National Natural Science Foundation of China (Grant No. 60973084), the National Science & Technology Pillar Program (Grant No. X2JS-B1080010), the Natural Science Foundation of Guangdong (Grant No. 9151064101000106), the Fundamental Research Funds for the Central Universities (Grant No. 2009zz0016), and the Natural Science Foundation for the Youth of South China Univ. of Technology.
References 1. Sorkine, O.: Laplacian Mesh Processing. In: Proc. of Eurographics STAR, pp. 53–70 (2005) 2. Xu, G.: Discrete Laplace-Beltrami Operators and Their Convergence. Computer Aided Geometric Design 21(8), 767–784 (2004) 3. Desbrun, M., Meyer, M., Schroder, P., Barr, A.H.: Implicit Fairing of Irregular Meshes Using Diffusion and Curvature Flow. In: Proc. of SIGGRAPH, Los Angeles, California, USA, pp. 317–324 (1999) 4. Liu, D., Xu, G., Zhang, Q.: A Discrete Scheme of Laplace-Beltrami Operator and its Convergence over Quadrilateral Meshes. Computers and Mathematics with Applications 55(6), 1081–1093 (2008) 5. Meyer, M., Desbrun, M., Schroder, P., Barr, A.H.: Discrete Differential Geometry Operators for Triangulated 2-manifolds. In: Proc. of International Workshop on Visualization and Mathematics, Berlin, Germany, pp. 35–57 (2002)
6. Zhang, Y., Bajaj, C., Xu, G.: Surface Smoothing and Quality Improvement of Quadrilateral/Hexahedral Meshes with Geometric Flow. In: Proc. of 14th International Meshing Roundtable, San Diego, CA, pp. 449–468 (2005) 7. Fujiwara, K.: Eigenvalues of Laplacians on a Closed Riemannian Manifold and its Nets. In: Proc. of AMS, vol. 123, pp. 2585–2594 (1995)
8. Alexa, M., Nealen, A.: Mesh Editing Based on Discrete Laplace and Poisson Models. In: Braz, J., et al. (eds.) VISAPP and GRAPP 2006. CCIS, vol. 4, pp. 3–28. Springer, Heidelberg (2007) 9. Dong, S., Kircher, S., Garland, M.: Harmonic Functions for Quadrilateral Remeshing of Arbitrary Manifolds. Computer Aided Geometric Design 22(5), 392–423 (2005) 10. Zhang, H., van Kaick, O., Dyer, R.: Spectral Methods for Mesh Processing and Analysis. In: Proc. of Eurographics STAR, pp. 1–22 (2007) 11. Floater, M.S.: Mean Value Coordinates. Computer Aided Geometric Design 20(1), 19–27 (2003) 12. Xu, G., Pan, Q., Bajaj, C.: Discrete Surface Modelling Using Partial Differential Equations. Computer Aided Geometric Design 23(2), 125–145 (2006) 13. Dong, S., Bremer, P., Garland, M., Pascucci, V., Hart, J.: Spectral surface quadrangulation. ACM TOG 25(3), 1057–1066 (2006) 14. Rustamov, R.M.: Laplace-Beltrami Eigenfunctions for Deformation Invariant Shape Representation. In: Proc. of the Fifth Eurographics Symposium on Geometry Processing, vol. 257, pp. 225–233 (2007)
Multi-user 3D Based Framework for E-Commerce Yuyong He and Mingmin Zhang State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 {yuyonghe,zmm}@cad.zju.edu.cn
Abstract. The lack of attraction and visibility of products in today's e-Commerce applications may reduce users' interest in purchasing products online. It is very useful to adopt a 3D virtual world in e-business websites to attract users. Based on current rendering plug-ins, a collaborative protocol of the Multi-user 3D System (M3DS) for virtual environments is designed. It is a set of messages which coordinate avatar interaction in the virtual world. We also introduce the main rendering plug-in technologies used to present 3D information on the client side. Meanwhile, an architecture and application framework for multi-user 3D web applications is designed. Keywords: Collaborative Protocol, Browser plug-ins, Multi-user, e-Commerce.
1
Introduction
During the recent decade, thousands of e-Commerce related applications have been developed and deployed. The contents of most web sites are text-based descriptions with static pictures. Some e-business web sites can provide dynamically generated recommendations when users surf them for online shopping, but this is not very attractive [1]. With new techniques and the rapid development of the internet, it will become more and more popular to develop web applications with multi-user 3D virtual environment related technologies, such as virtual demonstration spaces, 3D products and 3D avatars. For e-Commerce, the most important thing is to attract as many people as possible to purchase products online, with realistic items [2]. Avatars in the virtual world can provide assistance and advice when users are online [3][4]. There are several protocols designed in the Virtual Reality field, such as VRTP (Virtual Reality Transfer Protocol), DWTP (Distributed Worlds Transfer Protocol) and DVRMP (Distributed Virtual Reality Multicast Protocol). VRTP was designed to support VRML only. It provided client, server, multicast streaming and network-monitoring capabilities [5]. DWTP is an application layer protocol.
Project supported by National High-tech R&D 863 Program (No. 2009AA062704), National Science and Technology-World Expo Special Support Program (No. 2009BAK43B07).
Z. Pan et al. (Eds.): Transactions on Edutainment V, LNCS 6530, pp. 202–213, 2011. c Springer-Verlag Berlin Heidelberg 2011
Originally, it was designed to support large-scale distributed virtual worlds, independent of the underlying network protocols. DVRMP ran on top of RTP and UDP. It is a protocol between the transport layer and the session layer. All DVRMP messages are multicast [6]. In this paper, the related rendering plug-in technologies are introduced, a collaborative protocol for the Multi-user 3D System (M3DS) virtual environment is designed, the high-level architecture is presented, and an application of the multi-user 3D system implemented using those technologies is proposed.
2
Rendering Plug-ins
Over the last 15 years, there have been many protocols, systems and proposals related to 3D on the web. Most of them have disappeared, but a number of protocols and proposals still exist today. Most of them follow the traditional browser plug-in-based approach [7]. Below we introduce the main existing technologies for rendering plug-ins. 2.1
Flash and PaperVision
Adobe Flash Player [8] is a multimedia platform which is developed and distributed by Adobe Systems. The first version was published in 1996. Right now, Flash has become a popular method for adding animation and interactivity functions on the web pages. And it is commonly used to create animations and various multimedia components on web pages. It can integrate video/audio into web pages. By using this technology, content rich Internet applications can also be developed. Before Adobe Flash Version 10 [8], there was no real-time 3D support at all in Flash. But authoring tools, such as Anark [9], Cult3D [10] and Director support 3D elements natively and make it easy to incorporate 3D contents into web based 2D-movies. Until Adobe Flash Version 10, flash was only known how to display 2D vector objects and how to calculate math expressions. Simple 3D rendering systems had been built by exploiting these 2D vector objects, like PaperVision3D does. The results are very impressive, even they are very limited to simple 3D objects and effects. Adobe Flash Version 10 includes simple 3D transformations and objects, but they are very limited, and they are only designed for simple 3D composite and GUI effects. However, the PaperVision developers already started to update their systems to fulfill the new functions and features. The new application will be called as PaperVision X. The results on the 2D render pipeline have been produced very impressively [11]. 2.2
Silverlight
Silverlight was developed by Microsoft, and is based on Dot Net framework. It is a programmable web browser plug-in that enables 3D features, such as animation, video/audio playback and vector graphics. Microsoft considered Silverlight as
Flash alternative. There are Windows and Mac version for Silverlight plug-in, but the number of users and installations are much smaller compared to Flash. The reason is that Microsoft follows the development of Flash very similar. There is no native 3D support in Silverlight until the last Version 3.0. Developers faked 3D content using similar techniques like in Flash before Version 3.0. There are new features in Silverlight 3 called Perspective Transforms [12]. They allow transforming 2D objects into a 3D coordinate system, but they are no real 3D shapes. 2.3
Java3D, JOGL and JavaFX
The Java3D [13] is a scene-graph system that incorporates the VRML/X3D design. It was wide used in desktop applications development too. Its power was never really utilized for the web. Today, Java3D is no longer supported by Sun. Sun gave up the high-level Java3D library, and now it provides a lower level interface. This lower level interface provides direct language-bindings for the OpenGL interface. It is called JOGL which is used in the Xj3D runtime. JavaFX is announced in 2008 officially. It is the last effort Sun Corp. made to build an alternative technology to Flash based on Java. It offers similar media and 2D elements. However, it does not include any 3D support officially. 2.4
O3D
O3D [14] is proposed by Google. It provides graphics APIs for creating interactive 3D applications within a browser. There are two layers. The lower level APIs are implemented in C/C++. It acts as browser plug-in, and it provides a geometry and shader abstraction which is mapped to OpenGL or DirectX. The higher level APIs are implemented in JavaScript, and it provides the APIs which are similar to Java3D, OpenSG, or C3DL. The scene-graph model the system provides is close to the standards, such as X3D, but it does not provide a method to define the content in a declarative way. It is application developers tasks to use JavaScript to build and develop the graph-content. However, there are several offline processing tools, which allow transferring declarative 3D data into JavaScript calls to build the tree structure. The system provides basic functions as well. They are similar to other scene-graph APIs, for picking and culling. Those functions are implemented in the JavaScript layer. So they are slower compared to native implementations provided by high-level runtime abstractions. For O3D, performance might be an issue because the O3D model requires programmers to implement all parts of the application including logic and behavior in JavaScript. 2.5
X3D
X3D is an ISO standard [15] .It is IO-devices independent, portable and supports multiple data file-encodings. X3D can describe an abstract functional behavior
which is time-based, interactive 3D, multimedia information. One of interesting points is it supports a multi-parent scene- and event graph. It has the ability to encode the scene using XML syntax, Open Inventor, or by using a binary encoding. It is the successor of VRML97 with a lot of new and extended features and components, such as NURBS, GeoVRML, Humanoid Animation, and so on. The X3D specification describes various internal and external APIs and a full runtime, event and behavior model, so it is not a simple exchange format of VRML97. X3D defines several profiles for kinds of levels of capability which are X3D Core, X3D Interchange, X3D Interactive, X3D CAD Interchange, X3D Immersive, and X3D Full. For X3D, its browser integration model allows running plug-ins inside a browser. The mechanism is that the browser holds the X3D scene internally, and the application developers can update and control the content through the Scene Access Interface (SAI). The standard already defines an integration model for DOM-Nodes. It is part of SAI. XMT-A is a subset of X3D. It is defined in MPEG-4 Part 11 as a link between X3D and 3D content in MPEG-4 (BIFS). The abstract specification for X3D (ISO/IEC 19775) was first approved in 2004, the XML and Classic VRML encodings for X3D (ISO/IEC 19776) were approved in 2005. Fig.1 shows 3D data-encodings and programming language bindings.
Fig. 1. X3D data-encodings and programming language bindings
2.6
Collada
Collada [16] is a 3D-file standard for content exchange developed by the Khronos group. Originally, it was developed to improve the interoperability of DAE-tools
by Sony and Intel. The main goal was to simplify and streamline the game development process and existing game creation pipelines. The specification does not include a runtime-model or event-model, which would allow defining interactive elements or the behavior and content. So it is an intermediate format to be used in the creation pipeline combine with a final deployment format, such as X3D.There are some tools, such as GoogleEarth, which uses Collada data-files directly to define content in the runtime-environment.
3
Architecture
In 3D web applications, the 3D scenes can be displayed smoothly by utilizing the above plug-in techniques, provided the scene data and the interactive and collaborative messages are delivered correctly. To support real-time multi-user 3D web applications, the event-driven architecture of the Multi-user 3D System Framework is designed as shown in Fig. 2.
Fig. 2. Architecture of Multi-user 3D System Framework
The core part of this architecture is the management sub-framework. It is designed to support a multi-user 3D virtual environment based on a browser/server (B/S) structure. The management sub-framework includes resource management, scene management, time management and collaboration management.
3.1
Resource Management
All resources in M3DS are identified by a URL (Uniform Resource Locator). This means the 3D support server can retrieve resources on other 3D servers through their URLs. The resources in M3DS include 3D objects, which can be added to or deleted from the scene on the fly at runtime, the communication information used to trace or exchange the behavior status of 3D objects in real time, etc. 3.2
Scene Management
Through scene management, the system can dispatch scenes dynamically. It can also exchange scene data even when the scenes reside on different servers. It allows users to switch scenes without knowing where a scene is stored, since this is handled by the scene management service. The main functions of the scene management module are scene information exchange and seamless scene switching. 3.3
Time Management
The time management service provides a mechanism to synchronize resources among the 3D servers. It synchronizes all shared resources within the same time frame even when they are on different servers. In addition, the events between various 3D objects in the scene can be kept in sequence in order to obtain correct responses. In M3DS, the orderings used for event processing are the event receiving sequence, the priority sequence and the timestamp sequence. Time-related events must be handled correctly so that users can cooperate smoothly in the virtual environment. 3.4
Collaboration Management
The collaborative information presented by avatars in the multi-user 3D virtual environment needs to be processed well. The collaboration management service provides interfaces responsible for this. The service includes managing collaboration rules, schedules and responses (especially within a shared scene environment). It also maintains the consistency of all client agents' data. Each client in the system has a corresponding agent. The agent manages shared objects in the virtual environment, and it can also drive those objects based on their status. Therefore, the agents have an overall view of the objects in real time. Based on this knowledge of the system, an agent can determine whether there is a collaboration relationship between objects and, if there is, what the result should be. Meanwhile, it can update the objects' status and notify all related clients to refresh the related scene and objects in the shared environment. The main functions of the collaboration management service in M3DS are authorization, collaboration control, and data storage and update control.
3.5
Collaborative Protocol
After analyzing the architecture of Multi-user 3D system based on the internet, the corresponding protocol for collaboration can be defined. The model of this protocol is depicted as in Fig.3.
Fig. 3. Collaborative Protocol Model in Multi-user 3D System
The collaborative protocol consists of a group of messages. These messages transmit information between the browser and the 3D server, and/or among the 3D servers. A message itself is not a scene or event object, but it contains scene or event objects. The type of a transmitted message indicates what kinds of objects it contains.
4
Collaborative Protocol Specification
The messages are defined in XML. There are six types of messages in the collaborative protocol. M3DSEvent defines all kinds of events transmitted among 3D servers, and between 3D servers and browsers with rendering plug-ins. M3DSReq defines the conditions for search queries. M3DSRes is the response message returned when an M3DSReq is sent. M3DSRSS and M3DSUnRSS are messages used to subscribe to events of interest and to cancel a registered M3DSRSS subscription, respectively.
M3DSAck is an acknowledgement used to verify whether a message in the system is valid. The validation is done by a mechanism based on XML DTD or schema validation. 4.1
Header
There is always a header in every message, and it has six elements. We define: Header = {Version, MsgID, TimeStamp, Validate, Source, Destination}, where Version indicates the message version, MsgID is a unique number in the system that can be generated by either the application or the 3D servers, TimeStamp is the date and time at which the message was generated, Validate indicates the valid lifetime of the message, Source is the unique address of the sender, and Destination is the unique address of the receiver. Usually, they are machine IP addresses with or without a port number.
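As an illustration only (the element names follow the prose above; the helper itself is hypothetical and not part of M3DS), such a header could be assembled as an XML fragment like this:

```python
# Sketch: building the six-element M3DS message header with the standard library.
import uuid
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

def build_header(source, destination, validate_seconds=60):
    header = ET.Element("Header")
    ET.SubElement(header, "Version").text = "1.0"
    ET.SubElement(header, "MsgID").text = uuid.uuid4().hex
    ET.SubElement(header, "TimeStamp").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(header, "Validate").text = str(validate_seconds)
    ET.SubElement(header, "Source").text = source          # e.g. "10.0.0.5:9000"
    ET.SubElement(header, "Destination").text = destination
    return header
```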
4.2 M3DSEvent
The system delivers events to 3D servers or web applications. The events include attending a scene site, changing the status of objects, leaving the scene, and so on. An M3DSEvent consists of a Header element and a list of Event elements. An Event can be defined as a combination of an event ID, event Name, event Type, event Content and event Time, where the event ID is a predefined identifier for the event, the user can use the default event name or give a new one, Event Time is when the event occurs, and EventContent is the detail of the event, containing a list of parameters and their values. The diagram of the M3DSEvent message is shown in Fig. 4.
Fig. 4. M3DSEvent MSG diagram
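A hypothetical sketch of composing a complete M3DSEvent message, reusing the header helper sketched in Section 4.1, might look as follows; the element names and values are illustrative assumptions based on the description above, not a normative schema.

```python
# Sketch: composing an M3DSEvent message that reports a status change of a 3D object.
from xml.etree import ElementTree as ET

def build_event_message(header, event_id, event_name, event_type, params):
    msg = ET.Element("M3DSEvent")
    msg.append(header)                         # Header element built as in 4.1
    event = ET.SubElement(msg, "Event")
    ET.SubElement(event, "EventID").text = str(event_id)
    ET.SubElement(event, "EventName").text = event_name
    ET.SubElement(event, "EventType").text = event_type
    ET.SubElement(event, "EventTime").text = "2011-01-01T12:00:00Z"
    content = ET.SubElement(event, "EventContent")
    for name, value in params.items():
        param = ET.SubElement(content, "Parameter", name=name)
        param.text = str(value)
    return ET.tostring(msg, encoding="unicode")
```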
4.3
M3DSReq
M3DSReq message is designed for transmitting search criteria for all kinds of search events. For instance, search a specified scene including the dynamic or static objects in the scene and their properties.
Except Header element in M3DSReq as usual, there is another element which called Queries. Queries contains one or more than one Query requests. It is expressed as Query: {Object, Conditions}. Object is the type of the query result that wants to be returned. It includes object name (OName) and it type (OType). Conditions is a list of search criteria. ConditionType is the logical relationship between conditions which values are and, or, not. Condition element includes Value, Operator and Element. Inside, Element must have the same type which indicated in the OType, otherwise, an error will occur. Operator is a set of operation: {Equal to, Not Equal to, Great Than, Less Than, Like}. The structure of M3DSReq message is shown in Fig.5. 4.4
M3DSRes
M3DSRes is a response of M3DSReq correspondingly. It contains three elements which are Header, ResID and Responses. ResID is an indicator of request message. Responses is a set of query results. In each Response, there are Object and Records elements. And Records contains one or more Record with values. The structure of M3DSRes message is depicted inFig.6.
Fig. 5. M3DSReq MSG diagram
4.5
Fig. 6. M3DSRes MSG diagram
M3DSRSS/M3DSUnRSS
M3DSRSS is used for subscribing for event objects or data objects which users are interested from 3D servers. When the requested events occur or change happened, 3D server will send the updated information to subscribers. Vice versa, M3DSUnRss is for canceling subscriptions. In M3DSRSS, except the Header element, there is an RSS element. It consists of one or more than one Orders. Order element includes UserID, Session, OName, OrderID, Element. 3D server will determine if accept the subscription according to its access control list based on UserID and authorization information which resident in the Session element. The Object indicates what kind of event or
data that user is interested. Element is the condition to specify when to send information to users. The detailed structure of M3DSRSS is shown in Fig.7. M3DSUnRSS message includes Header and UnRSS elements. UnRSS contains one or more than one UnOrder elements. When an 3D server receives a M3DSUnRSS message, it will accept unsubscribing based on its access control list with the information UnOrder provides. Fig.8 shows the structure of M3DSUnRSS.
Fig. 7. M3DSRSS MSG diagram
4.6
Fig. 8. M3DSUnRSS MSG diagram
M3DSAck
M3DSAck message is an acknowledgement. It indicates if the message sends to 3D server or application is valid or not. M3DSAck includes Header, AckID and Ack elements. AckID is the unique ID. Ack is the details of acknowledgement. It contains AckCode and related description. Fig.9 shows the structure of M3DSAck message.
Fig. 9. M3DSAck MSG diagram
5
Implementation
The Multi-user 3D Framework can be implemented in web 3D e-Commerce applications with rendering plug-in techniques. Compared to traditional e-Commerce applications, web 3D e-Commerce applications provide more visualization and interactivity [17]. In the 3D e-Commerce system, one avatar represents one user, and the shops are displayed in a 3D virtual environment. When users are shopping in this virtual environment, through avatars, they can communicate with
Fig. 10. Screen shots of the 3D e-Commerce demonstration
shop owners and other buyers within the same scene in real time. They can also look at merchandise items as 3D models. This enhances the sense of reality, similar to real shopping in a mall. Fig. 10 shows screen shots of a demonstration based on the Multi-user 3D Framework.
6
Conclusion and Future Work
Multi-user 3D virtual environments provide new ways for people to surf the internet. Based on the rendering plug-in technologies and an analysis of the requirements of multi-user virtual environments, a collaborative protocol is designed, and the related M3DS architecture is presented as well. It can be used to build 3D virtual web applications for e-business that attract more customers to shop online, since it provides a realistic shopping experience. In the future, we plan to research integrating the rendering technologies directly into the browser architecture, or faking a 3D renderer through 2D pipelines; through these technologies, the performance issues might be reduced significantly. We also plan to improve and refine the protocol based on business requirements and usage feedback.
References [1] Liu, Z.: An Architecture of Intelligent Virtual Avatars for E-Business. In: The 3rd International Conference on Innovative Computing Information and Control (ICICIC 2008), pp. 214–214. IEEE, Los Alamitos (2008) [2] Chen, T., Pan, Z., Zheng, J.-m.: EasyMall - An Interactive Virtual Shopping System. FSKD (4), 669–673 (2008) [3] Liu, Z., Pan, Z.: An Emotion Model of 3D Virtual Characters in Intelligent Virtual Environment. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 629–636. Springer, Heidelberg (2005) [4] Zhang, X., Zhong, S., Pan, Z., Wong, K., Yun, R. (eds.): Edutainment 2010. LNCS, vol. 6249. Springer, Heidelberg (2010) [5] Pan, Z., Chen, J.X.: VR-based edutainment. Virtual Reality 12(1), 1 (2008) [6] Pan, Z., He, G., Su, S., Li, X., Pan, J.: Virtual Network Marathon: FitnessOriented E-Sports in Distributed Virtual Environment. In: Zha, H., Pan, Z., Thwaites, H., Addison, A.C., Forte, M. (eds.) VSMM 2006. LNCS, vol. 4270, pp. 520–529. Springer, Heidelberg (2006) [7] Yao, J., Pan, Z., Zhang, H.: A Distributed Render Farm System for Animation Production. In: Natkin, S., Dupire, J. (eds.) ICEC 2009. LNCS, vol. 5709, pp. 264–269. Springer, Heidelberg (2009) [8] http://www.adobe.com/products/flashplayer/ [9] http://www.anark.com/ [10] Yun, R., Zhang, B., Pan, Z.: Research on Using Cult3D and Java to Realize Virtual Assembly. In: Chang, M., Kuo, R., Kinshuk, Chen, G.-D., Hirose, M. (eds.) Learning by Playing. LNCS, vol. 5670, pp. 363–370. Springer, Heidelberg (2009) [11] Pan, Z., Zhang, X., El Rhalibi, A., Woo, W., Li, Y.: Edutainment 2008. LNCS, vol. 5093. Springer, Heidelberg (2008) [12] Shao, P., Liao, W., Pan, Z.: Adopting Virtual Characters in Virtual Systems from the Perspective of Communication Studies. T. Edutainment 2, 70–89 (2009) [13] https://java3d.dev.java.net/ [14] http://code.google.com/apis/ [15] http://www.web3d.org/x3d/specifications/ [16] Arnaud, R., Barnes, M.: Collada: Sailing the Gulf of 3d Digital Content Creation, 1st edn. AK Peters, Wellesley (August 30, 2006) ISBN-13: 978-1568812878 [17] Sun, C., Pan, Z., Li, Y.: SRP Based Natural Interaction between Real and Virtual Worlds in Augmented Reality. In: CW 2008, pp. 117–124 (2008)
Coordinate Model for Text Categorization Wei Jiang and Lei Chen School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
[email protected]
Abstract. Nowadays most text categorization algorithms use the vector space model. It cannot make full use of the positions of terms, yet position carries much semantic information. This paper proposes a coordinate model. By using this model, the terms' position information can be utilized. In this model, some central terms are selected as origins and a multidimensional space is built; other words are placed into this space according to their positions relative to these origins. In our experiment, we present a boosting algorithm based on the coordinate model. The results show that much information can be mined from the coordinate model. Keywords: text categorization, information retrieval, boost algorithm.
1 Introduction Text categorization is the task of classifying texts into categories, that is, assigning a Boolean value to each pair ⟨d_j, c_i⟩ ∈ D × C, where D is a domain of documents and C is a set of categories. A value of TRUE assigned to ⟨d_j, c_i⟩ indicates that document d_j belongs to category c_i, while a value of FALSE indicates that d_j does not belong to c_i. For information retrieval to be efficient, the documents must be transformed into a suitable representation. There are three classic models for this purpose [1][2].

(1) Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets. Common models are: Standard Boolean model, Extended Boolean model and Fuzzy retrieval.
(2) Algebraic models represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value. Common models are: Vector space model, Generalized vector space model and Extended Boolean model.
(3) Probabilistic models treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like Bayes' theorem are often used in these models [3][4].
Z. Pan et al. (Eds.): Transactions on Edutainment V, LNCS 6530, pp. 214–223, 2011. © Springer-Verlag Berlin Heidelberg 2011
Now most text categorization methods use the vector space model to index the document and mine information from those vectors. For example, one document can be represented by a vector d_j = ⟨w_{1j}, …, w_{|T|j}⟩, where T is the set of terms that occur at least once in at least one document. A term is one word or one phrase, and a term weight usually ranges between 0 and 1. If the weights are binary, then 1 denotes presence and 0 absence of the term in the document. If the weights are not binary, the value usually represents some frequency information, and the tfidf function is usually used; it is defined as

tfidf(t_k, d_j) = #(t_k, d_j) · log( |Tr| / #_Tr(t_k) ),

where t_k is one term, #(t_k, d_j) denotes the number of times t_k occurs in document d_j, Tr is the training set, a set of documents by observing whose characteristics the classifier is built, and #_Tr(t_k) denotes the number of documents in Tr in which t_k occurs. It can be seen that the traditional vector space model ignores term positions; it only considers the presence or absence of a term in the document and some frequency information. So in this paper, our purpose is to attempt to mine information from term positions by use of a "coordinate model". In this paper, we first discuss the coordinate model in Section 2. Section 3 then gives a boosting algorithm based on the coordinate model. Section 4 reports the experiments and results. Finally, Section 5 summarizes the main findings of our study and draws conclusions.
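For concreteness, the tfidf weight defined above can be computed as in this small sketch; the representation of documents as token lists is an assumption made only for the example.

```python
# Sketch: tfidf(t_k, d_j) = #(t_k, d_j) * log(|Tr| / #_Tr(t_k)).
import math

def tfidf(term, doc_tokens, train_docs):
    tf = doc_tokens.count(term)                         # #(t_k, d_j)
    df = sum(1 for d in train_docs if term in d)        # #_Tr(t_k)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(train_docs) / df)
```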
2 Coordinate Model For using the term position information, we need build coordinate. For building coordinate, we must fix on the origin firstly, then some word’s position can be indicated by the distance between the word and the origin. In Coordinate model, the origin is one term; other terms’ positions can be represented by the distances from the origin to the terms. As shown in fig.1, the word1, word2 and word3 are origins. We suppose that when word x occurs in one scope of the coordinate, it will have special meaning and this can help us to decide which categorization the text should belong to. For example, in these three sentences: {“I love this game!” “I love basketball!” “Basketball is a good game!”}, where we use the term “love” as origin, because “love” occurs in two sentences, the positions of “I” is (-1,-1), the positions of “game” is (2,max)(max is a large number which means that the term is absent in this sentence). While one term is selected as origin, the other terms’ positions form a model. For above example, the model of “love” is {[I, (-1,-1)], [this,(1,max)], [game, (2,max)], [basketball, (max,1)], [is, (max, max)],[a, (max, max)], [good, (max, max)]}. If we only regard some ones of the other words, we’ll get a sub-model. For example, while we only regard “game” and “basketball”, we get a “love”’s sub-model with “game” and “basketball”, which is {[game, (2, max)], [basketball, (max, 1)]}. We think that the
Fig. 1. Instance of coordinate model
semantics of one term depend on which other terms occur beside this term and where they are. In a document, when some sub-models of some terms accord with certain hypotheses, we can file this document under certain categories.
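The construction of such a model for one origin term can be sketched as follows; the data layout is our own assumption, introduced only to illustrate the idea.

```python
# Sketch: coordinate model of an origin term, i.e. each other term's position
# relative to every occurrence of the origin, with MAX marking absence.
MAX = 10**6   # "max": the term is absent in this sentence

def coordinate_model(origin, sentences):
    """sentences: list of token lists; returns {term: [offset per origin occurrence]}."""
    model = {}
    vocabulary = {tok for s in sentences for tok in s if tok != origin}
    for sent in sentences:
        if origin not in sent:
            continue
        o = sent.index(origin)
        for term in vocabulary:
            offset = sent.index(term) - o if term in sent else MAX
            model.setdefault(term, []).append(offset)
    return model

# Example: coordinate_model("love", [["I","love","this","game"], ["I","love","basketball"]])
# gives {"game": [2, MAX], "I": [-1, -1], ...}, matching the "love" model above.
```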
3 Boost Algorithm Based on Coordinate Model Finding a useful sub-model of one term with some of the other terms is not an easy task, so our approach is to find only one term's sub-model with another single term in every round (this kind of sub-model can be called a 1-sub-model, and a sub-model of one term with n different terms an n-sub-model); every n-sub-model is then a combination of some 1-sub-models. Finally, some hypotheses are mined from those sub-models. By using 1-sub-models, we can find a weak hypothesis which may be only moderately accurate. A boosting algorithm can combine some weak hypotheses into one powerful hypothesis, and because boosting can be combined with other new text classification methods, it has been successfully applied to information retrieval and NLP [5][6][7][8], so we adopt a boosting algorithm in our experiment. 3.1 Adaboosting.MH Adaboosting.MH was introduced by Schapire and Singer [9]. We first describe this algorithm.
Let X denote the domain of the training set, let Y denote a set of labels, and let k = |Y| be the size of Y. Let S be a sequence of training examples {(x_1, y_1), …, (x_m, y_m)}, where x_i ∈ X and y_i is an array of size k: y_i[j] = +1 if x_i belongs to the j-th class of Y, and y_i[j] = −1 otherwise. Let D_t be a two-dimensional array, where D_t(i, j) is the weight over the i-th document and the j-th class. The algorithm is:

(1) Given: S.
(2) Initialize D_1(i, j) = 1/(mk), where m is the size of S.
(3) For t = 1, …, T:
   a. Pass distribution D_t to the weak learner.
   b. Get weak hypothesis h_t.
   c. Choose α_t ∈ ℝ.
   d. Update:
      D_{t+1}(i, j) = D_t(i, j) exp( −α_t Y_i[j] h_t(x_i, j) ) / Z_t,
      where Z_t = ∑_{i=1}^m ∑_{j∈Y} D_t(i, j) exp( −α_t Y_i[j] h_t(x_i, j) ).

The sign of h_t(x_i, j) represents whether or not the label j is assigned to x_i, and the magnitude of the prediction h_t(x_i, j) is interpreted as a measure of "confidence" in the prediction. When D_t is updated, the weights of document-label pairs misclassified by h_t increase.

(4) Output the final hypothesis:
      f(x, j) = ∑_{t=1}^T α_t h_t(x, j).

3.2 Real Abstaining Adaboosting.MH
Let w denote a term, and let "w ∈ x" mean that w occurs in document x. Let

X_0 = {x : w ∉ x},  X_1 = {x : w ∈ x}.   (1)

For b ∈ {−1, +1}, let

W_b^j = ∑_{i=1}^m D_t(i, j) f(i, j, b),   (2)

where f(i, j, b) = 1 if (x_i ∈ X_1 ∧ Y_i[j] = b) is true and 0 otherwise, and

W_0 = ∑_{i: x_i ∈ X_0} D_t(i, j).   (3)

In this algorithm the hypothesis h makes predictions of the form

h(x, j) = 0 if w ∉ x, and h(x, j) = c_j if w ∈ x,   (4)

where c_j = (1/2) ln( W_{+1}^j / W_{−1}^j ). It can then be shown that

Z_t = W_0 + 2 ∑_{j∈Y} √( W_{+1}^j W_{−1}^j ).   (5)
In every round, this algorithm searches each term w and chooses the term which the value of
Zt
is smallest for getting the best hypothesis. In our experiment, one term is
one word. 3.3 Real Abstaining Adaboosting.MH Based on Coordinate Model In this algorithm, we defined pattern P of the form: ( w1 ,
w2 , begin, end) where w1
w2 are two different terms, “begin” and “end” are two integers. Now we can say that a document matches a pattern, and denotes it as P ∈ x , while in the document x there is w1 ’s 1-sub-model with w2 : { w2 , ( p1 ,", p2 ) } and at least one pi ∈ [begin, end ](1 ≤ i ≤ n ) . and
[
]
For example, for the sentence “I love this game!”, we use “love” as origin to make coordinate, so the position of “game” is 2. Now we can say that this sentence matches the pattern (love, game, -1, 3) and dose not matches the pattern (love, game, -2, 1). Then we can let:
X 0 = {x : P ∉ x}, X 1 = {x : P ∈ x}
(6)
Now we can get another hypothesis:
⎧ 0, P ∉ x ht = ⎨ ⎩C j , P ∈ x
(7)
Coordinate Model for Text Categorization
where
cj =
219
1 ⎛ W+1j ⎞ ⎟ ln⎜ 2 ⎜⎝ W−1j ⎟⎠
In every round, we search every possible pattern P and choose a P with the smallest Z t . In our experiment, for the sake of saving time, we set the biggest interval of [begin, end] as [-5, +5]. Note that if we allow that w1 and w2 are the same term, the pattern ( w1 , w1 , 0, 0) can be chosen. In this case, this algorithm’s hypothesis can be identical to Real abstaining Adaboosting.MH’s hypothesis. We call this algorithm Real abstaining Adaboosting.MH based on Coordinate Model 2. It can be seen as a combination of Real abstaining Adaboosting.MH and Real abstaining Adaboosting.MH based on Coordinate Model. 3.4 Real Abstaining Adaboosting.MH Based on Common In this algorithm, we have a different pattern P of the form: ( w1 ,
w2 ) where w1 and
w2 are two different terms. We use P ∈ x to denote that the document x matches the pattern P, while in the document x, w1 and w2 occur in a same sentence at least once. By the same way, we let:
X 0 = {x : P ∉ x}, X 1 = {x : P ∈ x} ⎧ 0, P ∉ x ht = ⎨ ⎩C j , P ∈ x
(8) (9)
1 ⎛ W+1j ⎞ ⎟, where c j = ln⎜⎜ 2 ⎝ W−1j ⎟⎠ and search every possible P for getting the P with the smallest
Z t in every round.
4 Experiment Results 4.1 Algorithms We will use the three algorithms described above. For it may happen that is zero, so let:
cj =
1 ⎛ W+1j + ε ln⎜ 2 ⎜⎝ W−1j + ε
⎞ ⎟⎟ ⎠
W+1j or W−1j
(10)
220
W. Jiang and L. Chen
where
ε=
1 mk
When we choose the best pattern in “Real abstaining Adaboosting.MH based on Coordinate Model”, it may well happen that we find more than one patterns with the smallest Z t . We have two options: choosing the patterns which have the smallest interval (that is, its “end-begin” is smallest), or choosing the patterns with biggest interval. So we have four different algorithms: “Real abstaining Adaboosting.MH based on Coordinate Model-small pattern” and “Real abstaining Adaboosting.MH based on coordinate model-big pattern” as well as “Real abstaining Adaboosting.MH based on Coordinate Model 2-small pattern” and “Real abstaining Adaboosting.MH based on coordinate Model 2-big pattern”. We respectively define “Real abstaining Adaboosting.MH”, “Real abstaining Adaboosting.MH based on coordinate model-small pattern”, “Real abstaining Adaboosting.MH 2 based on coordinate model-small pattern”, “Real abstaining Adaboosting.MH based on coordinate model-big pattern”, “Real abstaining Adaboosting.MH 2 based on coordinate model-big pattern” and “Real abstaining Adaboosting.MH based on Common” as “Ada”, “Ada-csm-s”, “Ada-csm2-s”, “Ada-csm-b”, “Ada-csm2-b” and “Ada-c”. 4.2 Evaluation Measures Firstly, defining two functions:
exp ert (x, c j ) : if document x be classified under c j
(
)
by the human expert, it returns true, else it returns false; machine x, c j : if x be classified under c j by the computer it returns true, else it returns false. Classification effectiveness is usually measured in terms of precision
ρ . Precision π j
π
and recall
is the probability that if a random document x is classified under c j ,
this decision is correct; recall
ρj
is the probability that if a random document x ought
to be classified under c j , this decision is taken. Precision and recall can be obtained as [10]:
{x : machine(x, c ) = true ∧ exp ert (x, c ) = true} {x : machine(x, c ) = true} {x : machine(x, c ) = true ∧ exp ert (x, c ) = true} = {x : exp ert (x, c ) = true}
πj =
j
j
(11)
j
ρj
h
j
(12)
j
4.3 Test Corpora In our experiment, we obtain training set and test set from Reuters-21578 [11] which collected documents from Reuters newswire in 1987. We choose documents of which
Coordinate Model for Text Categorization
221
the “body” attribute is not empty. We put the documents into training set or test set according to whether its “LEWISSPLIT” attribute is “train” or “test”. “Function words” and “low document frequency words” which only occur in at most three training documents are removed. We use Exhaustive search method to search every possible pattern in Real abstaining Adaboosting.MH based on coordinate model, so its space or time requirement is very large. Because of this, we only use two small corpora in which there are hundreds of documents. 4.4 Experiment Using Corpora-1 Corpora-1 contains 608 documents (438 train documents and 170 test documents) which are classified under at least one of the five topics: “bop”, “gas”, “soybean”, “gold” and “oil”. The value of m*k is small, so our four algorithms only run 50 rounds. The results on this corpora-1 are listed in Table 1. Table 1. Results for the corpora-1
topics oil gold soybean gas bop topics oil gold soybean gas bop
Ada prec. reca. 1.0 0.972 0.962 0.893 0.793 0.793 0.953 0.953 0.964 0.964 Ada-csm2-s prec. reca. 1.0 0.972 0.962 0.893 0.778 0.724 0.953 0.953 0.964 0.964
Ada-csm-s prec. reca. 0.732 1.0 1.0 0.536 0.85 0.586 1.0 0.698 1.0 1.0 Ada-csm2-b prec. reca. 1.0 0.972 0.962 0.893 0.808 0.724 0.953 0.953 0.964 0.964
Ada-csm-b prec. reca. 0.732 1.0 1.0 0.536 0.895 0.586 1.0 0.698 1.0 1.0
Ada-c prec. 0.933 0.4 0.5 0.614 0
reca. 0.197 0.929 0.103 0.628 0
This experiment shows that some precisions and recalls of Ada-csm-s and Ada-csm-b are better than Ada’s, it tells us that some information can be gotten from coordinate model, but can not be gotten from vector space model. By comparing Ada-csm-X with Ada-c, we can find that the information mined from word’s position is different from that mined from words’ common occurrence. 4.5 Experiment Using Corpora-2 In this experiment, we selected five topics: “gnp”, “coffee”, “sugar”, “oilseed” and “supply”. There are 510 documents in training set and 161 documents in test set. We use this corpora to make two experiments. Firstly, we run our algorithms 100 round, and the results are listed in Table 2. Secondly, we try to increase the speed of Ada-csm-X and Ada-c. In every round of Ada-csm-X(X=s or b) and Ada-c, we first choose a word by the same way that
222
W. Jiang and L. Chen
Adaboosting.MH find the terms, and then use this word as the origin. In this way, we needn’t search every word to find the best origin, so we can save much time, we only can get a good solution, not the best solution though, in this way.These algorithms run 1000 round, the results are listed in Table 3. Table 2. Results for the corpora-2
topics gnp coffee sugar oilseed supply topics gnp coffee sugar oilseed supply
Ada prec. reca. 0.970 0.941 0.964 1.0 1.0 0.971 0.976 0.909 0.913 0.913 Ada-csm2-s prec. reca. 1.0 0.912 1.0 1.0 1.0 0.971 0.976 0.932 0.957 0.957
Ada-csm-s prec. reca. 0.967 0.853 1.0 0.889 1.0 0.657 0.759 1.0 0.909 0.870 Ada-csm2-b prec. reca. 1.0 0.912 1.0 1.0 1.0 0.971 0.976 0.932 0.957 0.957
Ada-csm-b prec. reca. 0.966 0.824 1.0 0.889 1.0 0.657 0.754 0.977 0.909 0.870
Ada-c Prec. 1.0 0.722 0.556 1.0 0.48
reca. 0.324 0.481 1.0 0.25 0.522
Table 3. Results of fast algorithm for corpora-2
topics gnp coffee sugar oilseed supply
Ada prec. 1.0 0.964 1.0 0.976 0.955
reca. 0.941 1.0 0.971 0.909 0.913
Fast-Ada-csm-s prec. reca. 0.968 0.882 1.0 0.852 1.0 0.686 0.774 0.932 0.88 0.957
Fast-Ada-csm-b prec. reca. 0.969 0.912 1.0 0.889 1.0 0.714 0.919 0.773 0.846 0.957
Fast-Ada-c prec. reca. 1.0 0.588 1.0 0.963 1.0 0.971 0.945 0.795 0.815 0.957
5 Conclusion The experiments in this paper show that some information can be mined from word positions. We should find more powerful methods to mine hypotheses from the coordinate model, and more efficient algorithms to fix the origins.
References 1. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002) 2. Zhu, J.B., Wang, H.Z., Zhang, X.J.: Advances in Machine Learning Based Text Categorization. Journal of Software 17(9), 1848–1859 (2006) 3. Yang, Y.: An evaluation of statistical approaches to text categorization. Information Retrieval 1(1), 69–90 (1999)
4. Bigi, B.: Using Kullback-Leibler distance for text categorization. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 305–319. Springer, Heidelberg (2003) 5. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55, 119–139 (1997) 6. Escudero, G., Màrquez, L., Rigau, G.: Boosting applied to word sense disambiguation. In: Lopez de Mantaras, R., Plaza, E. (eds.) ECML 2000. LNCS (LNAI), vol. 1810, pp. 129–141. Springer, Heidelberg (2000) 7. Cai, L., Hofmann, T.: Text Categorization by Boosting Automatically Extracted Concepts. In: Proc. of the 26th Annual Int. ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, pp. 182–189. ACM Press, New York (2003) 8. Shi, L., Weng, M., Ma, X.M., Xi, L.: Rough Set Based Decision Tree Ensemble Algorithm for Text Classification. Journal of Computational Information Systems 6(1), 89–95 (2010) 9. Schapire, R.E., Singer, Y.: BoosTexter: A Boosting-based System for Text Categorization. Machine Learning, 135–168 (2000) 10. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: The 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42–49. ACM Press, New York (1999) 11. http://www.daviddlewis.com/resources/testcollections/reuters21578
An Interface to Retrieve Personal Memories Using an Iconic Visual Language Rui Jesus2, Teresa Romão1, and Nuno Correia1 1
CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal 2 Multimedia and Machine Learning Group, Instituto Superior de Engenharia de Lisboa, Rua Conselheiro Emídio Navarro nº 1, Lisboa, Portugal
[email protected], {tir,nmc}@di.fct.unl.pt
Abstract. Relevant past events can be remembered when visualizing related pictures. The main difficulty is how to find these photos in a large personal collection. Query definition and image annotation are key issues to overcome this problem. The former is relevant due to the diversity of the clues provided by our memory when recovering a past moment, and the latter because images need to be annotated with information regarding those clues in order to be retrieved. Consequently, tools to recover past memories should deal carefully with these two tasks. This paper describes a user interface designed to explore pictures from personal memories. Users can query the media collection in several ways, and for this reason an iconic visual language to define queries is proposed. Automatic and semi-automatic annotation is also performed using the image content and the audio information obtained when users show their images to others. The paper also presents the user interface evaluation based on tests with 58 participants. Keywords: Personal Memories, User Interfaces, Visual Languages, Image Retrieval.
1 Introduction Pictures are one of the richest ways to register and preserve personal experiences. Currently, due to advances in digital technology, their capture and storage are easy and very popular, even among people who are not very familiar with technology. Consequently, more experiences are being preserved, but it also becomes more difficult to later retrieve the media information of an event. Moreover, these images are often collected in a disorganized way, without any type of annotation [5]. To recall this information, humans must remember something about that experience, e.g., the location, the date or the people around. Due to the diversity of the clues provided by our memory and the richness of the visual content, tools to search for personal media should support different types of queries, including keywords, image examples, sketches, or parts of maps. Additionally, these tools should not pose difficulties for users with little technological knowledge.
Query definition requires different annotation methods according to the type of query used. If a query is defined by words, then media must be annotated with a set of keywords. Similarly, a query by image requires media to be annotated using visual features. If the media items are annotated with keywords describing their content, searching for an event becomes an easier task. Unfortunately, inserting those keywords manually is rather time-consuming. As a consequence, in a world pervaded by image production and consumption, ways to achieve automatic annotation are essential. Extracting visual content to train semantic concepts (keywords) and associate them with images [12] is one way to obtain automatic annotation. Indeed, keywords may be an intuitive way to search for information, but there are occasions where keywords are not appropriate to express the desired query. When words are not enough, automatically extracted low-level visual features and queries using image examples or sketches must be provided. This paper presents an interface to explore digital memories, to be used when people retrieve their past experiences in domestic environments. This application was designed to support several types of query, and for this reason a query language based on icons describing different types of items is proposed. Additionally, the application provides several ways to annotate images. Visual content is used to annotate images by semantic concepts, and audio information, captured when users talk about their media materials, is used to annotate images by recognized keywords. The paper is structured as follows: the next section presents related tools and the state of the art; the subsequent section gives an overview of the system and its functionalities; the following sections describe the interface details and the image retrieval system. The paper ends with experimental results, conclusions and directions for future work.
2 Tools and State of the Art To manage digital pictures everywhere, using several devices and by different users (e.g., users with little technological expertise or elderly people), tools to access this information in a familiar way are required. Several interfaces have been proposed to explore images, including desktop, web, mobile and tangible interfaces. Commercial applications (e.g., Picasa, Photofinder and Adobe Photoshop Album) provide ways to explore this information by means of directory navigation. They require manual annotation as a mechanism to support searching for pictures and allow visualizing those pictures chronologically. Photomesa [1] is an example of another interface for image browsing. It employs a Treemap layout to view hierarchies of image directories and provides zoomable interfaces for navigation. Fotofile [6] provides a mechanism for quick annotation, which requires a minimal amount of user interaction. It also integrates both manual and automatic annotation based on image content (face recognition). Web applications like Flickr and Phlog [17] also use manual annotation, but they have different concerns, such as enabling media items to be shared among friends. Flickr makes use of the manual annotation provided by the community in order to reduce, for each user, the time consumed by individual annotation.
Applications described so far require manual annotation, but ethnographic studies [5] show that most people do not spend time annotating images. For this reason, it is important to use a method that simplifies this task. In order to automatically annotate images, content or context metadata should be used; for instance, context information such as temporal or geographical data obtained at capture time and recorded in the EXIF (Exchangeable Image File) metadata. One interface that explores automatic annotation using GPS (Global Positioning System) information is WWMX [18]. This project uses GPS data to organize images on a map according to their geographical location. Concerning visual content [14], many systems have been proposed [15]. One of the oldest is QBIC [4]. This system uses color, texture and shape to represent an image and allows defining queries by image examples or by sketch, using color or texture patterns. Other systems that use sketches to define queries are Retrievr [16] and Imagescape [2]. Retrievr is a Web application where the user produces a drawing to query the database, whereas Imagescape, in addition, permits creating visual queries using images that represent previously trained concepts. Content-based systems have one major problem, the semantic gap [9], i.e., the gap between semantic concepts and low-level features such as color and texture. To alleviate this problem, MediAssist [13] uses both context (GPS, date and time) and content information. MediAssist also provides a mobile interface to explore personal pictures. However, this mobile version uses context data exclusively. Mobile applications are another type of interface used to explore personal memories; they allow temporal and spatial independence. Tangible interfaces, on the other hand, have the primary objective of making interactions between technology and users closer to natural practices. Personal Digital Historian [11] is a project that includes a tangible interface on a table. With this application, people can sit around and look for images, as well as show them to one another, using their hands and fingers.
Fig. 1. System Architecture (interface: capture, visualization, annotation, search; retrieval system: image processing, concepts, audio analysis; multimedia repository: image and video files)
3 System Overview This paper presents the desktop application of the Memoria project. The main goal of this project is to build tools to explore personal memories. The project includes several applications with interfaces adapted to different contexts of use, including an interface for desktop PCs, a tangible interface and a user interface for mobile devices to explore personal collections related to physical locations. The project also includes a multimedia retrieval system. The application presented in this paper consists of an interface, a multimedia repository and a retrieval system. The interface has four modes of operation (see Fig. 1):
• Capture – to capture personal memories using a webcam;
• Visualization – to browse and visualize images of the personal collection;
• Annotation – to annotate images with audio information;
• Search – to retrieve memories of a specific type, using the visual query language.
To handle these queries a multimedia retrieval system is included. Image processing, classification and audio analysis are among the operations that are supported by this system.
Fig. 2. User interface
4 User Interface The proposed interface makes it possible for a user to manage personal memories with pictures. To this end, the interface supports the search and visualization of a set of images from a multimedia repository, the annotation of these memories with audio elements and the capture of new images using a webcam. The proposed interface is organized in two main sections (see Fig. 2): the results section and the query section. The results section is located in the upper part of the
screen and is used essentially as a display and selection area, showing the query results or the images that belong to a particular directory. These images are organized in a variable number of thumbnails and in one larger image that shows a preview of the selected thumbnail or of an image captured by the webcam. Also, a mini-slideshow can be run inside this area. The query section, in the lower part of the screen, contains a list of filters and a query box. Filters are criteria, such as operators or concepts, that a user can drag into the query box in order to specify a query. Within these areas various actions take place. The following sections describe the main actions that are available using the proposed interface. 4.1 Capture By means of a webcam and a dialog, the interface permits capturing images in a simple way. The acquired picture can enrich the personal collection, but it can also be used as input to look for similar images. For instance, the user can search for images with objects similar to the one placed in front of the webcam, or quickly search for photos of a friend that has just arrived without typing names or browsing folders. This mode allows the user to search for pictures using physical objects, in the spirit of a tangible interface, and can be an easy way for people with little technological knowledge to recover personal experiences. Fig. 3 shows an example of this mode of interaction with the multimedia repository. For example, if the user wants to find images of parties, by placing a bottle in front of the webcam some party images will be retrieved, because bottles usually appear in these images.
Fig. 3. Image capture using a webcam
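The following minimal sketch illustrates how a frame captured from the webcam could be turned into a query-by-example. It is only an illustration: it compares images with a simple colour histogram, whereas the actual system relies on the retrieval engine described in Section 5, and the `collection` list of image paths is a hypothetical input.

```python
import cv2

def capture_frame(camera_index=0):
    """Grab a single frame from the webcam (returns a BGR image or None)."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None

def colour_histogram(image, bins=(8, 8, 8)):
    """Normalised 3D colour histogram used as a very simple visual signature."""
    hist = cv2.calcHist([image], [0, 1, 2], None, bins, [0, 256] * 3)
    return cv2.normalize(hist, hist).flatten()

def query_by_capture(collection):
    """Rank the photo collection by similarity to the object shown to the webcam."""
    frame = capture_frame()
    if frame is None:
        return []
    query = colour_histogram(frame)
    scored = []
    for path in collection:
        candidate = colour_histogram(cv2.imread(path))
        # Correlation-based histogram comparison: higher means more similar.
        scored.append((path, cv2.compareHist(query, candidate, cv2.HISTCMP_CORREL)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```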
4.2 Visualization Given an image folder selected by the user, thumbnails of these pictures are generated in real time and presented to the user in the results section. These thumbnails can be sorted using different criteria such as name, time or size of the original file. When one of these thumbnails is selected, a preview is displayed. Both visualizations are integrated in the interface (see Fig. 2), so the user always knows which image is selected. Another visualization mode offered by the interface is the slideshow. In this mode, images are displayed in the preview window or in a full-screen slideshow and advanced automatically at fixed time intervals. 4.3 Annotation One of the current methods used to classify images uses a set of words (tags) to describe an image. Usually, the user is responsible for the manual insertion of these tags, and many applications use this information as a basis for image retrieval. This is a regular practice in applications for managing personal photos. Our approach simplifies this task, allowing the user to annotate images in an automatic manner, using audio elements and visual features. When people share their photos with friends, they use the slideshow and provide comments or stories about the photo that is being visualized to better explain the captured moment and the related context (see Fig. 4). These comments can be very useful to annotate images. Audio annotation is performed in two different ways:
• With a microphone, the user can describe in words an image that is being presented at the moment;
• The user can select a set of images, start a slideshow and, while she presents and describes those photos to someone, such as a friend or a fellow worker, audio is being recorded (see Fig. 4).
Recorded audio is analyzed afterwards using an ASR (Automatic Speech Recognition) tool. Given an audio file and a dictionary, this tool recognizes the keywords that were said and annotates images using these words.
Fig. 4. Audio recording when commenting images
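The sketch below illustrates the keyword-spotting step under simplifying assumptions: an external ASR tool is assumed to have already produced one transcript per commented image, and the keyword dictionary and annotation store shown here are hypothetical examples rather than the project's actual data structures.

```python
import re

# Hypothetical keyword dictionary; the real system would use the vocabulary
# configured for the speech recogniser.
KEYWORD_DICTIONARY = {"beach", "party", "snow", "birthday", "lisbon"}

def spot_keywords(transcript, dictionary=KEYWORD_DICTIONARY):
    """Return the dictionary keywords found in an ASR transcript."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    return sorted(set(tokens) & dictionary)

def annotate_slideshow(images, transcripts, annotations):
    """Attach the recognised keywords of each audio comment to its image.

    `images` and `transcripts` are parallel lists (one comment per slide);
    `annotations` maps an image path to its set of keywords.
    """
    for path, transcript in zip(images, transcripts):
        annotations.setdefault(path, set()).update(spot_keywords(transcript))
    return annotations
```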
4.4 Search The proposed interface offers two approaches for visual information retrieval from a multimedia database. One is based on semantic concepts, often used to describe an image, and the other consists of a composition of visual elements, that is, selections of parts of images. Both approaches can be used to query the database. We call these queries query by concepts and query by composition. Both types of queries are built using the proposed visual query language. Concepts, logical operators, temporal and geographical items, and image parts are the elements used to build the query, by dragging and dropping them into a query box. The following subsection presents the visual query language and explains how it is used to retrieve images.
Fig. 5. Drag and drop to a query box
4.4.1 Visual Query Language The visual query language uses the following elements to build queries: image parts, contextual items and concepts. To combine these elements, logical operators are used. At this stage, the operators used are defined by the set Loperator = {AND, OR, NOT}. The AND operator expresses conjunction or intersection, whereas the OR operator expresses disjunction or union. The NOT operator permits addressing the counterpart of each concept (e.g., NOT Face = No Face).
The user can indicate which concepts are relevant for the search from a pre-established set of concepts: Concepts = {Beach, Face, Indoor, ManMade, Party, Snow, People}. Using the NOT operator, more concepts are defined: NOT Concepts = {No Beach, No Face, Outdoor, Nature, No Party, No Snow, No People}. Because contextual information is important to recover personal media, a set of contextual elements is also defined: Contextual = {Location, Time}. Image elements are obtained by cropping sections of different images. Using these elements and operators, we can produce queries such as NOT Indoor AND NOT ManMade. Such a query translates to images of the exterior that include nature elements (note: NOT ManMade = Nature and NOT Indoor = Outdoor). The integration of these operators can be an asset for retrieving a specific subset from the multimedia database, turning it into more valuable memories. To retrieve images using the query by concepts method, the user indicates which concepts are relevant for the search among the set of predefined concepts. When establishing the concepts needed to formulate a query, the user denotes that she is looking for images that are somehow related to her choice. For instance, if the selection consists of Outdoor AND No People AND Nature, the system returns a set of images with a high probability of simultaneously depicting those concepts. These images can be visualized or serve as input for the other type of query (query by composition). In order to accomplish a query of this kind, concepts and operators are chosen by means of a drag and drop operation into a query box (see Fig. 5). As opposed to similar applications or standard database querying, there is no typing at all. The user merely drags operators and concepts to the query box and presses the submit button. Also, elements present in the query box can be reordered or deleted at any time. This solution is easier and faster than having a user typing keywords. Fig. 5 shows an example of a query by concepts. The concepts People and Indoor, and the operators NOT and AND, are in the query box to search for indoor images without people. Another method that this application offers is query by composition. This method allows composing an image with parts of other images (see Fig. 6). Using queries by composition, the user indicates that she wants images with visual properties similar to the objects that were selected. Rectangular or freehand selections
can be done to construct the composition, and their location and size are defined by the user. Compositions can be submitted or saved for reuse. A list of existing compositions is presented on the Compositions menu, and the user can select one of them at any time to resend that query to the database. Fig. 6 shows an example of a query by image composition. Two image parts and the operator AND are in the query box to define the query and search for similar images.
Fig. 6. Query by image composition
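As an illustration of how such iconic queries could be evaluated, the following sketch combines per-image concept scores with AND/NOT semantics. The concept names follow the sets defined above, but the scores, the 0.5 threshold and the image names are purely illustrative assumptions; the real system uses the classifier outputs described in the next section.

```python
# Per-image concept scores in [0, 1], e.g. sigmoid outputs of binary
# classifiers; the values shown here are purely illustrative.
scores = {
    "img_001.jpg": {"Indoor": 0.1, "ManMade": 0.2, "People": 0.05},
    "img_002.jpg": {"Indoor": 0.8, "ManMade": 0.7, "People": 0.9},
}

def concept_query(image_scores, positives=(), negatives=(), threshold=0.5):
    """AND-combine required concepts (positives) and NOT-concepts (negatives)."""
    return (all(image_scores.get(c, 0.0) >= threshold for c in positives) and
            all(image_scores.get(c, 0.0) < threshold for c in negatives))

# "NOT Indoor AND NOT ManMade AND NOT People" == Outdoor, Nature, No People
hits = [name for name, s in scores.items()
        if concept_query(s, negatives=("Indoor", "ManMade", "People"))]
print(hits)  # -> ['img_001.jpg']
```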
5 Image Retrieval System As mentioned before, the user interface described in this paper allows querying a multimedia database using two methods: query by concepts and query by image composition. The first method gives more information to the system because concepts, in general, are trained using hundreds of images, while the second gives more freedom to personalize the query, but the information given to the system is weaker. The query by concepts is based on a set of semantic concepts obtained by training a binary classifier. To combine several concepts in a query, the sigmoid function is applied to the output of the classifier, as described in earlier work [7, 8]. The image composition query works in a similar manner to the query by sketch [2], but the query is defined using parts of other images. When the user navigates in the personal collection, sometimes she does not find images that express the intended search. However, there are parts of several images that together can form the desired query (see Fig. 6). After the query is formed, the system extracts from each of these parts a set of visual "words" to represent the query, in a way that is similar to text retrieval techniques. Then, Latent Semantic Analysis (LSA) [3] is applied and the cosine distance is used to retrieve relevant images (see [7, 8]).
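A rough sketch of this last step, using scikit-learn as a stand-in for the actual implementation of [7, 8], is shown below; the bag-of-visual-words matrix, the number of latent dimensions and the random data are assumptions made only to keep the example self-contained.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Rows = images, columns = visual-word counts (placeholder random data).
rng = np.random.default_rng(0)
visual_word_counts = rng.integers(0, 5, size=(200, 500)).astype(float)

# Latent Semantic Analysis: project bag-of-visual-words vectors into a
# low-dimensional latent space.
lsa = TruncatedSVD(n_components=50, random_state=0)
latent_images = lsa.fit_transform(visual_word_counts)

def retrieve(query_counts, top_k=10):
    """Rank images by cosine similarity to the query composition's visual words."""
    latent_query = lsa.transform(query_counts.reshape(1, -1))
    sims = cosine_similarity(latent_query, latent_images)[0]
    return np.argsort(-sims)[:top_k]

print(retrieve(visual_word_counts[0]))  # the first image should rank itself highly
```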
6 Evaluation To evaluate the proposed application two different types of evaluations should be accomplished: performance evaluation of the multimedia information retrieval system
and usability evaluation of the user interface. The Memoria Desktop interface was built using an iterative design model based on a cyclic process of prototyping, testing, brainstorming and then refining the work developed. This paper presents the usability tests of the working prototype described above and some results obtained by the retrieval system. The evaluation of the retrieval system can be consulted in [7, 8]. 6.1 Image Retrieval System Evaluation This section presents the experiments performed to evaluate the multimodal image retrieval system based on semantic analysis. A subset of concepts selected from the set of 449 LSCOM concepts, suitable for personal collections, was trained using a data set obtained from the Corel Stock Photo CDs, from the TRECVID2005 database and from Flickr. These concepts were tested in the three applications described above, with two different picture collections. The Memoria Desktop application was evaluated using a personal collection of one of the authors and a database of pictures taken by several visitors of Quinta da Regaleira, a cultural heritage site in Sintra, Portugal. Table 1 presents the results obtained when applying the image retrieval method, using only time and visual information, to the personal collection. Table 2 shows the performance of the retrieval system on the Quinta da Regaleira database using audio, visual and spatial information. Table 1. Mean average precision (MAP) obtained for a set of concepts combining color moments with the Gabor filter features and a bag of color regions with SIFT descriptors
Concepts   Color Moments + Gabor   Bag color + SIFT
People     0.69                    0.75
Face       0.58                    0.44
Outdoor    0.91                    0.87
Indoor     0.59                    0.57
Nature     0.45                    0.57
Manmade    0.61                    0.71
Snow       0.17                    0.09
Beach      0.26                    0.34
Party      0.14                    0.22
MAP        0.49                    0.51
Color moments, Gabor filter features, and bags of color regions and SIFT descriptors [8] were the visual features used in the experiments performed. When using only visual information (color moments combined with Gabor filter features), the concept "Outdoor" presents good results in both databases, the "People" concept performs best in the personal collection, and the "Nature" concept presents a better performance in the second database. When time information is included in the visual model, the MAP increases to 0.53, which is better than the values obtained with only visual information (see Table 1). Audio information also performs better than visual information (see Table 2).
Table 2. MAP obtained by a set of concepts combining audio, visual and geographic information to select a region or a direction in relation to a point
Concepts           GPS 60m, Visual and Audio   GPS dir, Visual and Audio
Outdoor            0.97                        0.64
Indoor             0.09                        0.63
Nature             0.86                        0.46
Manmade            0.71                        0.86
People             0.20                        0.50
Indoor + Manmade   0.16                        0.48
Outdoor + Nature   0.86                        0.51
MAP                0.55                        0.58
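For reference, mean average precision can be computed from ranked result lists as in the sketch below; the relevance judgements shown are illustrative, and this simplified version averages precision over the returned list rather than over all relevant items in the collection.

```python
import numpy as np

def average_precision(relevance):
    """AP of one ranked result list; `relevance` is 1/0 per returned item."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevance) / (np.arange(len(relevance)) + 1)
    return float((precision_at_k * relevance).sum() / relevance.sum())

def mean_average_precision(rankings):
    """MAP over the per-concept rankings (e.g. People, Face, Outdoor, ...)."""
    return float(np.mean([average_precision(r) for r in rankings]))

# Illustrative relevance judgements for two concept queries.
print(mean_average_precision([[1, 0, 1, 1, 0], [0, 1, 1, 0, 0]]))
```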
6.2 Usability Evaluation To analyze the system's latest functionality and assess its usability, we conducted several user tests. The main goals of this study are:
• To produce detailed user feedback about the Memoria Desktop application that will guide future improvements of the system;
• To refine our understanding of the users' needs concerning tools to manage personal memories;
• To evaluate our proposals to define queries and to visualize the corresponding results.
6.2.1 Methodology The tests were accomplished individually by each user, but several users were testing the application simultaneously. After being briefed about the objectives, users were given a questionnaire that guided them throughout the test. The questionnaire, as explained later in this paper, describes the tasks the users should accomplish and presents related questions that the users should answer after completing the corresponding task. Two facilitators/observers supervised the tests, encouraging users to "think aloud", helping them when it was essential, taking notes on the users' performances and recording every problem users explicitly mentioned. Tests lasted for a minimum of 20 minutes and a maximum of 30 minutes each, depending on each user's strategy to explore the application. We were mainly interested in:
• Finding out if it is easy to learn how to use the interface;
• Analyzing the aesthetics of the user interface;
• Evaluating the utility of the interface;
• Finding out if it is easy to use the different search techniques;
• Estimating the intuitiveness of the icons;
• Evaluating the users' satisfaction with the search results.
The information was gathered with the ultimate goal of further refining the Memoria Desktop interface according to the users’ feedback.
6.2.2 Participants The tests were performed on a population of 58 voluntary participants, who were graduate students. Ten of them were female. The participants in this experiment ranged in age from 21 to 31 years old, with a mean age of 23.5. Most of the participants own a digital camera and, on average, they claimed to use it 2.7 times per month and 24 times per year (mode = 1 and 12, respectively). When asked about the number of photos in their personal collection, answers differed: average = 2592, maximum value = 10000 and mode = 1000. Only fourteen participants (24.1%) declared that they usually annotate their photos. The majority of participants (67.2%) reported that, when they need to search their personal photo collections, they explore the folders in their file system, which are named after the events they correspond to. Some users (13.7%) also mentioned that they use the date as a search criterion. Other users (8.6%) said they use specific software to manage their photo collections. The clues most used by participants to search for photos related to a past event are: date (60.3%), place (48.3%), event type (20.7%) and people involved (12.1%). All participants had their first contact with the application during the test and used it under similar conditions. 6.2.3 Questionnaire The questionnaire was composed of three different parts. The first one captured personal data. Besides age and gender, personal data included digital camera usage, dimension of photo collections and photo search methods. The second part guided the users through the interface exploration, explaining the tasks they should accomplish and capturing their experimental feedback. The questionnaire presented four different tasks the users should carry out:
1. Navigate the photo collection using the folders tree to analyze some images in more detail;
2. Visualize a group of images in slideshow mode (normal and full screen);
3. Search images by dragging & dropping concepts and logic operators to the query box;
4. Search images by composing a sketch with parts of other images in the database.
After completing each task, the users had to answer several questions concerning their experience when using the interface. This part of the questionnaire is composed of two types of questions: open-ended questions and Likert scale questions. The last part of the questionnaire relates to the global evaluation of the interface. Each participant was presented with four statements related to their experience while using the interface. Each user then indicated her level of agreement with each statement by circling a response on a 5-point Likert-type scale. On this scale, a response of 1 (one) means disagreement and a response of 5 (five) means agreement. 6.2.4 Results As stated before, the second part of the questionnaire guided the users through four different tasks. In the first task, after using the folders tree to navigate through the photo collection, the users were asked to point out the difficulties they faced while executing the task. Most participants reported no difficulty (74.1%).
Concerning the second task, which consisted of visualizing a group of images in slideshow mode, 36.2% of the subjects described no difficulty; 48.3% considered that the icons were not appropriate, making it difficult to find the right button to start the slideshow. Users suggested the use of labels or tooltips to reveal the function of each button. Several users also asked for methods to control the slideshow, namely to stop it and to move forward or backward. The third task consisted of building an image search query by dragging & dropping concepts and logic operators into the query box. To complete this task, users had to accomplish several image searches with different goals:
1. Photos with people;
2. Photos showing natural landscapes;
3. Outdoor photos, excluding beaches;
4. Photos showing people or natural landscapes.
After each search, users had to classify the results of the created query using a 5-point scale, where 1 (one) means bad and 5 (five) means excellent. The users' classification is summarized in Table 3. In general, the results obtained by using this search method were good. Participants were then asked about their difficulties in building the queries that drove the searches. This was an open-ended question, so we got many different opinions. While 24% of the participants had no difficulties, the remaining ones pointed out some problems. The most frequently mentioned one was the need to build the queries sequentially, which makes it difficult to change a query: if users need to insert a new concept in the middle of a query, they need to rebuild the whole query (there is no undo command). They also reported some difficulties in using the operators, especially because they did not know their priorities. Some of the participants would like to have more concepts available, and they reported some difficulty in using the concept "nature" in the queries, requiring a specific icon for it. Table 3. Users' classification of the search results
Queries                Mean   Standard Deviation   Mode
People                 3.9    1                    3
Nature                 3.8    1                    4
Outdoor AND No Beach   4      1                    4
People OR Nature       3.5    1                    4
We wanted to know if users considered the drag & drop method appropriate to define queries for image search. On a 5-point scale, where 1 (one) means inappropriate and 5 (five) means very appropriate, most users considered it appropriate (Mode = 4; Mean = 3.54, SD = 1.1). We also asked users if the icons appropriately represent the concepts: 81% of the participants considered the icons perceivable and 17.2% had the opposite opinion. The icon representing "People" was chosen as the most comprehensible one (chosen by 23 participants), followed by the icon representing "Snow" (chosen by 20 participants). The least intelligible icons were the ones representing "Manmade" items (selected by 40 participants) and "Beach" (selected by 38 participants).
Moreover, we wanted to know whether the users considered the method of combining the concept icons to build a query and obtain the corresponding results appropriate. On a 5-point scale, where 1 (one) means inappropriate and 5 (five) means very appropriate, most users considered it appropriate (Mode = 4; Mean = 3.43, SD = 0.82). Thirty-four users chose the "People" concept as the most useful one when searching images in personal collections. They also suggested new concepts, such as "Sports", "Sea", "Day" and "Night", and seasons (e.g., "Summer" or "Winter"). The fourth task, searching images by composing a sketch with parts of other images stored in the database, comprised two subtasks. In the first subtask, to query the image database, users had to make a sketch based on rectangular parts of other images. In the second subtask, participants used the freehand mode to select parts of other images to build a sketch. Image searches were then based on these sketches. After completing these tasks, users were asked to classify the obtained results. On a 5-point scale, where 1 (one) means bad and 5 (five) means excellent, most participants chose 3 for the first subtask (Mode = 3; Mean = 2.59, SD = 0.97) and 2 for the second subtask (Mode = 2, Mean = 2.2; SD = 0.93). We also wished to find out whether users found this functionality useful when searching for an image in an image database. The majority of participants (67.2%) considered it useful, since it allows them to visually query the image database. Five users (8.6%) thought it is particularly appropriate when searching for faces or people. However, 15.5% of the participants considered it useless. The last part of the questionnaire aims at capturing the participants' overall opinions about the interface. Users were asked to evaluate the interface regarding its learnability, aesthetics and usefulness. Most participants considered that the information provided by the system is useful, but not with very strong feelings (Mode = 4, Mean = 3.53, SD = 0.79). The results were similar for the statement "it is easy to learn how to use the application" (Mode = 3, Mean = 3.34, SD = 0.83). Once again, the results were positive, but centered around a medium position, with most participants agreeing with the statement "I like the application aesthetics" (Mode = 3, Mean = 3.19, SD = 0.91) and also with the statement "I would use the application to manage my personal photos" (Mode = 3, Mean = 3.12, SD = 1.08). Overall, the results of the usability tests were quite positive and provided useful feedback to inspire future improvements.
7 Conclusions and Future Work An interface to explore personal memories was presented. It allows browsing and visualizing memories, capturing new media items, the semi-automatic annotation of images using audio information, and the construction and reuse of queries using a visual query language to search for images in a multimedia repository. The described capabilities, in conjunction with the drag and drop operation into a query box, enable the management of personal memories in a way that is fast and also requires minimal interaction from the user. The interface was evaluated with 58 users, and the results confirm our main design options. Most of the users considered the proposed technique for defining queries, based on an iconic visual language, appropriate. They also liked the search results (see Table 3). In [7, 8] we evaluated the semantic concepts with a Mean
Average Precision (MAP) of about 50%. As can be seen in Table 3, the users classified this result as "good", which means our image retrieval method performed well in this case. Future and ongoing work includes integrating the users' suggestions into the proposed interface and conducting more usability tests to analyze in depth some parts of the interface that were less explored, namely our strategies for image annotation with audio. Finally, we are also developing a tangible interface where physical objects are used to create queries.
References 1. Bederson, B.: PhotoMesa: A Zoomable Image Browser Using Quantum Treemaps and Bubblemaps. In: UIST 2001, ACM Symposium on User Interface Software and Technology, CHI Letters, vol. 3(2), pp. 71–80 (2001) 2. Buijs, J., Lew, M.: Visual Learning of Simple Semantics in ImageScape. In: Huijsmans, D.P., Smeulders, A.W.M. (eds.) VISUAL 1999. LNCS, vol. 1614, pp. 131–138. Springer, Heidelberg (1999) 3. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by Latent Semantic Analysis. Journal of the Society for Information Science, 391–407 (1990) 4. Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., Yanker, P.: Query by Image and Video Content: The QBIC System. Computer 28(9), 23–32 (1995) 5. Frohlich, D., Kuchinsky, A., Pering, C., Don, A., Ariss, S.: Requirements for Photoware. In: Proc. of the ACM Conference on Computer Supported Cooperative Work, pp. 166–175 (2002) 6. Kuchinsky, A., Pering, C., Freeze, D., Creech, M., Serra, B., Gwizdka, J.: FotoFile: A Consumer Multimedia Organization & Retrieval System. In: Proceedings of the ACM Conference on Human Factors in Computing Systems CHI 1999, pp. 496–503 (1999) 7. Jesus, R., Santos, E., Frias, R., Correia, N.: An interface to explore personal memories. In: 15th Portuguese Computer Graphics Group Conference, Porto Salvo, Portugal (2007) 8. Jesus, R., Dias, R., Frias, R., Abrantes, A., Correia, N.: Sharing personal experiences while navigating in physical spaces. In: ACM SIGIR Conference on Research and Development in Information Retrieval, Multimedia Information Retrieval Workshop, Amsterdam, The Netherlands (July 2007) 9. Lew, M., Sebe, N., Djeraba, C., Jain, R.: Content-based Multimedia Information Retrieval: State-of-the-art and Challenges. ACM Transactions on Multimedia Computing, Communication, and Applications 2(1), 1–19 (2006) 10. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal Computer Vision, 91–110 (2004) 11. Moghaddam, B., Tian, Q., Lesh, N., Shen, C., Huang, T.: Visualization and User-Modeling for Browsing Personal Photo Libraries. Int. J. Comput. Vision 56(1-2), 109–130 (2004) 12. Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and vector quantizing images with words. In: Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management (1999) 13. O’Hare, N., Gurrin, C., Jones, G., Smeaton, A.: Combination of Content Analysis and Context Features for Digital Photograph Retrieval. In: Proceedings of the 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, pp. 323–328 (2005)
14. Pereira, F., Koenen, R.: MPEG-7: A standard for multimedia content description. International Journal of Image and Graphics 1(3), 527–546 (2001) 15. Veltkamp, R., Tanase, M.: A Survey of Content-Based Image Retrieval Systems. In: Marques, O., Furht, B. (eds.) Content-Based Image and Video Retrieval, pp. 47–101. Kluwer, Dordrecht (2002) 16. http://labs.systemone.at/retrievr/ 17. http://phlog.net/ 18. http://wwmx.org/
VR-Based Basketball Movement Simulation Lin Zhang1,* and Ling Wang2 1
Department of Sports, School of Education, Zhejiang University, Hangzhou 210028
[email protected] 2 Physical Education Department, Business College of Hang Zhou, Zhejiang Gongshang University, Hangzhou, Zhejiang Province 310035
[email protected]
Abstract. As modern competitive sport develops rapidly towards high-level, complex and advanced phases, sports training has begun to adopt modern scientific methods, and modern sport depends on scientific technology to develop athletes' full potential. This article discusses the significance of applying virtual reality and sports system simulation technology to basketball teaching and training, the simulation of basketball techniques and tactics, and the design and realization of the virtual reality system. Keywords: virtual reality, basketball technique and tactics simulation, interactive program design.
1 Introduction 1.1 Virtual Reality Virtual Reality (VR), sometimes called a virtual reality environment, is a real-time display of a computer-simulated world. This world, built from 3D graphics, can be a faithful reproduction of a specific real-world environment or a purely imaginary one. Operators experience an interactive, immersive environment through sight, hearing, touch, force feedback and so on. Virtual Reality provides a new medium for human-computer interaction. Generally, a VR system features various output forms (graphics, voice and text) and the ability to process different input devices, and it can also accomplish collision detection, viewpoint control, complex behavior modeling, etc. VR technology is now widely used in many areas such as military simulation, entertainment, games, education, tele-robotics and sports [1,2,3]. With the development of the related research subjects (computer technology, computer graphics, network technology, pattern recognition, intelligent interface technology, physiology, multi-sensor technology, speech technology), VR research has become a comprehensive technology. The continuous commercialization of VR encourages more researchers to participate, and VR can change people's way of life as the related technology develops [4].
Corresponding author.
1.2 Sports System Simulation Sports system simulation studies the internal laws of sport through the integrated application of knowledge from sports-science-related subjects and the methods of system science (system simulation theory). It belongs to experimental technology science: it explains, analyzes, predicts, organizes and assesses a sports system by reconstructing physical education teachers' teaching experience, coaches' training intentions, administrators' organization schemes and athletes' training processes. Recent research hotspots in system simulation include object-oriented simulation, qualitative simulation, distributed interactive simulation, visual simulation, multimedia simulation, intelligent simulation and VR-based simulation. The last of these differs from the others in emphasizing rich perception, interactivity and a sense of immersion. Therefore, VR-based simulation has been applied widely; typical development platforms include Virtools [5] and Cult3D [6]. 1.3 Basketball Technique and Tactics Based on VR As is known, modern basketball continuously improves through the mutual promotion and restriction between basketball technique and basketball rules. Teachers and coaches engaged in popularizing and promoting basketball cannot always demonstrate normative technical movements, and even leading players may face the same problem as they age and their technical and physical qualities degrade. To achieve correct technical movements, the feedback process is accomplished through students' repeated training and teachers' guidance. Sometimes, due to the complexity of the tactical content, teachers cannot make out the movement directions and timing of tactics from large numbers of crossing line graphs, and many teachers analyze tactical line graphs only from experience. Basketball technique and tactics simulation based on VR means applying VR technology to simulate basketball, where virtual human techniques are widely employed [7]. As a brand-new technology, it provides a new idea and a new teaching platform for modern basketball teaching and training. This simulation not only enables us to view and display scientific, reasonable, normative technical movements from any angle, breaking time and space limitations, but also displays tactical coordination lines dynamically. All of this can avoid the defects of traditional teaching and training demonstrations.
2 Construct Virtual Venue 2.1 Virtual Venue Modeling The venue is reconstructed in 3D modeling software according to its architectural drawings, and the proportions of the designed virtual venue must be respected everywhere [8] (see Fig. 1). Secondly, it is necessary to reduce the number of faces of each model as much as possible and to delete invisible faces. Numerous models, such as seats and rails, should be realized with simple geometry and perspective textures to reduce the file size of the venue models (see Fig. 2).
Fig. 1. Virtual basketball playground
Fig. 2. Simple model and texture for seats and rails
2.2 Virtual Athlete's Role Although 3ds Max and Maya are currently the most popular 3D graphics and animation packages and offer powerful modeling and animation functions, character modeling with them is still tedious. Therefore, we construct character models first with dedicated character modeling software and then import them into the 3ds Max or Maya environment. The animation of character models is realized through skeletons and joints, and their construction is the key to realistic animations. Moreover, the connection between the character model and the skeleton is governed by the weight settings, which directly determine how skeleton movement deforms the character model's surface; the effects of the body skin and bindings can be improved by adjusting the skin weights (Fig. 3). Because model optimization influences the file size and the fluency of movement in the interactive VR environment, concise and effective optimization methods should be used, such as reducing the models' face counts, and organizing and then merging models and UV-mapped textures (Fig. 4).
Fig. 3. Virtual athlete modeled with character modeling software
Fig. 4. Merging models and UV map textures
3 Acquisition of Animation Data Optical motion capture is now the most widely used and mature capture technology [9,10]. However, because of the high cost of hiring optical motion capture equipment and elite athletes, and restrictions such as the site environment (4-6 m), lighting requirements, special clothing and the wearing of markers, motion capture is limited in many sports events. Currently, motion data of competitive sports are usually captured and studied by shooting photographs, adding marker points to the pictures, connecting these points, and then deriving motion trajectories and skeleton animation.
The animation data of techniques and tactics are acquired as follows: the skeleton is constructed from images, i.e., by inter-comparison of the skeleton with series of images of the technical movement [11,12]. First, we collect and select normative image data of elite basketball players suited to the teaching content. Next, the skeleton animation is set at the key points of the technique and tactics in the reference images by comparison with the skeleton of the virtual athlete's role (Fig. 5). Finally, the role is activated with the skeleton animation data (Fig. 6).
Fig. 5. Skeleton of virtual athlete's role
Fig. 6. Activating the role with data of skeleton animation
4 Interactive Programming 4.1 Main Program and Functional Interface Design (1) Program loading and animation display module Process: Activate appointed camera-->Activate Logo and time control module-->Activate scene-->Camera path animation module and time control module-->Activate host interface and menu (See Fig. 7).
Fig. 7. Loading and animation display module
Fig. 8. Start menu functional module
(2) Start menu functional module Waiting for instruction of start menu-->Activate start menu and menu position data-->Display classification and orientation button-->Calculate menu position data-->Activate branch program button-->Waiting-->Reduction (See Fig. 8).
The function of the main program is mainly classification and orientation. According to the basketball training content and the features of dynamic display, the program is composed of three main branch navigations: technical movement, tactical coordination and judging method (Fig. 9). 4.2 Interactive Program Design of Technical Movement According to the classification of basketball techniques, the technical movement module includes five first-order classification navigations (Pass, Dribbling, Shooting, Breakthrough, Defense) and twenty-five second-order action activation buttons. In addition, this module supports slow motion, frame-by-frame forward and backward, fast switching among four visual angles and arbitrary viewing-angle control for tracking the athletes' movement (Fig. 10).
Fig. 9. Function of main program (UI)
Fig. 10. Arbitrary control to athletes' movement tracking
(1) Functional module for activation of second-order animation button by first-order navigation
Put technique classification on the state of waiting for information-->Select one class-->Shield interferential module-->Activate this button-->Animation curve module-->Achieve button position data and stop-->end of paragraph-->Waiting for returned information-->Return by another click.
(2) Slow motion and frame-by-frame backward and forward module
Frame-by-frame animation forward information-->Frame count to go forward-->Detect the current frame count-->Loop
Frame-by-frame animation backward information-->Frame count to go backward-->Detect the current frame count-->Loop
Slow motion animation information-->Judge branches-->Module for animation and detect current frame count of animation module-->Animation velocity control-->Loop
Judge branches-->End of animation-->Reendow the role-->Cyclic waiting
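Since Virtools building blocks cannot be reproduced here, the sketch below re-expresses the frame-by-frame and slow-motion logic as a small, self-contained controller; the class and method names are illustrative and do not correspond to the Virtools API.

```python
class AnimationPlayer:
    """Frame-by-frame and slow-motion control over a baked skeleton animation."""

    def __init__(self, frames, slow_factor=4):
        self.frames = frames          # list of skeleton poses, one per frame
        self.current = 0
        self.slow_factor = slow_factor
        self._slow_tick = 0

    def step_forward(self):
        """Advance one frame, clamping at the last frame."""
        self.current = min(self.current + 1, len(self.frames) - 1)
        return self.frames[self.current]

    def step_backward(self):
        """Go back one frame, clamping at frame 0."""
        self.current = max(self.current - 1, 0)
        return self.frames[self.current]

    def slow_motion_tick(self):
        """Advance only every `slow_factor` ticks, looping at the end."""
        self._slow_tick = (self._slow_tick + 1) % self.slow_factor
        if self._slow_tick == 0:
            self.current = (self.current + 1) % len(self.frames)
        return self.frames[self.current]
```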
Fig. 11. Loading and animation display module
Fig. 12. Start menu functional module
(3) Functional module for arbitrary perspective control and tracking athletes
Setting the role and following target position-->Setting following target of camera-->Loop
Information waiting module for camera control button-->Control module for movement and rotation of camera (see Fig. 13).
This setting can smooth the camera's movement. 4.3 Functional Module and Interactive Program Design for Tactical Coordination The tactical coordination module includes four first-order classification navigations (Basic Cooperation Combinations of Attack, Fundamental Defense Coordination, Half-court Attack Tactic, Half-court Defense Tactic) and twenty-two second-order tactical coordination activation buttons, according to the classification of basketball tactics. In addition, this module supports slow motion, frame-by-frame forward and backward, fast switching among four visual angles and arbitrary viewing-angle control for tracking the athletes' movement.
Fig. 13. Arbitrary perspective control and tracking athletes
Fig. 14. Module for tactical coordination animation
(1) Functional module for tactical coordination animation
Waiting for information-->Shield interferential program-->Judge branches-->Module for animation and detection of current frame count of animation-->Cyclic-waiting
Judge branches-->End of animation-->Reendow the role module-->Cyclic-waiting (see Fig. 14)
(2) Fast switching among four perspectives and arbitrary visual angle control
Send out the instruction through button-->Module for single-channel cyclic switch-->Module for circulation of camera-->Set the current camera
4.4 Functional Module and Interactive Program Design for Judging Method According to the classification of basketball competition rules and judging methods, the judging method module covers five first-order classification navigations (Regional division, Division and coordination, Common gesture, Violation gesture, Foul gesture) and twenty-two animation activation buttons. In addition, this module supports slow motion, frame-by-frame forward and backward, fast switching among four visual angles and arbitrary viewing-angle control for tracking the athletes' movement.
Chief and accessory judges wait for the same information-->Judge branches-->Animation and judges detect animation frame respectively-->Loop
Teamplayers' coordination animation judging: Synchronize animation-->Reendow the role of chief and accessory judges-->Cyclic-waiting
5 Discussion Because of deficiencies such as the expensive optical motion capture equipment, the limitations of the site environment, and the special clothing and markers that must be worn, motion data captured with these devices cannot reflect the real sporting level of the athletes, and there is an obvious gap between application and research. Although image-based 3D sports data are not restricted by the above deficiencies, they still suffer from deviations in the z-axis data and the y-axis rotation angle. We adopt the image-based method of constructing the skeleton by inter-comparison of the skeleton with series of images of the technical movement. Because the skeleton width is fixed, the z-axis data of the body are relatively correct, but the z-axis data of the body's centre of gravity are not. If two cameras were erected at exactly ninety degrees, the series of images obtained by inter-comparison between skeleton and camera, and the recovered 3D motion data, would be more reliable. The 3D motion data of this project depend on the authors' painstaking work, patience and experience. If, instead of fitting only one image of a series, the remaining images of the series could be absorbed and recognized automatically, work efficiency would be greatly improved [13]. When designing the interactive program, we still cannot solve the control of the virtual role while a continuous motion is playing; therefore, frame fixing of the action adopts the frame-by-frame forward and backward method.
For virtual athletes that carry large amounts of animation data, this paper shields the other classes during continuous motion to avoid suddenly changed instructions, which would cause confusion and mutual interference between motion data. Control during the execution of animation data is an interesting piece of future work that awaits a solution. Acknowledgments. This research is funded by the 863 project (Grant NO: 2006AA01Z335).
References 1. Pan, Z., Xu, W., Huang, J., et al.: EasyBowling: Small Bowling Machine Based on Virtual Simulation. Computers & Graphics 10(2), 231–238 (2003) 2. Liu, M.: Three-dimensional game design-Virtools development tools, p. 5. Sichuan press, Sichuan (2005) 3. Chen, W., Zhang, M., Pan, Z., et al.: Animations, Games, and Virtual Reality for the Jing-Hang Grand Canal. IEEE Computer Graphics and Applications 29(5), 91–95 (2009) 4. Noh, Z., Sunar, M.S., Pan, Z.: A Review on Augmented Reality for Virtual Heritage System. Edutainment, 50–61 (2009) 5. http://www.virtools.com/ 6. Yun, R., Zhang, B., Pan, Z.: Research on using cult3D and java to realize virtual assembly. In: Chang, M., Kuo, R., Kinshuk, Chen, G.-D., Hirose, M. (eds.) Learning by Playing. LNCS, vol. 5670, pp. 363–370. Springer, Heidelberg (2009) 7. Cheng, X., Liu, G., Pan, Z., Tang, B.: Fragment-based responsive character motion for interactive games. The Visual Computer 25(5-7), 479–485 (2009) 8. http://www.autodesk.com.cn/adsk/servlet/index?siteID=1170359 &id=9905292 9. MotionBuilder75\Help\tutorials.htm 10. Pan, Z., Li, H., Zhang, M., Ye, Y., Cheng, X., Tang, A., Yang, R.: Photo Realistic 3D Cartoon Face Modeling Based on Active Shape Model. T. Edutainment 2, 299–311 (2009) 11. Tian, Z., Wang, Y.: Application and prospect of digital anthropical technology in the discipline of sports science. Sports Science 8(25), 83–87 (2005) 12. Tu, R.: System Simulation Methodologies Facing towards Information Era. Journal of System Simulation 11(5), 312–315 (1999) 13. Pan, Z., Chen, W., Zhang, M., Liu, J., Wu, G.: Virtual Reality in the Digital Olympic Museum. IEEE Computer Graphics and Applications 30(3), 84–88 (2010)
Mixed 2D-3D Information for Face Recognition Hengliang Tang*, Yanfeng Sun, Baocai Yin, and Yun Ge Beijing Key Laboratory of Multimedia and Intelligent Software Technology, College of Computer Science and Technology, Beijing University of Technology, 100124 Beijing, China {tanghengliang,gyun}@emails.bjut.edu.cn, {yfsun,ybc}@bjut.edu.cn
Abstract. Face recognition with the assistance of 3D models has recently been a successful approach. In this paper, we develop a face recognition system fusing 2D and 3D face information. First, the HaarLBP representation is proposed to represent the 2D faces. Then, the 3D morphable model (3DMM) is employed to estimate the 3D shape for the given 2D face, and five kinds of 3D facial geometrical features are extracted from the virtual 3D facial meshes to assist the face recognition. Finally, we fuse the 2D HaarLBP and the five 3D features under a linear self-adaptive weight scheme to improve the final recognition performance. The experimental results on the ORL and JAFFE2 face databases show the good performance of the proposed fusion method, and demonstrate that our method is robust to facial expressions and poses to a certain extent. Keywords: face recognition, 3D morphable model, feature extraction.
1 Introduction

Recognition algorithms based on 2D face images are usually sensitive to facial variations and uncontrolled environments, whereas the 3D face contains more spatial information, which is an inherent property of the object and robust to the uncontrolled environment. Many researchers have therefore recently paid attention to face recognition assisted by 3D information or fusing 2D and 3D faces [1]. Chang et al. [2] used PCA on both 2D intensity images and 3D depth images, and fused the 2D and 3D results to obtain the final performance. Lu et al. [3] used feature detection and registration with the ICP algorithm in the 3D domain and LDA in the 2D domain for multimodal face recognition. Mian et al. [4] fused SIFT as a 2D descriptor and SFR as a 3D representation to form a rejection classifier, and then a modified ICP algorithm was employed in the final recognition. Zhang et al. [5] synthesized images from 3D face models, investigated the effectiveness of face recognition using averaged images in different conditions, and revealed the mechanism of averaging face images in face recognition. Wang et al. [6] estimated the 3D shape from the 2D face using the 3D morphable model (3DMM), and virtual faces of different views were generated from the 3DMM to assist face recognition.
* Corresponding author.
Z. Pan et al. (Eds.): Transactions on Edutainment V, LNCS 6530, pp. 251–258, 2011. © Springer-Verlag Berlin Heidelberg 2011
Considering that 3D face acquisition is time-consuming and needs more human-computer interaction in actual applications, the 3DMM is employed in this paper to estimate the 3D shape from the given 2D face. Furthermore, we do not use the synthesized virtual 2D face images, but extract five kinds of 3D facial geometrical features to assist 2D face recognition. In addition, we propose a HaarLBP representation to depict the 2D faces. Finally, we fuse the 2D HaarLBP features and the five 3D features under a linear weight scheme to improve the final recognition performance.
2 HaarLBP Representation Framework

With variations of facial expression and pose, face images are distorted to some extent, and the similarity between face images of the same individual under different poses or expressions decreases, which greatly affects face recognition performance. To extract robust face features, we present the HaarLBP representation method (Fig. 1).
Fig. 1. The HaarLBP representation
Considering that facial variations distort face images largely in the spatial domain, we first transform the face images to the frequency domain to obtain more robust face information for classification. The 2D Haar wavelet [7] is employed for its conceptual simplicity and fast calculation. Under the 2D Haar wavelet transform, the face image is decomposed into four-channel subimages: LL, LH, HL and HH. After that, the LBP operator is utilized to extract face features. The LBP operator, introduced by Ojala et al. [8], is a non-parametric operator which describes the local spatial structure of an image. It is a powerful means of texture description which has gained much attention in the face recognition field and has been successfully applied to 2D face recognition [9][10][11][12]. So the LBP operator is utilized to extract face features after the Haar wavelet transform. First, the face image is divided into regions, and then the LBP operator is applied to each region. Finally, the static histogram is adopted to extract the regional features and represent the face. Combining the Haar wavelet and the LBP operator, we propose the HaarLBP representation method based on a two-layer weighted scheme (Fig. 2) to extract robust facial features for classification.

Fig. 2. The HaarLBP representation based on the two-layer weighted scheme
\[
\mathrm{HLBPH}(f) = \bigl( V \otimes H(f) \bigr) \cdot W^{T} \tag{1}
\]
\[
H(f) = [\,H_1 \;\; H_2 \;\; H_3 \;\; H_4\,] =
\begin{bmatrix} h_{11} & \cdots & h_{41} \\ \vdots & \ddots & \vdots \\ h_{1N} & \cdots & h_{4N} \end{bmatrix} \tag{2}
\]
\[
V = [\,v_1 \;\; v_2 \;\; v_3 \;\; v_4\,] =
\begin{bmatrix} v_{11} & \cdots & v_{41} \\ \vdots & \ddots & \vdots \\ v_{1N} & \cdots & v_{4N} \end{bmatrix} \tag{3}
\]
\[
W = [\,w_1 \;\; w_2 \;\; w_3 \;\; w_4\,] \tag{4}
\]
where f is the input face image, and HLBPH(f) denotes the two-layer weighted HaarLBP representation of f. H(f) is the N×4 static histogram matrix of the face image f, in which each column is the LBP histogram of one subimage derived from the 2D Haar wavelet transform, and N is the number of block regions into which each subimage is divided. H_1 to H_4 denote the histograms of the four-channel subimages of f, and each element h_mn of the matrix H(f) represents the histogram of the n-th block region of subimage m. V is an N×4 weight matrix for the face regions of the four-channel subimages, and balances the contributions of the block regions within each subimage. W is a weight vector with four elements, used to fuse the four-channel face subimages by a linear weighted strategy.
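As a concrete reading of Eqs. (1)–(4), the following sketch computes a two-layer weighted HaarLBP descriptor for one grayscale face image. It is a minimal illustration under stated assumptions, not the authors' implementation: the 4×4 block grid, the plain 8-neighbour LBP, and the use of the PyWavelets package for the Haar transform are choices made here for brevity.

```python
import numpy as np
import pywt  # PyWavelets, used here only to obtain the four Haar subimages

def lbp_image(gray):
    """Basic 8-neighbour LBP code for every interior pixel (no interpolation)."""
    g = gray.astype(np.float64)
    c = g[1:-1, 1:-1]
    # Neighbours in a fixed order; each comparison contributes one bit.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)
    return codes

def block_histograms(codes, grid=(4, 4)):
    """Split the LBP image into grid blocks; one normalized 256-bin histogram per block."""
    rows, cols = grid
    h, w = codes.shape
    hists = []
    for r in range(rows):
        for k in range(cols):
            block = codes[r * h // rows:(r + 1) * h // rows,
                          k * w // cols:(k + 1) * w // cols]
            hist, _ = np.histogram(block, bins=256, range=(0, 256), density=True)
            hists.append(hist)
    return np.array(hists)            # shape: (N blocks, 256)

def haarlbp(gray, V, W):
    """Two-layer weighted HaarLBP descriptor following Eqs. (1)-(4).
    V: (N, 4) block weights per subimage; W: (4,) subimage weights."""
    # Labelling of the detail channels follows the paper's LL/LH/HL/HH convention.
    LL, (LH, HL, HH) = pywt.dwt2(gray, 'haar')
    H = [block_histograms(lbp_image(sub)) for sub in (LL, LH, HL, HH)]
    # Weight each block histogram within its channel, then fuse the four channels.
    weighted = [V[:, m:m + 1] * H[m] for m in range(4)]
    return sum(W[m] * weighted[m] for m in range(4)).ravel()
```

In practice the entries of V and W could be set from the per-block and per-channel recognition contributions (cf. Tables 1 and 4) rather than chosen by hand.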
3 3D Face Reconstruction and Feature Extraction

3.1 3D Face Reconstruction

Image-based face features are largely affected by facial variations, so 3D face information is employed to assist the final recognition. Because the acquisition of real 3D face data is time-consuming and requires more cooperation from the participants in actual applications, the 3DMM is employed to synthesize the virtual 3D faces (Fig. 3). The 3DMM is based on linear 3D facial bases. We pick 200 representative 3D faces from the BJUT-3D face database to construct the face subspace, and then PCA is employed to form the bases of the face subspace. A new 3D virtual face can be constructed as
Fig. 3. 3D face reconstruction

\[
S_{new} = \sum_{i=1}^{N} a_i S_i, \qquad
T_{new} = \sum_{i=1}^{N} b_i T_i, \qquad
\sum_{i=1}^{N} a_i = \sum_{i=1}^{N} b_i = 1 \tag{5}
\]
where S_i and T_i stand for the bases of facial shape and texture respectively, N is the number of facial bases, and a_i and b_i are the coefficients of the shape and texture bases. Besides the shape and texture parameters, the 3DMM also involves facial illumination and pose parameters to estimate a given 3D face. For a given 2D face image, we adjust all the model parameters so that the projective image of the virtual 3D face in a certain view fits the given image optimally. Then the virtual 3D face for the given 2D face can be reconstructed from the optimal model parameters. A genetic algorithm is adopted to solve the optimization problem in this paper. The optimization model can be defined as
\[
E_I = \min \sum_{x,y} \bigl\| I_{input}(x,y) - I_{model}(x,y) \bigr\|^{2} \tag{6}
\]
where I_input(x, y) represents the given 2D face image, and I_model(x, y) is the projective image of the virtual 3D face.
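To make the relation between Eqs. (5) and (6) concrete, the sketch below treats 3DMM fitting as a black-box parameter search: the shape coefficients and the remaining pose/illumination parameters are optimized so that the rendered projection matches the input photograph. SciPy's differential evolution is used here only as a stand-in for the genetic algorithm mentioned above; `render_projection` is a hypothetical renderer supplied by the caller, and texture coefficients are omitted for brevity.

```python
import numpy as np
from scipy.optimize import differential_evolution

def reconstruct_shape(coeffs, shape_bases):
    """Eq. (5): a new shape as a convex combination of the basis shapes."""
    a = np.asarray(coeffs, dtype=float)
    a = a / a.sum()                                  # enforce sum(a_i) = 1
    return np.tensordot(a, shape_bases, axes=1)      # (V, 3) vertex coordinates

def fitting_error(params, input_image, shape_bases, render_projection):
    """Eq. (6): sum of squared pixel differences between the input photo and
    the projection of the virtual 3D face rendered with the given parameters."""
    n = shape_bases.shape[0]
    shape = reconstruct_shape(params[:n], shape_bases)
    pose_and_light = params[n:]                      # remaining parameters
    model_image = render_projection(shape, pose_and_light)   # hypothetical renderer
    return np.sum((input_image.astype(float) - model_image.astype(float)) ** 2)

def fit_3dmm(input_image, shape_bases, render_projection, n_extra=6):
    """Search the model parameters with an evolutionary optimiser
    (a stand-in for the genetic algorithm used in the paper)."""
    n = shape_bases.shape[0]
    bounds = [(0.0, 1.0)] * n + [(-1.0, 1.0)] * n_extra
    result = differential_evolution(
        fitting_error, bounds,
        args=(input_image, shape_bases, render_projection),
        maxiter=50, popsize=15, seed=0)
    return result.x
```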
3.2 3D Face Feature Extraction

Assisted by the 3DMM, we can obtain the virtual 3D face of the given 2D face; the virtual 3D face is composed of a large number of vertices and triangular patches, i.e., a collection of connected edges and vertices. Five kinds of 3D face geometrical features are then extracted from the virtual 3D face to assist 2D face recognition.

Face Area (3D-A): the sum of the areas of all the triangles on the facial mesh. The virtual 3D faces are organized as discrete vertices and triangular patches, which provide the coordinates and the adjacency information of every vertex. So the face surface area can be calculated by summing the areas of all the triangular patches, and Heron's formula is used to calculate the area of every triangular patch. The area similarity measure for two faces is the absolute value of the difference between their surface areas. This type of feature is pose-invariant but not invariant to scaling, and thus requires normalization.

Face Volume (3D-V): the volume enclosed by the face surface. Each triangular patch is projected vertically onto a reference plane α, and the projection of every triangular patch forms a column. The face volume is then estimated by summing the volumes of all the triangular columns. The volume similarity measure for two faces is likewise the absolute value of the difference between their face volumes. This type of feature is also pose-invariant, but sensitive to scaling.

Geodesic Distance (3D-G): the geodesic distance matrix based on a set of facial key points. First, a few key points are labeled on the facial mesh, and then the geodesic distance between each pair of them is calculated. Finally, we arrange all the geodesic distances as a vector to represent the 3D face. The geodesic distance similarity measure for two faces is the Euclidean distance between their geodesic distance vectors. This feature also needs to be normalized against scaling.

Normal Vector (3D-N): the normal vectors at a set of facial key points. For a triangular patch on the facial mesh, the normal can be calculated from its three vertices, so the normal at a key point is obtained by averaging the normals of all the triangular patches adjacent to that point. We also arrange all the normals at the key points as a vector to describe the 3D face. The normal vector similarity measure for two faces is defined as the angular distance between their normal vectors. This type of feature is scale-invariant, but needs normalization with respect to pose.

Facial Profile (3D-P): the length of the facial profile. The vertical and horizontal profiles passing through the nose tip are preferred in this paper, because they carry many facial features (forehead, nose, mouth, chin, cheek, and so on). We sum the Euclidean distances between every pair of adjacent points on a facial profile to estimate its length. The facial profile similarity measure for two faces is defined as the absolute value of the difference between their profile lengths. This feature is pose-invariant, but sensitive to scaling.
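As an illustration of the first two measures (and of the averaged vertex normals), the sketch below operates on a triangle mesh given as a vertex array and a triangle index array. It is a simplified reading of the descriptions above; taking z = 0 as the reference plane α is an assumption.

```python
import numpy as np

def face_area(vertices, triangles):
    """3D-A: total surface area, summing Heron's formula over all triangles.
    vertices: (V, 3) float array; triangles: (T, 3) integer index array."""
    p0, p1, p2 = (vertices[triangles[:, k]] for k in range(3))
    a = np.linalg.norm(p1 - p0, axis=1)
    b = np.linalg.norm(p2 - p1, axis=1)
    c = np.linalg.norm(p0 - p2, axis=1)
    s = (a + b + c) / 2.0                          # semi-perimeter
    return np.sum(np.sqrt(np.maximum(s * (s - a) * (s - b) * (s - c), 0.0)))

def face_volume(vertices, triangles):
    """3D-V: volume between the surface and a reference plane, approximated by
    summing the prism (column) volumes of the vertically projected triangles."""
    p0, p1, p2 = (vertices[triangles[:, k]] for k in range(3))
    # Area of each triangle projected onto the z = 0 plane (assumed plane alpha).
    proj_area = 0.5 * np.abs(
        (p1[:, 0] - p0[:, 0]) * (p2[:, 1] - p0[:, 1])
        - (p2[:, 0] - p0[:, 0]) * (p1[:, 1] - p0[:, 1]))
    mean_height = (p0[:, 2] + p1[:, 2] + p2[:, 2]) / 3.0
    return np.sum(proj_area * mean_height)

def vertex_normal(vertices, triangles, idx):
    """3D-N: normal at a key point, averaging the normals of adjacent triangles."""
    adjacent = triangles[np.any(triangles == idx, axis=1)]
    p0, p1, p2 = (vertices[adjacent[:, k]] for k in range(3))
    normals = np.cross(p1 - p0, p2 - p0)
    n = normals.sum(axis=0)
    return n / np.linalg.norm(n)
```

The area and volume similarity measures described above would then simply be the absolute differences of these scalars between two faces, after size normalization.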
4 Experiments and Discussions

We extract the HaarLBP feature from the 2D faces and the five kinds of 3D geometrical features from the 3D virtual faces, and integrate them to address the final recognition task. A linear weight strategy is adopted to fuse the six features as

\[
\Omega(u) = \sum_{i=1}^{6} \mu_i \, \Omega(u_i) \tag{7}
\]
where Ω(u_i) is the metric measure of the i-th feature for the face sample u. The final fused representation depends on Ω(u), and the nearest-neighbor (NN) classifier is employed to perform the recognition task. The weight coefficient μ_i is first set according to the contribution of the i-th face feature to recognition, and is then revised iteratively according to the recognition accuracy until the best result is obtained.
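A minimal sketch of the fusion rule in Eq. (7) combined with the nearest-neighbour decision could look as follows. The per-feature distance functions, the gallery layout and the weight values are assumptions; in the actual experiments the weights are revised iteratively as described above.

```python
import numpy as np

def fused_distance(probe_feats, gallery_feats, distance_fns, weights):
    """Eq. (7): linear weighted combination of the six per-feature distances.
    probe_feats / gallery_feats: lists of six feature values; distance_fns: one
    metric per feature (e.g. histogram distance, |area difference|, angle)."""
    return sum(w * d(p, g)
               for w, d, p, g in zip(weights, distance_fns, probe_feats, gallery_feats))

def nearest_neighbour(probe_feats, gallery, distance_fns, weights):
    """Return the identity of the gallery sample with the smallest fused distance.
    gallery: list of (identity, feature list) pairs."""
    scores = [(fused_distance(probe_feats, feats, distance_fns, weights), ident)
              for ident, feats in gallery]
    return min(scores, key=lambda s: s[0])[1]
```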
4.1 Experiments on ORL

There are 10 different images for each of the 40 distinct subjects in the ORL face database. For some of the subjects, the images were taken at different times, with slight variations in lighting, facial expressions and facial details. For each subject, one frontal image is picked for the gallery, and the remaining nine are used for testing. Tables 1, 2 and 3 show the results on the ORL face database.

Table 1. The contribution of the four subimages tested on the ORL database

Subimage      LL    LH    HL    HH
Contribution  0.44  0.24  0.22  0.10
Table 2. The recognition results for each feature tested on ORL

Feature  LBP    HLBPH  3D-A   3D-V   3D-G   3D-N   3D-P
Rate     75.0%  80.0%  56.3%  56.3%  56.3%  56.3%  45.0%
Table 3. The fusing results for 2D and 3D face features tested on the ORL database

HLBP  3D-A  3D-V  3D-G  3D-N  3D-P  Rates
1.00  ---   ---   ---   ---   ---   80.0%
0.74  0.26  ---   ---   ---   ---   88.8%
0.77  ---   0.23  ---   ---   ---   86.3%
0.45  ---   ---   0.55  ---   ---   90.0%
0.51  ---   ---   ---   0.49  ---   91.3%
0.79  ---   ---   ---   ---   0.21  85.0%
0.40  0.20  0     0     0.30  0.10  92.5%
4.2 Experiments on JAFFE2

There are 214 images of 10 distinct subjects in the JAFFE2 database. The images of each subject cover seven expressions: angry, disgust, fear, neutral, happy, sad and surprise, with about 3 samples per expression. All the neutral images are used for the gallery, and the rest for testing. Tables 4, 5 and 6 show the results on the JAFFE2 database.

Table 4. The contribution of the four subimages tested on the JAFFE2 database

Subimage      LL    LH    HL    HH
Contribution  0.49  0.19  0.25  0.07
Table 5. The recognition results for each feature tested on JAFFE2

Feature  LBP    HLBPH  3D-A   3D-V   3D-G   3D-N   3D-P
Rate     81.8%  84.8%  34.8%  34.8%  31.5%  38.6%  32.1%
Table 6. The fusing results for 2D and 3D face features tested on the JAFFE2 database

HLBP  3D-A  3D-V  3D-G  3D-N  3D-P  Rates
1.00  ---   ---   ---   ---   ---   84.8%
0.81  0.19  ---   ---   ---   ---   90.7%
0.90  ---   0.10  ---   ---   ---   91.1%
0.88  ---   ---   0.12  ---   ---   89.7%
0.81  ---   ---   ---   0.19  ---   91.6%
0.92  ---   ---   ---   ---   0.08  88.9%
0.75  0.05  0.10  0     0.05  0.05  93.0%
4.3 Discussions

First, Tables 1 and 4 show that the four subimages contribute differently to face classification. The contribution of subimage LL is usually the largest, because LL represents the low-frequency content of the original image and holds more detailed texture (appearance) information. However, LL is also strongly affected by facial variations and uncontrolled environments, so it is not fully robust or dependable for face recognition. HH represents the high-frequency content of the original image and is mixed with much noise, so it contains little useful information and is often discarded. LH and HL lie in the middle frequencies and describe the texture gradients of the face in the horizontal and vertical directions. They hold more dependable face information and are robust to facial variations and uncontrolled environments to a certain extent; thus they also contribute considerably to recognition.

Second, the results in Tables 2 and 5 depict the discriminative ability of the seven 2D and 3D face features. By comparison, the HaarLBP feature is the most contributive for recognition, achieving the best single-feature recognition rate of 80.0% on ORL and 84.8% on JAFFE2. Comparing LBP and HaarLBP, the performance of LBP is worse than that of HaarLBP, which demonstrates that our HaarLBP representation is superior to the basic LBP method and can capture more useful information. The five kinds of geometrical features extracted from the 3D virtual faces are also effective for classification and provide useful information, even though the 3D faces are synthesized from 2D faces and contain some reconstruction errors.

The fusion results of the 2D and 3D face features are shown in Tables 3 and 6, where the first six columns give the weight values of the face features and the last column presents the final fusion result. We test the performance of 2D HaarLBP assisted by each single 3D feature and by all the 3D features together, and obtain the best fusion results of 92.5% on ORL and 93.0% on JAFFE2, whereas the performances of basic LBP and HaarLBP alone are 75.0% and 80.0% on ORL, and 81.8% and 84.8% on JAFFE2. In addition, for the pose variations tested on ORL (Table 3), the 3D face features receive larger weights than for the expression variations tested on JAFFE2 (Table 6), because the 2D HLBPH representation is more robust to facial expressions than to poses, while the 3D faces are more sensitive to facial expressions than to poses.
5 Conclusion and Future Work

In this paper, we develop a HaarLBP representation for 2D faces, and extract five geometrical features from virtual 3D faces reconstructed from the 2D face images by virtue of the 3DMM. Finally, we fuse the 2D and 3D features under a linear weighted scheme to improve the recognition rate. The HaarLBP representation depicts the texture details of face images in the frequency domain, and the 3D geometrical features provide complementary structural information. Moreover, the linear weighted scheme integrates all the face features effectively. Possible future work includes incorporating other features into the descriptors and investigating their effectiveness.

Acknowledgements. Project supported by the National Natural Science Foundation of China (No. 61033004, 60825203, 60973057, U0935004), the National Key Technology R&D Program (2007BAH13B01), and the 973 Program (2011CB302703).
References

1. Bowyer, K., Chang, K., Flynn, P.: A survey of approaches and challenges in 3d and multimodal 3d+2d face recognition. Computer Vision and Image Understanding 101(1), 1–15 (2006)
2. Chang, K.I., Bowyer, K.W., Flynn, P.J.: An evaluation of multimodal 2d+3d face biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(4), 619–624 (2005)
3. Lu, X., Jain, A.K., Colbry, D.: Matching 2.5d scans to 3d models. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(1), 31–43 (2006)
4. Mian, A.S., Bennamoun, M., Owens, R.A.: An efficient multimodal 2d-3d hybrid approach to automatic face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(11), 1927–1943 (2007)
5. Zhang, X., Zhao, S., Gao, Y.: On averaging face images for recognition under pose variations. In: Proc. of ICPR 2008, pp. 1–4 (2008)
6. Wang, L., Ding, L., Ding, X., Fang, C.: Improved 3d assisted pose-invariant face recognition. In: Proc. of ICASSP 2009, pp. 889–892 (2009)
7. Haar, A.: Zur Theorie der orthogonalen Funktionensysteme (German). Mathematische Annalen 69(3), 331–371 (1910)
8. Ojala, T., Pietikainen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29, 51–59 (1996)
9. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12), 2037–2041 (2006)
10. Zhao, G., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6), 915–928 (2007)
11. Vazquez, H., Reyes, E., Molleda, Y.: A new image division for lbp method to improve face recognition under varying lighting conditions. In: Proc. of ICPR 2008, pp. 1–4 (2008)
12. Shan, C., Gong, S., McOwan, P.: Facial expression recognition based on local binary patterns: a comprehensive study. Image and Vision Computing 27, 803–816 (2009)
Research on Augmented Reality Display Method of Scientific Exhibits Hui Yan1, Ruwei Yun1, Chun Liang2, Daning Yu3, and Baoyun Zhang1 1
Digital Entertainment Research Center, School of Education and Science, Nanjing Normal University, 210097, Nanjing, China
2 Telecommunication Engineering with Management, Beijing University of Posts and Telecommunications, Beijing, China
3 China Science and Technology Museum, Beijing, China
[email protected]
Abstract. With digital technology, the traditional display method is changing into digital display methods. This article presents research carried out against this background. First, we summarize recent research on the display methods of scientific exhibits. On this basis, we propose a display method for scientific exhibits based on the mind mapping technique and augmented reality technology, and we describe the implementation steps of this display method. Finally, we take the display of the food chain as an example to illustrate the method.

Keywords: Augmented reality, Scientific exhibits, Display method.
1 Introduction

With the development of computer technology and network technology, the display methods of scientific exhibits have changed greatly: from traditional displays of text and pictures to the use of virtual reality technology to display science exhibits dynamically in science and technology exhibition halls. With the development of digital design technology, people's demand for dynamic, interactive, three-dimensional visualization products is becoming more and more urgent [1, 2]. This demand pushes display technology from walls and panels towards stories, plots and scenes. Nowadays, with the increasing demand for science popularization and the changing modern lifestyle, a small number of science and technology exhibition halls alone cannot meet the needs of many people for science education [3]. With an augmented reality display method, however, anyone can participate in science education: without time limits, one can make full use of one's time; without geographical restrictions, one only needs a personal computer. It can make learners interested in participating in science education. These features make augmented reality technology popular in the fields of science popularization, knowledge teaching and ability training, and can give science education a sustainable and scientific development.

Z. Pan et al. (Eds.): Transactions on Edutainment V, LNCS 6530, pp. 259–269, 2011. © Springer-Verlag Berlin Heidelberg 2011
Based on the full application of augmented reality technology, the augmented reality display method will become a new method of science education. Because augmented reality technology features real-time interaction and the combination of the virtual and the real, it has great advantages in improving the interactivity of displays, visualizing the exhibits and stimulating learners' interest in study. This article therefore introduces augmented reality technology into the display of scientific exhibits, and we want to make use of its advantages to add new vitality to the display of scientific exhibits.
2 Literature Review

In recent years, developments in computer graphics and interactive technology have improved the visual quality and flexibility of augmented reality technology, and have brought new vitality to its applications in developing teaching tools and in other educational fields. The reason we apply augmented reality technology to school teaching is that it provides a new way to show learning materials and, more significantly, a whole new way of interaction. Augmented reality technology has been widely used in developing teaching systems and tools, and a range of teaching products have been produced; a large number of scholars devote themselves to this line of study. For example, Shelton and Hedley used augmented reality technology to display scientific exhibits, through which teachers can help undergraduates understand the relationships of the nine planets. With augmented reality technology, the display contents allow the learners to achieve learning interaction and perception feedback easily, which makes the display contents easier to understand. They pointed out that the augmented reality display method can create powerful learning experiences through a unique combination of visual and sensory information, and that it can fundamentally change the students' way of understanding [4]. The University of Illinois has developed a virtual learning environment called "The Bee Dance Scientific-inquiry World". Playing as small bees in the virtual environment, students learn about the living habits of bees and other knowledge. In the process, students can record their observations and questions for later discussion, and teachers can give appropriate tips to guide the students' observation [5, 6]. Billinghurst also found that the augmented reality display method has unique advantages when applied to instruction: (1) it allows learners to interact with virtual objects fluently in a mixed virtual and real environment; (2) it creates new teaching and learning strategies and lets learners carry out interactive study even if they have no computer experience; (3) it enables learners to immerse themselves in the learning content, changing the traditional way of learning in which learners only face static text [7]. Since 2005, ADETTI and SbH (Solutions by Heart) have developed a multimedia interactive book (miBook) [8]. The book uses augmented reality technology and
multimedia technology to present audio-visual contents and to let readers interact with them. Dias has also studied the application of augmented reality in miBook [9]. He pointed out that, through augmented reality technology, miBook enables users to experience multiple sensory stimuli while appreciating the book; because visualized text is easy to understand, it can enhance the learning process and help readers quickly grasp the knowledge. The "earth homeland exhibition hall" in the Shanghai science center has an exhibit named "the refuse classification". The audience can interact with the exhibit while observing it: through the body movements of the audience captured by the camera sensor, the exhibit lets the audience classify the trash. This certainly strengthens the demonstration effect of the exhibit. This method is vivid and has much more affinity compared with displays of pictures and words [10]. In recent years, related research has brought augmented reality technology into the display of scientific exhibits and has gradually introduced it into the classroom display of scientific knowledge, but the combination of the display contents with the augmented reality technology is still insufficient. So in this article we use augmented reality technology to display scientific knowledge, and at the same time we introduce the mind mapping technique to organize the display contents. We try to use mind mapping to organize the display contents effectively, and we also try to make breakthroughs in improving learners' divergent thinking ability, enhancing the interactivity of the display, stimulating learners' interest in study, and so on.
3 Augmented Reality Technology and Mind Mapping Technique

3.1 The Definition of Augmented Reality Technology

Augmented reality is a technology which integrates computer-generated scenes into the real world. It provides a mixed scene of virtual information and the real scene for users through display devices such as head-mounted displays (HMD), glasses, projectors, ordinary monitors and even mobile phone screens. It allows users to interact in a natural way with the real and virtual objects in the environment. Augmented reality technology expands and supplements the real world rather than replacing it completely. Ronald Azuma of the HRL laboratory stated that augmented reality must have three characteristics: the combination of the virtual and the real world, real-time interaction, and three-dimensional registration [11].

3.2 The Hardware of Augmented Reality System

The hardware of an augmented reality system mainly has the following three parts, as shown in Fig. 1:

(1) Camera: the input device of the augmented reality system. It captures images of the real world for the computer to process.
(2) Computer: it processes the images of the real environment captured by the camera, converts the virtual object into the corresponding virtual coordinate space, and then adds the virtual object onto the marker in the real environment through the computer monitor.
(3) Display Device: the output device of the augmented reality system, used to display the mixed images of virtual objects and real scenes. The display devices of an augmented reality system mainly include the head-mounted display (HMD), handheld display, computer monitor and projector, etc. [12].
3.3 The Development Tools of Augmented Reality System

The difficulties of developing an augmented reality system are how to calculate the accurate position and orientation of the observer's viewpoint relative to the real world, and how to make the virtual scenes integrate seamlessly with the real world. The popular three-dimensional registration methods include marker-based registration and registration based on natural features; currently, marker-based registration is the most widely used in augmented reality systems, and the augmented reality system of this article also registers information by marker-based three-dimensional registration. The development tools of the augmented reality system further include 3D Studio Max/Maya, used to create the three-dimensional models of the virtual environment, and Photoshop CS3, used to make the markers for the real environment.
Fig. 1. The Devices of Augmented Reality Environment
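The core loop of such a marker-based system can be sketched as follows. Since ARToolKit itself is a C library, the sketch substitutes OpenCV's ArUco module purely as an illustration of the same detect-register-overlay pipeline; the camera intrinsics, marker size and the `draw_virtual_model` routine are assumptions, and the exact ArUco entry points vary between OpenCV releases (the calls shown follow the older contrib-style API).

```python
import cv2
import numpy as np

# Assumed camera intrinsics from a prior calibration step (both ARToolKit and
# OpenCV require such a calibration before pose estimation is possible).
camera_matrix = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
dist_coeffs = np.zeros(5)
MARKER_SIZE = 0.05   # marker side length in metres (assumption)

def draw_virtual_model(frame, rvec, tvec):
    """Placeholder for rendering the 3D model; here we only draw the marker axes."""
    cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, MARKER_SIZE)

def run_ar_loop():
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    cap = cv2.VideoCapture(0)                        # camera: the input device
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)  # find markers
        if ids is not None:
            # Estimate each marker's pose (the three-dimensional registration step).
            rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
                corners, MARKER_SIZE, camera_matrix, dist_coeffs)
            for rvec, tvec in zip(rvecs, tvecs):
                draw_virtual_model(frame, rvec, tvec)  # overlay virtual content
        cv2.imshow('AR display', frame)              # display device: the monitor
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
```

ARToolKit's own workflow is analogous: the trained pattern templates play the role of the marker dictionary, and the computed marker transformation drives the rendering of the virtual model.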
3.4 The Introduction of Mind Mapping Technique

Although the modern exhibition environment provides various splendid exhibits for audiences, some audiences still cannot raise any interest in the exhibition items. How to stimulate the audience's interest and enhance the efficiency of the demonstration has been a key point to which exhibit designers pay attention. This article therefore introduces the mind mapping technique as the demonstration theory: we use mind mapping to organize the display contents. By letting the audience participate in the design and organization of the exhibits, we can enhance the interest of the audience and the efficiency of the demonstration. Mind mapping was invented by Tony Buzan in the 1960s. It is a thinking tool which helps people to learn and think. Tony Buzan believed that mind mapping has four essential features: (1) the focus of attention is concentrated clearly on a central graphic; (2) the branches of the subject radiate from the centre in all directions; (3) the branches consist of essential graphics or key words; (4) the branches form a node structure [13]. Mind mapping enables people not only to note the focal points clearly and quickly but also to think in a clearer and more effective way. We can use lines, colors, arrows and branches to draw mind maps. It helps us organize complex ideas and enhances the ability of understanding. It has great advantages in cultivating students' learning interest, maintaining students' learning motivation, improving students' thinking ability and other aspects. That is why we choose the mind mapping technique to organize the display contents.
4 The Augmented Reality Display Method of Scientific Exhibits

Applying the augmented reality display method to the exhibition of scientific exhibits in class includes two main processes: one is making the mind map around the teaching topics, using mind mapping to organize the display contents; the other is showing the scientific contents through augmented reality technology according to the mind map.

4.1 The General Procedures to Organize Display Contents through Mind Mapping Technique

The first step in showing scientific contents through augmented reality technology is to make the necessary mind map of the related scientific contents. Tony Buzan pointed out that conducting a mind map usually includes the following steps:

Step 1. Draw the mind map from the centre of the paper. This lets people's minds radiate in all directions and lets them express their thoughts freely and naturally.

Step 2. Use a picture or a drawing to express the key thoughts. This gives the author full imagination, because a picture is worth a thousand words.

Step 3. Use color while drawing. Color, like images, can excite the brain; it adds a sense of liveliness and vitality to the mind map and great energy to creative thinking.
Step 4. Connect the central image with the main branches, then connect the main branches with the secondary branches, and so on.

Step 5. Make the branches of the mind map curve naturally rather than drawing straight lines, because the human brain tires of straight lines.

Step 6. Use one keyword on each line, in order to clearly express the relationship between the information and the topics.

4.2 The General Design Procedures of Augmented Reality Display Method

After using mind mapping to organize the scientific contents, we show the contents through augmented reality technology according to the mind map. The specific steps are as follows:

Step 1. Present the concepts of the mind map by means of augmented reality. First, we make 3D models with 3Dmax according to real images of the concepts. Second, we make the markers of the concepts with Photoshop CS3. Finally, we write the program with Visual Studio .NET 2003 and ARToolKit; through the program we can add the 3D virtual objects of the virtual scene onto the markers and display them on the monitor. Fig. 2 shows the augmented reality display of the lion.

Step 2. Make the lines of the mind map. In the mind map above, the lines are drawn by pencil; in the augmented reality teaching system, we replace the lines with solid plastic arrowheads.

Step 3. Make the key words of the mind map. We write the key words on paper and stick them on the arrowheads. In this way we illustrate the relationships of the concepts in the mind map.

Step 4. Make the animation of two related concepts or pieces of topic information. We present the relationships among the concepts through animation.
Fig. 2. The Augmented Reality Display of the Lion
4.3 Implementing the Augmented Reality Display Method for the Food Chain Scientific Exhibits

Elementary science is a curriculum full of colorful natural phenomena and vivid, intuitive experiments. The difficulty of teaching elementary science is how to move the students' study from perceptual knowledge to logical thinking, and we think the mind mapping invented by Tony Buzan is a feasible strategy to overcome this difficulty. Mind mapping is a kind of visual organization tool, and its way of expressing knowledge fits the instruction of science. It makes it easier for students to grasp the structure of scientific knowledge, understand abstract concepts, develop logical thinking and strengthen memory. We use the augmented reality display method to show the food chain knowledge of science classes. The design process is divided into two parts: the first part is to make the mind map of the food chain; the second part is to apply augmented reality technology to the mind map of the food chain. The concrete steps are as follows:

Step 1. The teacher explains the method of making a mind map. Mind mapping is a new learning strategy for the students, so before the main teaching the teacher should use a simple, concrete example to explain the process of making a mind map. This makes the students familiar with the method and is the basis of the following teaching.

Step 2. The teacher explains the subject of the food chain. The teacher first gives a simple explanation of the relations among the creatures in the food chain, asks the students to think about the subject and recall the related knowledge they have learned, and lets them draw their mind maps on paper. Then the teacher uses the courseware to teach the content of the subject and writes the names of the creatures on the blackboard.

Step 3. Let the students construct a simple food chain through mind mapping. After teaching the content of the subject, the teacher asks the students to construct a simple food chain through mind mapping, according to the creatures the teacher has taught or the content of the textbook, and the students hand in their work to the teacher.

Step 4. Make the final food chain. The teacher collects the students' work and makes the corresponding generalizations and supplements. The final food chain is shown in Fig. 3.

Step 5. Use augmented reality technology to display the mind map of the food chain. In this instance the hardware of the augmented reality display method includes a general CMOS camera with an acquisition rate of 30 frames per second at 1.3 megapixel resolution, a computer with 1 GB of memory, and a general computer monitor. This instance uses Visual Studio .NET 2003 and ARToolKit as development tools. In ARToolKit we develop a one-way interactive teaching tool through calibrating
the camera, training the marker templates, programming and other steps; with 3DS Max 8.0 and OpenGL we produce and render the virtual models into the scene; with Photoshop CS3 we make the markers for the real scene.

Fig. 3. The Food Chain

After selecting the hardware and development tools of the augmented reality system, we use augmented reality technology to transform the food chain into an augmented reality teaching system. Using the augmented reality display technology, we can fully stimulate the students' learning interest and enhance their understanding of the knowledge. The specific conversion steps are as follows (a minimal sketch of the resulting content layout is given after this list):

(1) Make the 3D models of the food chain (Table 1). First collect pictures of the creatures; then make the 3D models with 3Dmax according to the collected pictures.

(2) Although the markers of the augmented reality system can be produced in many ways, the simplest way is to design markers with patterns on cards. So in this augmented reality system we use pattern cards as the markers, as shown in Fig. 4.

(3) Replace the lines of the food chain with arrowheads made of solid plastic.

(4) Make the key words on the lines. We write the key words on paper and stick the paper on the solid arrowheads; in this way we illustrate the relationships of the concepts in the mind map.

(5) Make the animation of two related concepts or pieces of topic information. We present the relationships of the creatures through animation. The animations of the food chain include: cattle eat grass, sheep eat grass, and the lion eats sheep and cattle.

(6) The augmented reality teaching system is shown in Fig. 5.
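One way to organise the content behind steps (1)–(5) is a small lookup structure that ties each printed marker to the 3D model of its creature and each branch of the mind map (arrowhead plus keyword) to the animation to play when both endpoint markers are in view. The marker ids and file names below are illustrative assumptions, not assets of the actual system.

```python
# Marker id -> 3D model of the concept (one model per mind-map node).
concept_models = {
    0: "models/grass.obj",
    1: "models/cattle.obj",
    2: "models/sheep.obj",
    3: "models/lion.obj",
}

# Each branch of the mind map: (source concept, keyword on the arrowhead,
# target concept, animation shown when both endpoint markers are detected).
relations = [
    (1, "eat", 0, "animations/cattle_eat_grass.anim"),
    (2, "eat", 0, "animations/sheep_eat_grass.anim"),
    (3, "eat", 1, "animations/lion_eat_cattle.anim"),
    (3, "eat", 2, "animations/lion_eat_sheep.anim"),
]

def animations_to_play(visible_marker_ids):
    """Return the animations whose two endpoint markers are currently in view."""
    visible = set(visible_marker_ids)
    return [anim for src, _, dst, anim in relations
            if src in visible and dst in visible]
```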
The biggest merit of this augmented reality display method is that it lets the students get to know the marvelous food chain in the classroom without going outside. The 3D models and the animations of the augmented reality system help the students easily understand the biological relations in the food chain, and thus enable them to retain the memory for a long time. The animations can also enhance the students' learning interest and intrinsic motivation.
Table 1. The 3D Models of the Creatures (for each of Grass, Cattle, Lion and Sheep, the table lists its name, its picture and its 3D model; the picture and model cells are images)
Fig. 4. The Markers of the Augmented Reality Teaching System
Fig. 5. The Augmented Reality Teaching System of the Food Chain
5 Conclusions and Future Work

The augmented reality display method of this article is a fusion of the mind mapping technique and augmented reality technology. We use the intuitive, visual nature of the augmented reality display technology to present abstract concepts, and we use the mind mapping technique to organize the display contents. The augmented reality display method has great advantages in improving students' divergent thinking, enhancing classroom interactivity and stimulating students' learning interest.
Augmented reality is still a very new display technology, and at present there is little research on its display patterns for knowledge. How to give full play to the potential of augmented reality technology in education, and which kinds of teaching the augmented reality technique is suitable for, need to be studied in depth.

Acknowledgments. Our research is supported by the National Education and Science "Eleventh Five-Year Plan" Project No. EEA090388 and the 2010 China Science and Technology Museum Research Project "The Research on Scientific Knowledge Learning Based on Digital-game Display Mode".
References

1. Wu, F., Pan, Z., Chen, T., Xu, B.: Distributed Prototype System of Furniture's 3D Displaying and Customization. J. Computer Application, 78–81 (2003)
2. Li, X., Lu, C., Li, X.: Research on Interactive Virtual Presentation Technology Based on Web. J. Computer Engineering and Applications 145, 90–92 (2007)
3. Ren, F., Lei, Q.: Chinese Popular Science Report (2008)
4. Shelton, B., Hedley, N.: Using Augmented Reality for Teaching Earth-Sun Relationships to Undergraduate Geography Students. In: Augmented Reality Toolkit - The First IEEE International Workshop, pp. 84–115 (2002)
5. Madjid, M., Abdennour, E.R., Yuanyuan, S.: Interactive Storytelling: Approaches and Techniques to Achieve Dynamic Stories. J. Transactions on Edutainment I, 118–134 (2008)
6. Coombe, G., Salomon, B.: SKIT: A Computer-Assisted Sketch Instruction Tool. In: Pan, Z., Aylett, R.S., Diener, H., Jin, X., Göbel, S., Li, L. (eds.) Edutainment 2006. LNCS, vol. 3942, pp. 251–260. Springer, Heidelberg (2006)
7. Billinghurst, M.: Augmented Reality in Education, http://www.newhorizons.org/strategies/technology/billinghurst.htm
8. SbH – Solutions by Heart, http://www.mibook.org, http://www.solutionsbyheart.com
9. Dias, R.: Technology Enhanced Learning and Augmented Reality: An Application on Multimedia Interactive Books. International Business & Economics Review, 69–79 (2009)
10. Li, C., Wu, L.: Research on Design and Display of Contemporary Science and Technology Museum. J. Decoration, 28–29 (2007)
11. Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B.: Recent Advances in Augmented Reality. In: Computer Graphics and Applications, pp. 34–47. IEEE, Los Alamitos (2001)
12. Fan, B., Liao, Y., Xue, W.: A Pilot Study of Game-Base Learning Systems with Augmented Reality. J. Science and Technology, 27–28 (2008)
13. Tony, B., Tony, D., Israel, R.: Brookfield. Gower, VT (1999)
Author Index
Ashraf, Golam 177
Cairns, Paul 1
Chen, Lei 214
Correia, Nuno 224
Dong, Feng 158
Dong, Jun 112
Ersotelos, Nikolaos 158
Feijs, Loe 132
Gay, Gregory R. 35
Ge, Yun 251
Han, Guoqiang 189
He, Yuyong 202
Hodhod, Rania 1
Hu, Jun 132
Huang, Jie 62
Jesus, Rui 224
Jiang, Wei 214
Kudenko, Daniel 1
Kwak, Matthijs 132
Li, Guiqing 189
Li, Jian 62
Li, Mo 177
Liang, Chun 259
Liang, Ronghua 80
Lin, Hongwei 122
Lin, Qiaomin 50
Lin, Yi 147
Liu, Yue 147
Lu, Min 71
Luo, HuiMin 90
Mao, Guohong 122
Mao, Jianfei 80
Mao, Keji 80
Mirri, Silvia 35
Niezen, Gerrit 132
Roccetti, Marco 35
Romão, Teresa 224
Salomoni, Paola 35
Sun, Yanfeng 251
Tang, Hengliang 251
Tian, Qing 80
van der Vlist, Bram 132
Wang, JianMin 90
Wang, Ling 240
Wang, Ruchuan 50
Wu, Enhua 104
Xiao, Rui 62
Xiong, Yunhui 189
Xu, Dihua 50
Xue, Huanzhen 112
Yan, Hui 259
Yin, Baocai 251
You, Fang 90
Yu, Daning 259
Yu, Jinhui 122
Yun, Ruwei 259
Zhang, Baoyun 259
Zhang, Junsong 122
Zhang, Lin 240
Zhang, Mingmin 202
Zhang, Xianjun 112
Zhao, Haibo 71
Zhao, Qi 112
Zhao, Shujie 104
Zhao, Zhen 50
Zhou, Changle 122