Software Language Engineering: Second International Conference, SLE 2009, Denver, CO, USA, October 5-6, 2009 Revised Selected Papers (Lecture Notes ... Programming and Software Engineering)
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5969
Mark van den Brand, Dragan Gašević, Jeff Gray (Eds.)
Software Language Engineering Second International Conference, SLE 2009 Denver, CO, USA, October 5-6, 2009 Revised Selected Papers
Volume Editors

Mark van den Brand
Dept. of Mathematics and Computer Science, Software Engineering and Technology
Eindhoven University of Technology
Den Dolech 2, 5612 AZ Eindhoven, The Netherlands
E-mail: [email protected]

Dragan Gašević
School of Computing and Information Systems
Athabasca University
1 University Drive, Athabasca, AB T9S 3A3, Canada
E-mail: [email protected]

Jeff Gray
Department of Computer Science
University of Alabama
P.O. Box 870290, Tuscaloosa, AL, USA
E-mail: [email protected]
Library of Congress Control Number: 2010922313
CR Subject Classification (1998): D.2, D.3, I.6, F.3, K.6.3
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN 0302-9743
ISBN-10 3-642-12106-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-12106-7 Springer Berlin Heidelberg New York
Preface

We are pleased to present the proceedings of the Second International Conference on Software Language Engineering (SLE 2009). The conference was held in Denver, Colorado (USA) during October 5–6, 2009 and was co-located with the 12th IEEE/ACM International Conference on Model-Driven Engineering Languages and Systems (MODELS 2009) and the 8th ACM International Conference on Generative Programming and Component Engineering (GPCE 2009).

The SLE conference series is devoted to a wide range of topics related to artificial languages in software engineering. SLE is an international research forum that brings together researchers and practitioners from both industry and academia to expand the frontiers of software language engineering. SLE's foremost mission is to encourage and organize communication between communities that have traditionally looked at software languages from different, more specialized, and yet complementary perspectives. SLE emphasizes the fundamental notion of languages, as opposed to any realization in specific technical spaces. In this context, the term "software language" comprises all sorts of artificial languages used in software development, including general-purpose programming languages, domain-specific languages, modeling and meta-modeling languages, data models, and ontologies. Software language engineering is the application of a systematic, disciplined, quantifiable approach to the development, use, and maintenance of these languages.

The SLE conference is concerned with all phases of the lifecycle of software languages; these include the design, implementation, documentation, testing, deployment, evolution, recovery, and retirement of languages. Of special interest are tools, techniques, methods, and formalisms that support these activities. In particular, tools are often based on, or automatically generated from, a formal description of the language. Hence, the treatment of language descriptions as software artifacts, akin to programs, is of particular interest, while noting the special status of language descriptions and the tailored engineering principles and methods for modularization, refactoring, refinement, composition, versioning, co-evolution, and analysis that can be applied to them.

The response to the call for papers for SLE 2009 was quite enthusiastic. We received 79 full submissions from 100 initial abstract submissions. From those 79 submissions, the Program Committee selected 23 papers: 15 full papers, 6 short papers, and 2 tool demonstration papers, resulting in an acceptance rate of 29%. To ensure the quality of the accepted papers, each submitted paper was reviewed by at least three PC members. Each paper was discussed in detail during a week-long electronic PC meeting, as facilitated by EasyChair. The conference was quite interactive, and the discussions provided additional feedback to the authors. Accepted papers were then revised based on the reviews, in some cases a PC discussion summary, and feedback from the conference. The
final versions of all accepted papers are included in this proceedings volume. The resulting program covered diverse topics related to software language engineering. The papers cover engineering aspects in different phases of the software language development lifecycle. These include the analysis of languages in the design phase and their actual usage after deployment. The papers also represent various tools and techniques used in language implementations, including different approaches to language transformation and composition. The organization of these papers in this volume reflects the sessions in the original program of the conference.

SLE 2009 had two renowned keynote speakers: Jim Cordy (a joint keynote talk with GPCE 2009) and Jean Bézivin. They each provided informative and entertaining keynote talks. Trying to address the problems of complexity, usability, and adoption of generative and transformational techniques, Cordy's keynote suggested using generative and transformational techniques to implement domain-specific languages. Bézivin's keynote discussed the many different possibilities where model-driven research and practice can advance the capabilities for software language engineering. The proceedings begin with short papers summarizing the keynotes to provide a broad introduction to the software language engineering discipline and to identify key research challenges.

SLE 2009 would not have been possible without the significant contributions of many individuals and organizations. We are grateful to the organizers of MODELS 2009 for their close collaboration and management of many of the logistics. This allowed us to offer SLE participants the opportunity to take part in two high-quality research events in the domain of software engineering. The SLE 2009 Organizing Committee and the SLE Steering Committee provided invaluable assistance and guidance. We are especially grateful to the Software Engineering Center at the University of Minnesota for sponsoring the conference and for all the support and excellent collaboration. We must also emphasize the role of Eric Van Wyk in making this arrangement with the Software Engineering Center possible and his great help in acting as the SLE 2009 Finance Chair. We are also grateful to the PC members and the additional reviewers for their dedication in reviewing the large number of submissions. We also thank the authors for their efforts in writing and then revising their papers, and we thank Springer for publishing the papers in the proceedings. We are grateful to the developers of EasyChair for providing an open conference management system. Finally, we wish to thank all the participants at SLE 2009 for the energetic and insightful discussions that made SLE 2009 such an educational and fun event.

January 2010
Mark van den Brand
Dragan Gašević
Jeff Gray
Organization
SLE 2009 was organized by Athabasca University, Eindhoven University of Technology, and the University of Alabama. It was sponsored by the Software Engineering Center of the University of Minnesota.
General Chair
Dragan Gašević (Athabasca University, Canada)
Program Committee Co-chairs
Jeff Gray (University of Alabama, USA)
Mark van den Brand (Eindhoven University of Technology, The Netherlands)
Organizing Committee
Alexander Serebrenik (Eindhoven University of Technology, The Netherlands), Publicity Co-chair
Bardia Mohabbati (Simon Fraser University, Canada), Web Chair
Marko Bošković (Athabasca University, Canada)
Eric Van Wyk (University of Minnesota, USA), Finance Chair
James Hill (Indiana University/Purdue University, USA), Publicity Co-chair
Program Committee
Colin Atkinson (Universität Mannheim, Germany)
Don Batory (University of Texas, USA)
Paulo Borba (Universidade Federal de Pernambuco, Brazil)
John Boyland (University of Wisconsin-Milwaukee, USA)
Marco Brambilla (Politecnico di Milano, Italy)
Shigeru Chiba (Tokyo Institute of Technology, Japan)
Charles Consel (LaBRI / INRIA, France)
Stephen Edwards (Columbia University, USA)
Gregor Engels (Universität Paderborn, Germany)
Robert Fuhrer (IBM Research, USA)
Martin Gogolla (University of Bremen, Germany)
Giancarlo Guizzardi (Federal University of Espirito Santo, Brazil)
Reiko Heckel (University of Leicester, UK)
Frédéric Jouault (INRIA & Ecole des Mines de Nantes, France)
Nicholas Kraft (University of Alabama, USA)
Thomas Kühne (Victoria University of Wellington, New Zealand)
Julia Lawall (University of Copenhagen, Denmark)
Timothy Lethbridge (University of Ottawa, Canada)
Brian Malloy (Clemson University, USA)
Kim Mens (Université catholique de Louvain, Belgium)
Marjan Mernik (University of Maribor, Slovenia)
Todd Millstein (University of California, Los Angeles, USA)
Pierre-Etienne Moreau (Centre de recherche INRIA Nancy - Grand Est, France)
Pierre-Alain Muller (University of Haute-Alsace, France)
Richard Paige (University of York, UK)
James Power (National University of Ireland, Ireland)
Daniel Oberle (SAP Research, Germany)
João Saraiva (Universidade do Minho, Portugal)
Alexander Serebrenik (Eindhoven University of Technology, The Netherlands)
Anthony Sloane (Macquarie University, Australia)
Mary Lou Soffa (University of Virginia, USA)
Steffen Staab (Universität Koblenz-Landau, Germany)
Jun Suzuki (University of Massachusetts, Boston, USA)
Walid Taha (Rice University, USA)
Eli Tilevich (Virginia Tech, USA)
Juha-Pekka Tolvanen (MetaCase, Finland)
Jurgen Vinju (CWI, The Netherlands)
Eelco Visser (Delft University of Technology, The Netherlands)
René Witte (Concordia University, Canada)
Additional Reviewers
Marcel van Amstel, Emilie Balland, Olivier Barais, Paul Brauner, Behzad Bordbar, Johan Brichau, Alfredo Cadiz, Sergio Castro, Loek Cleophas, Cristobal Costa-Soria, Duc-Hanh Dang, Adwoa Donyina, Nicolas Drivalos, João Fernandes, Frederic Fondement, Xiaocheng Ge, Danny Groenewegen, Lars Hamann, Kees Hemerik, Karsten Hoelscher, Lennart Kats, Paul Klint, Dimitrios Kolovos, Mirco Kuhlmann, Nicolas Loriant, Markus Luckey, Arjan van der Meer, Muhammad Naeem, Diego Ordonez, Fernando Orejas, Nicolas Palix, Fernando Silva Parreiras, Maja Pesic, Zvezdan Protic, Alek Radjenovic, António Nestor Ribeiro, Márcio Ribeiro, Louis Rose, Christian Soltenborn, Daniel Spiewak, Tijs van der Storm, Leopoldo Teixeira, Massimo Tisi, Sander Vermolen, Nicolae Vintilla, Tobias Walter, Andreas Wübbeke, Tian Zhao
Steering Committee
Mark van den Brand (Technische Universiteit Eindhoven, The Netherlands)
James Cordy (Queen's University, Canada)
Jean-Marie Favre (University of Grenoble, France)
Dragan Gašević (Athabasca University, Canada)
Görel Hedin (Lund University, Sweden)
Ralf Lämmel (Universität Koblenz-Landau, Germany)
Eric Van Wyk (University of Minnesota, USA)
Andreas Winter (Johannes Gutenberg-Universität Mainz, Germany)
Sponsoring Institutions
Table of Contents

I Keynotes

Eating Our Own Dog Food: DSLs for Generative and Transformational Engineering ..... 1
  James R. Cordy

If MDE Is the Solution, Then What Is the Problem? ..... 2
  Jean Bézivin

II Regular Papers

Session: Language and Model Evolution

Language Evolution in Practice: The History of GMF ..... 3
  Markus Herrmannsdoerfer, Daniel Ratiu, and Guido Wachsmuth

A Novel Approach to Semi-automated Evolution of DSML Model Transformation ..... 23
  Tihamer Levendovszky, Daniel Balasubramanian, Anantha Narayanan, and Gabor Karsai

Study of an API Migration for Two XML APIs ..... 42
  Thiago Tonelli Bartolomei, Krzysztof Czarnecki, Ralf Lämmel, and Tijs van der Storm

Session: Variability and Product Lines

Composing Feature Models ..... 62
  Mathieu Acher, Philippe Collet, Philippe Lahire, and Robert France

VML* – A Family of Languages for Variability Management in Software Product Lines ..... 82
  Steffen Zschaler, Pablo Sánchez, João Santos, Mauricio Alférez, Awais Rashid, Lidia Fuentes, Ana Moreira, João Araújo, and Uirá Kulesza

Multi-view Composition Language for Software Product Line Requirements ..... 103
  Mauricio Alférez, João Santos, Ana Moreira, Alessandro Garcia, Uirá Kulesza, João Araújo, and Vasco Amaral

Session: Short Papers

Yet Another Language Extension Scheme ..... 123
  Anya Helene Bagge

Model Transformation Languages Relying on Models as ADTs ..... 133
  Jerónimo Irazábal and Claudia Pons

Towards Dynamic Evolution of Domain Specific Languages ..... 144
  Paul Laird and Stephen Barrett

ScalaQL: Language-Integrated Database Queries for Scala ..... 154
  Daniel Spiewak and Tian Zhao

Integration of Data Validation and User Interface Concerns in a DSL for Web Applications ..... 164
  Danny M. Groenewegen and Eelco Visser

Ontological Metamodeling with Explicit Instantiation ..... 174
  Alfons Laarman and Ivan Kurtev

Session: Parsing, Compilation, and Demo

Verifiable Parse Table Composition for Deterministic Parsing ..... 184
  August Schwerdfeger and Eric Van Wyk

Natural and Flexible Error Recovery for Generated Parsers ..... 204
  Maartje de Jonge, Emma Nilsson-Nyman, Lennart C.L. Kats, and Eelco Visser

PIL: A Platform Independent Language for Retargetable DSLs ..... 224
  Zef Hemel and Eelco Visser

Graphical Template Language for Transformation Synthesis ..... 244
  Elina Kalnina, Audris Kalnins, Edgars Celms, and Agris Sostaks

Session: Modularity in Languages

A Role-Based Approach towards Modular Language Engineering ..... 254
  Christian Wende, Nils Thieme, and Steffen Zschaler

Language Boxes: Bending the Host Language with Modular Language Changes ..... 274
  Lukas Renggli, Marcus Denker, and Oscar Nierstrasz

Declarative Scripting in Haskell ..... 294
  Tim Bauer and Martin Erwig

Session: Metamodeling and Demo

An Automated Process for Implementing Multilevel Domain Models
  Frédéric Mallet, François Lagarde, Charles André, Sébastien Gérard, and François Terrier

Domain-Specific Metamodelling Languages for Software Language Engineering
  Steffen Zschaler, Dimitrios S. Kolovos, Nikolaos Drivalos, Richard F. Paige, and Awais Rashid
Eating Our Own Dog Food: DSLs for Generative and Transformational Engineering

James R. Cordy
School of Computing, Queen's University, Kingston, Ontario, Canada
[email protected]
Abstract. Languages and systems to support generative and transformational solutions have been around a long time. Systems such as XVCL, DMS, ASF+SDF, Stratego and TXL have proven mature, efficient and effective in a wide range of applications. Even so, adoption remains a serious issue - almost all successful production applications of these systems in practice either involve help from the original authors or years of experience to get rolling. While work on accessibility is active, with efforts such as ETXL, Stratego XT, Rascal and Colm, the fundamental big step remains - it’s not obvious how to apply a general purpose transformational system to any given generation or transformation problem, and the real power is in the paradigms of use, not the languages themselves. In this talk I will propose an agenda for addressing this problem by taking our own advice - designing and implementing domain specific languages (DSLs) for specific generative, transformational and analysis problem domains. We widely advise end users of the need for DSLs for their kinds of problems - why not for our kinds? And we use our tools for implementing their DSLs - why not our own? I will outline a general method for using transformational techniques to implement transformational and generative DSLs, and review applications of the method to implementing example text-based DSLs for model-based code generation and static code analysis. Finally, I will outline some first steps in implementing model transformation DSLs using the same idea - retaining the maturity and efficiency of our existing tools while bringing them to the masses by “eating our own dogfood”.
If MDE Is the Solution, Then What Is the Problem?

Jean Bézivin
AtlanMod research team, INRIA and EMNantes, Nantes, France
[email protected]
For nearly ten years, modern forms of software modeling have been used in various contexts, with good apparent success. This is a convenient time to reflect on what has been achieved, where we stand now, and where we are leading to with Model-Driven Engineering (MDE). If there is apparently some consensual agreement on the core mechanisms, it is much more difficult to delimitate the scope and applicability of MDE. The three main questions we have to answer in sequence are:

1. What is a model?
2. Where are models coming from?
3. What may models be useful for?

There is now some consensus in the community about the answer to the first question. A (terminal) model is a graph conforming to another graph usually called its metamodel, and this terminal model represents a system. Terminal models and their metamodels are similarly organized and may be unified as abstract models, yielding a regular organization. In such an organization, some of the models (e.g., a transformation) may be executable. The relation of conformance between a terminal model and its metamodel provides most of the information on the first question.

The second question about the origin of models is much more difficult to answer and is still the central challenge of computer science. This is more related to the representation relation between a terminal model and a system. Different situations could be considered here (e.g., a system derived from a model, a model derived from a system, or system and model co-existence), but there are basically two possibilities to create a model: by transformation or by observation of a system, the second one being much more important and much less understood. The discovery of a terminal model from a system is always made by an observer (possibly but rarely automated), with a goal and a precise metamodel. Making explicit this discovery process represents one of the most important and urgent open research issues in MDE.

When we have answered the second question about model creation methodology, it is then easier to answer the third question about usability. There are three main categories of MDE application related to forward engineering (mainly software artifact production from models), to reverse engineering (primarily legacy code analysis) and to general interoperability problems (when two heterogeneous systems must interact). Instead of solving the direct interaction problems between the heterogeneous systems, it seems advantageous to represent these systems by models (possibly conforming to different metamodels) and to use generic Model-Driven Interoperability (MDI) techniques.
Language Evolution in Practice: The History of GMF

Markus Herrmannsdoerfer1, Daniel Ratiu1, and Guido Wachsmuth2

1 Institut für Informatik, Technische Universität München, Boltzmannstr. 3, 85748 Garching b. München, Germany
{herrmama,ratiu}@in.tum.de
2 Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
[email protected]
Abstract. In consequence of changing requirements and technological progress, software languages are subject to change. The changes affect the language’s specification, which in turn affects language processors as well as existing language utterances. Unfortunately, little is known about how software languages evolve in practice. This paper presents a case study on the evolution of four modeling languages provided by the Graphical Modeling Framework. It investigates the following research questions: (1) What is the impact of language changes on related software artifacts?, (2) What activities are performed to implement language changes? and (3) What kinds of adaptations capture the language changes? We found out that the language changes affect various kinds of related artifacts; the distribution of the activities performed to evolve the languages mirrors the classical software maintenance activities, and most language changes can be captured by a small suite of operators that can also be used to migrate the language utterances.
1 Introduction
Software languages change [1]. A software language, as any other piece of software, is designed, developed, tested, and maintained. Requirements, purpose, and scope of software languages change, and they have to be adapted to these changes. This applies particularly to domain-specific languages that are specialized to a specific problem domain, as their specialization causes them to be vulnerable with respect to changes of the domain. But general-purpose languages like Java or the UML evolve, too. Typically, their evolution is quite slow and driven by heavy-weighted community processes.

Software language evolution implicates a threat for language erosion [2]. Typically, language processors and tools no longer comply with a changing language. But we do not want to build language processors and tools from scratch every time a language changes. Thus, appropriate co-evolution strategies are
required. In a similar way, language utterances like programs or models might become inconsistent with a changing language. But these utterances are valuable assets for language users making their co-evolution a serious issue. Software language engineering [3,4] evolves as a discipline to the application of a systematic approach to the design, development, maintenance, and evolution of languages. It concerns various technological spaces [5]. Language evolution affects all these spaces: Grammars evolve in grammarware [6], metamodels evolve in modelware [2], schemas evolve in XMLware [7] and dataware [8], ontologies evolve [9], and APIs evolve [10], too. In this paper, we focus on the technological space of modelware. There is an ever increasing variety of domain-specific modeling languages each developed by a small group of programmers. These languages evolve frequently to meet the requests of their users. Figure 1 illustrates the status quo: modeling languages come with a series of artifacts (e. g. editors, translators, code generators) centered around a metamodel that defines the language syntax. The ever increasing number of language users (usually decoupled from language developers) build many models by using these languages. As new features need to be incorporated, languages evolve, requiring the co-evolution of existing models.
Fig. 1. Development and evolution of modeling languages
In this paper, we investigate the evolution of modeling languages by reengineering the evolution of their metamodels and the migration of related software artifacts. Our motivation is to identify requirements for tools that support the (semi-)automatic coupled evolution of modeling languages and related artifacts in a way that avoids language erosion and minimizes the handwritten code for migration. As a case study we investigated the evolution of the four modeling languages provided by the Graphical Modeling Framework (GMF). We focus on the following research questions:

– RQ1) What is the impact of language changes on related software artifacts? As the metamodel is in the center of the language definition, we are interested to understand how other artifacts change when the metamodel changes.
– RQ2) What activities are performed to implement language changes? We investigate the distribution of the activities performed to implement metamodel changes in order to examine the similarities between the evolution of programs and the evolution of languages.

– RQ3) What kinds of adaptations capture the language changes? We are interested to describe the metamodel changes based on a set of canonical adaptations, and thereby to investigate the measure in which these adaptations can be used to migrate the models.

Outline. In Section 2, we introduce the Graphical Modeling Framework as our case study. We present our approach to retrace the evolution of metamodels in Section 3. In Section 4, we answer the research questions from both a quantitative and qualitative point of view. We interpret and discuss the results of the case study in Section 5 by focusing on lessons learned and threats to the study's validity. In Section 6, we present work related to the investigation of language evolution, before we conclude in Section 7.
2 Graphical Modeling Framework
The Graphical Modeling Framework (GMF)1 is a widely used open source framework for the model-driven development of diagram editors. GMF is a prime example for a Model-Driven Architecture (MDA) [11], as it strictly separates platform-independent models (PIM), platform-specific models (PSM) and code. GMF is implemented on top of the Eclipse Modeling Framework (EMF)2 and the Graphical Editing Framework (GEF)3.
2.1 Editor Models
In GMF, a diagram editor is defined by models from which editor code can be generated automatically. For this purpose, GMF provides four modeling languages, a transformator that maps PIMs to PSMs, a code generator that turns PSMs into code, and a runtime platform on which the generated code relies. The lower part of Fig. 2 illustrates the different kinds of GMF editor models. On the platform-independent level, a diagram editor is modeled from four different views. The domain model focuses on the abstract syntax of diagrams. The graphical definition model defines the graphical elements like nodes and edges in the diagram. The tool definition model defines the tools available to author a diagram. In the mapping model, the first three views are combined to an overall view which maps the graphical elements from the graphical definition model and the tools from the tool definition model onto the domain model elements from the domain model.
1 see GMF website http://www.eclipse.org/modeling/gmf
2 see EMF website http://www.eclipse.org/modeling/emf
3 see GEF website http://www.eclipse.org/gef
[Figure 2 depicts the GMF architecture on the metamodel, model, and code levels: the ecore, gmfgraph, tooldef, mappings, and gmfgen metamodels; the domain, graphical definition, tool definition, and mapping models; a transformator (Java) that turns the mapping model into the diagram generator model; and a generator (JET/Xpand) that produces the diagram editor code.]
Fig. 2. Languages involved in the Graphical Modeling Framework
The platform-independent mapping model is transformed into a platform-specific diagram generator model. This model can be altered to customize the code generation.
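The domain model referred to above is an ordinary EMF metamodel. As a hedged illustration only (our own sketch, not taken from GMF), the following Java snippet uses the standard EMF Ecore API to assemble a minimal domain metamodel programmatically; the package, class, and feature names are invented for the example.

import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EReference;
import org.eclipse.emf.ecore.EcoreFactory;
import org.eclipse.emf.ecore.EcorePackage;

public class MiniDomainModel {

    // Builds a tiny, invented domain metamodel (a state machine) with the EMF Ecore API.
    public static EPackage createStateMachinePackage() {
        EcoreFactory factory = EcoreFactory.eINSTANCE;

        EPackage pkg = factory.createEPackage();
        pkg.setName("statemachine");                      // example names, not GMF's
        pkg.setNsPrefix("sm");
        pkg.setNsURI("http://example.org/statemachine");

        EClass state = factory.createEClass();
        state.setName("State");
        EAttribute name = factory.createEAttribute();
        name.setName("name");
        name.setEType(EcorePackage.Literals.ESTRING);
        state.getEStructuralFeatures().add(name);

        EClass transition = factory.createEClass();
        transition.setName("Transition");
        EReference target = factory.createEReference();
        target.setName("target");
        target.setEType(state);
        transition.getEStructuralFeatures().add(target);

        pkg.getEClassifiers().add(state);
        pkg.getEClassifiers().add(transition);
        return pkg;
    }
}

In practice such domain models are usually drawn or written in an Ecore editor rather than built in code; the snippet only makes explicit what kind of structure the GMF tooling consumes.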
2.2 Modeling Languages
We can distinguish two kinds of languages involved in GMF. First, GMF provides domain-specific languages for the modeling of diagram editors. Each of these languages comes with a metamodel defining its abstract syntax and a simple tree-based model editor integrated in Eclipse. The upper part of Fig. 2 shows the metamodels involved in GMF. These are ecore for domain models, gmfgraph for graphical definition models, tooldef for tool definition models, mappings for mapping models, and gmfgen for diagram generator models. The mappings metamodel refers to elements in the ecore, gmfgraph, and tooldef metamodels. This kind of dependency is typical for multi-view modeling languages. For example, there are similar dependencies between the metamodel packages defining the various sublanguages of the UML. Second, GMF itself is implemented in various languages. All metamodels are expressed in ecore, the metamodeling language provided by EMF. EMF is an implementation of Essential MOF which is the basic metamodeling standard proposed by the Object Management Group (OMG) [12]. Notably, the ecore metamodel conforms to itself. Additionally, the metamodels contain context constraints which are attached as textual annotations to the metamodel elements
to which they apply. These constraints are expressed in the Object Constraint Language (OCL) [13]. The transformator from a mapping model to a generator model is implemented in Java. For model access, it relies on the APIs generated from the metamodels of the GMF modeling languages. The generator generates code from a generator model. It was formerly implemented in Java Emitter Templates (JET)4, which was later changed in favor of Xpand5. The generated code conforms to the Java programming language, and is based on the GMF runtime platform.
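The attachment of textual constraints to metamodel elements relies on EMF's generic annotation mechanism. The sketch below shows the general shape of such an attachment; the annotation source URI and detail key are our assumptions for illustration and not the exact convention used by the GMF metamodels.

import org.eclipse.emf.ecore.EAnnotation;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EcoreFactory;

public class ConstraintAnnotations {

    // Attaches a textual OCL constraint to a class as an EAnnotation.
    // The source URI and the detail key are illustrative only.
    public static void addConstraint(EClass owner, String constraintName, String oclExpression) {
        EAnnotation annotation = EcoreFactory.eINSTANCE.createEAnnotation();
        annotation.setSource("http://example.org/constraints");
        annotation.getDetails().put(constraintName, oclExpression);
        owner.getEAnnotations().add(annotation);
    }
}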
2.3 Metamodel Evolution
With a code base of more than 600k lines of code, GMF is a framework of considerable size. GMF is implemented by 13 developers from 3 different countries using an agile process with small development cycles. Since starting the project, the GMF developers had to adapt the metamodels a significant number of times. As a number of metamodel changes were breaking the existing models, the developers had to manually implement a migrator. Figure 3 quantifies the metamodel evolution for the two release cycles we studied, each taking one year. The figures show the number of metamodel elements for each revision of each GMF metamodel. During the evolution from release 1.0 to release 2.1, the number of classes defined by all metamodels e. g. increased from 201 to 252. We chose GMF as a case study, because the evolution is extensive, publicly available, and well documented by means of commit comments and change requests. However, the evolution is only available in the form of revisions from the version control system, and its documentation is only informal.
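The element counts plotted in Figure 3 can be obtained with very little EMF code. The sketch below is our own illustration, not part of the study's tooling; it loads one revision of an .ecore file and tallies the metamodel elements by type, while checking out each revision from the version control system is assumed to happen elsewhere.

import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.EcoreResourceFactoryImpl;

public class MetamodelSizeCounter {

    // Loads one revision of an .ecore file and counts its elements by type,
    // e.g. EClass, EAttribute, EReference, as plotted in Fig. 3.
    public static Map<String, Integer> countElements(String ecoreFilePath) {
        ResourceSetImpl resourceSet = new ResourceSetImpl();
        resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
                   .put("ecore", new EcoreResourceFactoryImpl());
        Resource resource = resourceSet.getResource(URI.createFileURI(ecoreFilePath), true);

        Map<String, Integer> counts = new TreeMap<>();
        for (Iterator<EObject> it = resource.getAllContents(); it.hasNext();) {
            String elementKind = it.next().eClass().getName();
            counts.merge(elementKind, 1, Integer::sum);
        }
        return counts;
    }
}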
3 Investigating the Evolution
Due to the considerable size of the GMF metamodels, we developed a systematic approach to investigate its evolution as presented in the following subsections.
3.1 Modeling the History
To investigate the evolution of modeling languages, we model the history of their metamodels. In the history model, we capture the evolution of metamodels as sequences of metamodel adaptations [14,15]. A metamodel adaptation is a well-understood transformation step on metamodels. We provide a metamodel for history models as depicted in Figure 4. The History of a modeling language is subdivided into a number of releases. A Release denotes a version of the modeling language which has been deployed, and for which models can thus exist. Modeling languages are released at a certain date, and are tagged by a certain version number. A Release is further subdivided into
4 see JET website http://www.eclipse.org/modeling/m2t
5 see Xpand website http://www.openarchitectureware.org
[Figure 3 consists of four charts, (a) tooldef, (b) gmfgraph, (c) mappings, and (d) gmfgen, each plotting the number of metamodel elements (EPackage, EClass, EAttribute, EReference, EOperation, EParameter, EEnum, EEnumLiteral, EAnnotation) per revision between Release 1.0 and Release 2.1.]
Fig. 3. Statistics of metamodel evolution
[Figure 4 shows the history metamodel as a class diagram: a History contains releases of type Release (date: Date, version: String); a Release contains commits of type Commit (date: Date, version: String, comment: String, author: String); a Commit contains adaptations of type Adaptation.]
Fig. 4. Modeling language history
a number of commits. A Commit denotes a version of the modeling language which has been committed to the version control system. Modeling languages are committed at a certain date, by a certain author, with a certain comment, and are tagged by a certain version number. A Commit consists of the sequence of adaptations which have been performed since the last Commit.
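Rendered as plain Java classes, the history model of Figure 4 looks roughly as follows; this is only a reading aid with the attribute names shown in the figure, whereas the actual history model is itself an EMF metamodel.

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Sketch of the history model of Fig. 4 as plain Java classes.
class History {
    final List<Release> releases = new ArrayList<>();
}

class Release {
    Date date;
    String version;
    final List<Commit> commits = new ArrayList<>();
}

class Commit {
    Date date;
    String version;
    String comment;
    String author;
    final List<Adaptation> adaptations = new ArrayList<>();
}

abstract class Adaptation {
    // One concrete subclass per operator, e.g. a rename or an extracted superclass.
}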
3.2 Operator Suite
The metamodel for history models includes an operator suite for stepwise metamodel adaptation. As is depicted in Figure 5, each operator subclasses the abstract class Adaptation. Furthermore, we classify each operator according to four different criteria: Granularity. Similar to [16], we distinguish primitive and compound operators. A Primitive supports a metamodel adaptation that can not be decomposed into
[Figure 5 is a class diagram of the operator classification: Adaptation is classified along four criteria: Granularity (Primitive, specialized into ContentPrimitive and ValuePrimitive, versus Compound), Metamodel Aspect (StructuralAdaptation, ConstraintAdaptation, APIAdaptation, DocumentationAdaptation), Language Expressiveness (Constructor, Destructor, Refactoring), and Model Migration (PreservingAdaptation versus BreakingAdaptation, the latter split into CoupledAdaptation and CustomAdaptation with an attached CustomMigration).]
Fig. 5. Classification of operators for metamodel adaptation
smaller adaptation steps. In contrast, a Compound adaptation can be decomposed into a sequence of Primitives. The required kinds of Primitive operators can be derived from the meta-metamodel. There are two basic kinds of primitive changes: ContentPrimitives and ValuePrimitives. A ContentPrimitive modifies the structure of a metamodel, i. e. creates or deletes a metamodel element. We thus need ContentPrimitives for each kind of metamodel element defined by the meta-metamodel. For classes, e.g., we need ContentPrimitives to create a class in a package and to delete it from its package. A ValuePrimitive modifies an existing metamodel element, i. e. changes a feature of a metamodel element. We thus need ValuePrimitives for each feature defined by the meta-metamodel. For classes, e.g., we need a ValuePrimitive to rename a class, and we need ValuePrimitives to add and remove a superclass. The set of primitive operators already offers a complete operator suite in the sense that every metamodel adaptation can be described by composing them. Metamodel aspects. We classify an operator according to the metamodel aspect which it addresses. The different classes can be derived from the constructs provided by the meta-metamodel to which the metamodels have to conform. An
operator concerns either the structure of models, constraints on models, the API to access models, or the documentation of metamodel elements. A StructuralAdaptation like extracting a superclass affects the abstract syntax defined by the metamodel. A ConstraintAdaptation adds, deletes, moves, or changes constraints in the metamodel. An APIAdaptation concerns the additional access methods defined in the metamodel. This includes volatile features and operations. A DocumentationAdaptation adds, deletes, moves, or changes documentation annotations to metamodel elements. Language expressiveness. According to [14], we can distinguish three kinds of operators with respect to the expressiveness of the modeling language. By expressiveness of a modeling language, we refer to the set of valid models we can express in the modeling language. Constructors increase this set, i. e. in the new version of the language we can express new models. In contrast, Destructors decrease the set, i. e. in the old version we could express models which we cannot express in the new version of the language. Finally, Refactorings preserve the set of valid models, i. e. we can express all models in the old and the new version of the language. Model migration. According to [17], we can determine for each operator to what extent model migration can be automated. PreservingAdaptations do not require the migration of models. BreakingAdaptations break the instance relationship between models and the adapted metamodel. In this case, we need to provide a migration for possibly existing models. For a CoupledAdaptation, the migration does not depend on a specific metamodel. Thus it can be specified as a generic couple of metamodel adaptation and model migration. In contrast, a CustomAdaptation is so specific to a certain metamodel that it cannot be composed of generic coupled adaptation steps. Consequently, it can only be covered by a sequence of adaptation steps and a reconciling CustomMigration6. As mentioned above, three of the criteria have its origin in existing publications, while metamodel aspects is kind of a natural criterion. There might be other criteria which are interesting in the context of modeling language evolution. Given the sequence of adaptations, it is however easy to classify them according to other criteria. The presented criteria are orthogonal to each other to a large extent. Granularity is orthogonal to all other criteria and vice versa, as we can think of example operators from each granularity for all these criteria. Additionally, language expressiveness and model migration are orthogonal to each other: the first concerns the difference in cardinality between the sets of valid models before and after adaptation, whereas the second concerns the correct migration of a model from one set to the other. However, language expressiveness and model migration both focus on the impact on models, and are thus only orthogonal to the 6
6 The categories from [17] were renamed to be more conforming to the literature: metamodel-only change was renamed to PreservingAdaptation, coupled change to BreakingAdaptation, metamodel-independent coupled change to CoupledAdaptation, and metamodel-specific coupled change to CustomAdaptation.
metamodel aspects StructuralAdaptation and ConstraintAdaptation. This is due to the fact that operators concerning APIAdaptation and DocumentationAdaptation do not affect models. Consequently, these operators are always Refactorings and PreservingAdaptations.

The operator suite necessary for our case study is depicted in Figure 9. We classify each operator in the operator suite according to the categories presented before. For example, the operator Extract Superclass creates a new common superclass for a number of classes. This operator is a Compound, since we can express the same metamodel adaptation by the primitive operators Create Class and Add Superclass. The operator is a StructuralAdaptation, since it affects the abstract syntax defined by the metamodel. It is a Constructor, because we can instantiate the introduced superclass in the new language version. Finally, it is a PreservingAdaptation, since no migration of old models to the new language version is required.
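As a rough illustration of what such an operator does to an Ecore metamodel, the sketch below implements the metamodel side of Extract Superclass with the EMF API. It is our own simplification under the assumptions stated in the comments; COPE's actual operator additionally records the step in the history model.

import java.util.List;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EcoreFactory;

public class ExtractSuperclassOperator {

    // Creates a new common superclass for the given classes and registers it in
    // the package of the first class (primitive: Create Class), then links every
    // class to it (primitive: Add Superclass). As a PreservingAdaptation, no
    // model migration is needed; existing instances remain valid.
    public static EClass extractSuperclass(String superclassName, List<EClass> subclasses) {
        EClass superclass = EcoreFactory.eINSTANCE.createEClass();
        superclass.setName(superclassName);
        subclasses.get(0).getEPackage().getEClassifiers().add(superclass);
        for (EClass subclass : subclasses) {
            subclass.getESuperTypes().add(superclass);
        }
        return superclass;
    }
}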
3.3 Reverse Engineering the GMF History
Procedure. We applied the following steps to reconstruct a history model for GMF based on the available information:

Step 1. Extracting the log: We extracted the log information for the whole GMF repository. The log information lists the revisions of each file maintained in the repository.
Step 2. Detecting the commits: We grouped revisions of files which were committed together with high probability. Two revisions of different files were grouped, in case they were committed within the same time interval and with the same commit comment.
Step 3. Filtering the commits: We filtered out all commits which do not include a revision of one of the metamodels.
Step 4. Clustering the revisions: We clustered the files which were committed together into more abstract artifacts like metamodels, transformator, code generator, and migrator. This step was performed to reduce the information, as the implementation of each of the artifacts may be modularized into several files. The information available at this point can be used to answer RQ1.
Step 5. Classifying the commits: We classified the commits according to the software maintenance categories (i.e. perfective, adaptive, preventive, and corrective) [18] based on the commit comments and change requests. The information available at this point can be used to answer RQ2.
Step 6. Extracting the metamodel revisions: We extracted the metamodel revisions from the GMF repository.
Step 7. Comparing the metamodel revisions: We compared subsequent metamodel revisions with each other resulting in a difference model. The difference model consists of a number of primitive changes between subsequent metamodel revisions.
Step 8. Detecting the adaptation sequence: We detected the adaptations necessary to bridge the difference between the metamodel revisions. In contrast to the difference model, the adaptations also combine related primitive changes and are
ordered as a sequence. To find the most plausible adaptations, we also analyzed commit comments, change requests, and the co-adaptation of other artifacts. The information available at this point can be used to answer RQ3.
Step 9. Validating the adaptation sequence: We validated the resulting adaptation sequence by applying it to migrate the existing models for testing the handcrafted migrator. We set up a number of test cases each of which consists of a model before migration and the expected model after migration.

Tool Support. We employed a number of helper tools to perform the study. statCVS7 was employed to parse the log information into a model which is processed further by a handcrafted model transformation (steps 1-4). The difference models between two subsequent metamodel revisions were generated with the help of EMF Compare8 (step 7). To bridge the difference between subsequent metamodel revisions, we employed the existing tool COPE9 [15] whose user interface is depicted in Figure 6 (step 8). COPE allows the language developer to directly execute the operators in the metamodel editor and automatically records them in a history model [19]. Generic CoupledAdaptations can be invoked through an operator browser which offers all such available operators. To perform a CustomAdaptation, a custom migration needs to be attached to metamodel changes recorded in the metamodel editor. For the study, we extended COPE to support its user in letting the metamodel converge to a target metamodel by displaying the difference model as obtained from EMF Compare. From the recorded history model, a migrator can be generated which was employed for validating the adaptation sequence (step 9). The handcrafted migrator that comes with GMF was used to generate the expected models for validation.
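Step 2, detecting the commits, can be approximated with a few lines of code. The following sketch is our own reconstruction under the stated heuristic (same author and commit comment within a small time window); it is not the script used in the study, and the window size is an assumption.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CommitDetector {

    // One file revision taken from the version control log.
    record FileRevision(String file, String author, String comment, long timestampMillis) {}

    // Groups file revisions into commits: revisions with the same author and commit
    // comment that lie within a small time window are assumed to form one commit.
    public static List<List<FileRevision>> detectCommits(List<FileRevision> revisions, long windowMillis) {
        List<FileRevision> sorted = new ArrayList<>(revisions);
        sorted.sort(Comparator.comparingLong(FileRevision::timestampMillis));

        List<List<FileRevision>> commits = new ArrayList<>();
        List<FileRevision> current = new ArrayList<>();
        for (FileRevision revision : sorted) {
            if (!current.isEmpty()) {
                FileRevision last = current.get(current.size() - 1);
                boolean sameCommit = last.author().equals(revision.author())
                        && last.comment().equals(revision.comment())
                        && revision.timestampMillis() - last.timestampMillis() <= windowMillis;
                if (!sameCommit) {
                    commits.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(revision);
        }
        if (!current.isEmpty()) {
            commits.add(current);
        }
        return commits;
    }
}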
4 Results
In this section, we present the results of our case study in an aggregated manner. However, the complete history can be obtained from our web site10 . RQ1) What is the impact of language changes on related software artifacts? To answer this question, we determined for each commit which other artifacts were committed together with the metamodels. Figure 7 shows how many of the overall 124 commits had an impact on a certain artifact. The first four columns denote the metamodels that were changed in a commit, and the fifth column denotes the number of commits. For instance, row 6 means that the metamodels mappings and gmfgen changed together in 6 commits. The last three columns denote the number of commits in which other artifacts, like transformator, code generator and migrator, were changed. In the example row, 7 8 9 10
see statCVS website http://statcvs.sourceforge.net see EMF Compare website http://www.eclipse.org/emft/projects/compare Available as open source at http://cope.in.tum.de Available at http://cope.in.tum.de/pmwiki.php?n=Documentation.GMF
Language Evolution in Practice: The History of GMF
PHWDPRGHOHGLWRU
RSHUDWRUEURZVHU
GLIIHUHQFHPRGHO
13
WDUJHWPHWDPRGHO
Fig. 6. COPE User Interface
JPIJUDSK FKDQJHG FKDQJHG
0HWDPRGHOV PDSSLQJV JPIJHQ
WRROGHI FKDQJHG
FKDQJHG FKDQJHG
FKDQJHG FKDQJHG FKDQJHG
FKDQJHV FKDQJHG FKDQJHG FKDQJHG
7UDQVIRU PDWRU
*HQHUD WRU
0LJUDWRU
Fig. 7. Correlation between commits of metamodels and related artifacts
the transformator was changed 4 times, the generator 2 times, and the migrator had to be changed once. In a nutshell, metamodel changes are very likely to impact artifacts which are directly related to them. For instance, the changes to mappings and gmfgen propagated to the transformator from mappings to gmfgen, and to the generator from gmfgen to code. Additionally, metamodel changes are not always carried out on a single metamodel, but are sometimes related to other metamodels. RQ2) What activities are performed to implement language changes? To answer this question, we classified the commits into the well-known categories of maintenance activities, and we investigated their distribution over these categories. Figure 8 shows the number of commits for each category. Note that several commits could not be uniquely associated to one category, and thus had to be assigned to several categories. However, all commits could be classified into at least one of the four categories.
14
M. Herrmannsdoerfer, D. Ratiu, and G. Wachsmuth 3HUIHFWLYH 0RGHOQDYLJDWRU 5LFKFOLHQWSODWIRUP 'LDJUDPSUHIHUHQFHV 'LDJUDPSDUWLWLRQLQJ (OHPHQWSURSHUWLHV ,QGLYLGXDOIHDWXUHV
Fig. 8. Classification of metamodel commits according to maintenance categories
We classified 45 of the commits as perfective maintenance, i. e. add new features to enhance GMF. Besides a number of individual commits, there are a few features whose introduction spanned several commits. The generated diagram editor was extended with a model navigator, to run as a rich client, to set preferences for diagrams, to partition diagrams, and to set properties of diagram elements. We classified 33 of the commits as adaptive maintenance, i. e. adapt GMF to a changing environment. These commits were either due to the transition from JET to Xpand, adapted to changes to the constraints of ecore, were due to releasing GMF, or adapted the constraints to changes of the OCL parser. We classified 36 of the commits as preventive maintenance, i. e. refactor GMF to prevent faults in the future. These commits either separated concerns to better modularize the generated code, simplified the metamodels to make the transformations more straightforward, removed metamodel elements no longer used by transformations, or added documentation to make the metamodel more understandable. We classified 16 of the commits as corrective maintenance, i. e. correct faults discovered in GMF. These commits either fixed bugs reported by GMF users, corrected incorrectly spelled element names, reverted changes carried out earlier, or corrected invalid OCL constraints. In a nutshell, the typical activities known from software maintenance also apply to metamodel maintenance [18]. Furthermore, similar to the development of software, the number of perfective activities (34,6%) outranges the preventive (27,7%) and adaptive (25,4%) activities which are double the number of corrective activities (12,3%). RQ3) What kinds of adaptations capture the language changes? To answer this question, we classified the operators which describe the metamodel evolution. Figure 9 shows the number and classification of each operator occurred during the evolution of each metamodel. The operators are grouped by their granularity and the metamodel aspects to which they apply. Most of the changes could be covered by Primitive adaptations: we found 379 (51,8%) ContentPrimitive adaptations, 279 (38,2%) ValuePrimitive adaptations and 73 (10,0%) Compound adaptations. Only half of the adaptations affected the structure defined by a metamodel: we identified 361 (49,4%) StructuralAdaptations, 303 (41,5%) APIAdaptations, 36 (4,9%) DocumentationAdaptations, and 31 (4,2%) ConstraintAdaptations. Most of the changes are refactorings which do not change the expressiveness of the modeling language: we found 453 (62,0%) Refactorings, 194 (26,5%) Constructors, and 84 (11,5%) Destructors. Only very few changes cannot be covered by generic coupled operators which are able to
Language Evolution in Practice: The History of GMF
Fig. 9. Classification of operators occurred during metamodel adaptation
automatically migrate models: we identified 630 (86,2%) PreservingAdaptations, 95 (13,0%) CoupledAdaptations, and 6 (0,8%) CustomAdaptations. As can be seen in Figure 9, a custom migration was necessary 4 times to initialize a new mandatory feature or a feature that was made mandatory. In these cases, the migration is associated to one Primitive, and consists of 10 to 20 lines of handwritten code. Additionally, 2 custom migrations were necessary to perform a complex restructuring of the model. In these cases, the migration is associated to a sequence of 11 and 13 Primitives, and consists of 60 and 70 lines of handwritten code. In a nutshell, a large fraction of changes can be captured by primitive changes or operators which are independent of the metamodel. A significant number of operations are known from object-oriented refactoring. Only very few changes were specific to the metamodel, denoting more complex evolution.
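The custom migrations that initialize a feature which became mandatory follow a simple pattern. The sketch below is a generic, hedged illustration using EMF's reflective API; the concrete feature names and default values in GMF's handwritten migrations are specific to its metamodels and are not reproduced here.

import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.EStructuralFeature;

public class MandatoryFeatureMigration {

    // During migration, fills a feature that became mandatory in the new metamodel
    // version with a default value whenever it is still unset.
    public static void initializeIfUnset(EObject object, String featureName, Object defaultValue) {
        EStructuralFeature feature = object.eClass().getEStructuralFeature(featureName);
        if (feature != null && !object.eIsSet(feature)) {
            object.eSet(feature, defaultValue);
        }
    }
}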
5 Discussion
We interpret and discuss the results of the case study by focusing on lessons learned and threats to the study's validity.

5.1 Lessons Learned
Based on the results of our case study, we learned a number of lessons about the evolution of modeling languages in practice. Metamodels evolve due to user requests and technological changes. On the one hand, a metamodel defines the abstract syntax of a language, and thereby metamodels evolve when the requirements of the language change. In GMF, user requests for new features imposed many of such changes to the GMF modeling languages. On the other hand, an API for model access is intimately related to a metamodel, and thereby metamodels evolve when requirements for model access change. In GMF, particularly the shift from JET to XPand as the language to implement the generator imposed many of such changes in the gmfgen metamodel. Since a metamodel captures the abstract syntax as well as the API for model access, language and API evolution interact. Changes in the abstract syntax clearly lead to changes in the API. But API changes can also require to change the abstract syntax of the underlying language: in GMF, we found several cases where the abstract syntax was changed to simplify model access. Other artifacts need to be migrated. The migration is not restricted to models, but also concerns other language development artifacts, e. g. transformators and code generators. During the evolution of GMF, these artifacts needed to be migrated manually. In contrast to models, these artifacts are mostly under control of the language developers, and thereby their migration is not necessarily required to be automated. However, automating the migration of these artifacts would further reduce the effort involved in language evolution. The model-driven development of metamodels with EMF facilitated the identification of changes
between two different versions of the metamodel. In contrast, the specification of transformators and code generators as Java code made it hard to trace the evolution. We thus need a more structured and appropriate means to describe the other artifacts depending on the metamodels. Language development could benefit from the same advantages as model-driven software development. Language evolution is similar to software evolution. This hypothesis was postulated by Favre in [1]. The answers to RQ2 and RQ3 provide evidence that the hypothesis holds. First, the distribution of activities performed by the developers of GMF to implement language changes mirrors the distribution of classical software maintenance activities (i. e. perfective and adaptive maintenance activities being the most frequent) [18]. Second, many operators to adapt the metamodels (Figure 9) are similar to operators known from object-oriented refactoring [20] (e. g. Extract Superclass). Like software evolution, the time scale for language evolution can be quite small. In the first year of the investigated evolution of GMF, the metamodels were changed 107 times, i. e. on average every four days. However, in the second year the number of metamodel changes decreased to 17, i. e. the stability of GMF increased over time. Apparently, the time scale in which the changes happen increases with the language’s maturity. The same phenomenon applies to the relation between the metamodels and the meta-metamodel, as the evolution of ecore required the migration of the GMF metamodels. However, the more abstract the level, the less frequent the changes: we identified two changes in the meta-metamodel of the investigated evolution of GMF. Operator-based coupled evolution of metamodels and models is feasible. The developers of GMF provided a migrator to automatically migrate the already existing models. This migrator allows the GMF developers to make changes that are not backward compatible, and are essential as the kinds and number of built models is not under control of the language developers. We reverse engineered the evolution of the GMF metamodels by sequencing operators. Most of the metamodel evolution can be covered by operators which are independent of the specific metamodel. Only a few custom operators were required to capture the remaining changes. The employed operators can be used to migrate the models as well. In addition, the case study provides evidence for the suitability of operator-based metamodel evolution in forward engineering like proposed in [14,15]. Operator-based forward engineering of modeling languages documents changes on a high level of abstraction which allows for a better understanding of language evolution. 5.2
Threats to Validity
We are aware that our results can be influenced by threats to construct, internal and external validity. Construct validity. The results might be influenced by the measurement we used for our case study. For our measurements, we assumed that a commit represents exactly one language change. However, a commit might encapsulate
several language changes, and one language change might be implemented by several commits. This interpretation is a threat to the results for both RQ1 and RQ2. Other case studies are required to investigate these research questions in more detail, and to increase the confidence and generality of our results. However, our results are consistent with the view that languages evolve like software, which was postulated and tacitly accepted as a fact [1]. Internal validity. The results might be influenced by the method applied for investigating the evolution. The algorithm to detect the commits (step 2) might miss artifacts which were also committed together. To mitigate this threat, we have manually validated the commits by looking into the temporal neighborhood. By filtering out the commits which did not change the metamodel (step 3), we might miss language changes not affecting the metamodel. Such changes might be changes to the language semantics defined by code generators and transformators. However, the model migration defined by the handcrafted migrator could be fully assigned to metamodel adaptations. We might have misclassified some commits, when classifying the commits according to the maintenance categories (step 5). However, the results are in line with the literature on software evolution [18]. When detecting the adaptation sequence (step 8), the picked operators might have a different intention than the developers had when performing the changes. To mitigate this threat, we have automatically validated the model migration by means of test cases. Furthermore, we have manually validated the migration of all artifacts by taking their co-adaptation into account. External validity. The results might be influenced by the fact that we investigated a single data point. The modeling languages provided by GMF are among the many modeling languages that are developed using EMF. The relevance of our results obtained by analyzing GMF can be affected when analyzing languages developed with other technologies. Our results are however in line with the literature on grammar evolution [21,6], and this increases our confidence on the fact that the defined operators are valid for many other languages. Furthermore, our past studies on the evolution of metamodels [17,15] revealed similar results.
6
Related Work
Work related to language evolution can be found in several technological spaces of software language engineering [5]. This includes grammar evolution in grammarware, metamodel evolution in modelware, schema evolution in dataware, and API evolution.

Grammar evolution has been studied in the context of grammar engineering [3]. Lämmel proposes a comprehensive suite of grammar transformation operators for the incremental adaptation of context-free grammars [16]. The proposed operators are based on sound, formal preservation properties that allow reasoning about the relationship between grammars. The operator suite proved to be valuable for the semi-automatic recovery of the COBOL grammar from an informal specification [21].
Based on similar operators, Lämmel proposes a lightweight verification method called grammar convergence for establishing and maintaining the correspondence between grammars ingrained in different software artifacts [22]. Grammar convergence proved to be useful for establishing the relationship between grammars from different releases of the Java grammar [6]. The approach presented in this paper transfers these ideas to the technological space of modelware. In contrast to the Java case study, the GMF case study provides us with intermediate revisions of the metamodels. Taking these revisions into account allows us to investigate how language changes are actually implemented.

Metamodel evolution has been mostly studied from the angle of model migration. To specify and automate the migration of models, Sprinkle introduces a visual graph-transformation-based language [23,24]. However, this language does not provide a mechanism to reuse migration specifications across metamodels. To reuse migration specifications, there are two kinds of approaches: difference-based and operator-based. Difference-based approaches try to automatically derive a model migration from the difference between two metamodel versions. Gruschko et al. classify primitive metamodel changes into non-breaking, breaking resolvable, and breaking unresolvable changes [25,26]. Based on this classification, they propose to automatically derive a migration for non-breaking and resolvable changes, and envision supporting the developer in specifying a migration for unresolvable changes. Cicchetti et al. go even one step further and try to detect compound changes in the difference between metamodel versions [27]. However, Sprinkle et al. claim that in the general case it is undecidable to automatically synthesize a model migration that preserves the semantics of the models [28]. To avoid the loss of intention during evolution, we follow an operator-based approach where the developers can perform the operators encapsulating the intended model migration [14,15]. The GMF case study continues and extends our earlier studies [17,15], which focused solely on the automatability of the model migration. Beyond that, the presented study shows that an operator-based approach can be useful in a reverse engineering process to reveal and document the intention of language evolution on a high level of abstraction. Furthermore, it provides evidence that operator-based metamodel adaptation should be used in forward engineering in order to control and document language evolution. In contrast, difference-based approaches still lack a proof of concept by means of real-life case studies, both for forward and reverse engineering.

Schema evolution has been a field of study for several decades, yielding a substantial body of research [29,30]. For the ORION database system, Banerjee et al. propose a fixed set of change primitives that perform coupled evolution of the schema and data [31]. While reusing migration knowledge in the case of these primitives, their approach is limited to local schema restructuring. To allow for non-local changes, Ferrandina et al. propose separate languages for schema and instance data migration for the O2 database system [32]. While more expressive, their approach does not allow for reuse of coupled transformation knowledge. In order to reuse recurring coupled transformations, SERF – as proposed by Claypool et al. – offers a mechanism to define arbitrary new high-level primitives
[33], providing both reuse and expressiveness. However, the last two approaches never found their way into practice, as it is difficult to perform complex migration without taking the database offline. As a consequence, it is hard to find real-world case studies which include complex restructuring.

Framework evolution can be automated by refactorings which encapsulate the changes to both the API and its clients [20]. Dig and Johnson present a case study to investigate how object-oriented APIs evolve in practice [10]. They found that a significant number of API changes can be covered by refactoring operators. In the GMF case study, we found that metamodel evolution is not restricted to the syntax of models, but also includes evolution of the APIs to access models. For the migration of client code relying on those APIs, existing work on framework evolution should provide a good starting point.
7
Conclusion
In this paper, we presented a method to investigate the evolution of modeling languages. Our approach is based on retracing the evolution of the metamodel as the central artifact of the language. For this purpose, we provide an operator suite for the stepwise transformation of metamodels from old to new versions. The operators allow us to state clearly the changes made to the language metamodel on a high level of abstraction, and to capture the intention behind the change. Furthermore, these operators can be used to accurately describe the impact of the metamodel changes on related models, and to hint at the possible effects on the related language development artifacts. Thus, we can qualify a certain change with respect to its impact on the other artifacts. This can in turn be used to predict, detect, and prevent language erosion. In the future, the operators could also support the (semi-)automatic migration of co-evolving artifacts other than models.

There is an increasing amount of related work proposing alternative approaches to metamodel evolution and model co-evolution. Real-life case studies are needed to evaluate these approaches. In [17], we presented an industrial case study for operator-based metamodel adaptation. However, the studied evolution is not publicly available due to a non-disclosure agreement. In this paper, we studied the evolution of metamodels in GMF as another extensive case study. GMF's evolution is publicly available through a version control system. The evolution is well-documented in terms of commit comments made by developers, and change requests made by users. Consequently, GMF is a good target to study different approaches to metamodel evolution, either on its own (as we did in this paper) or in comparison to each other.

But GMF is not only a case study for metamodel evolution. We consider it a general case study on software language evolution and the integration of different technological spaces in software language engineering. Not only do the modeling languages provided by the framework evolve, but so do its APIs. We revealed that a large share of the GMF metamodel changes were changes to the
API for accessing GMF editor models. Further work is needed to investigate the relationship between metamodel evolution and API evolution in frameworks. Another interesting topic for future work would be a comparison of operator-based approaches in software language engineering. As mentioned in the section on related work, there are many operator-based approaches to software language engineering in different technological spaces, e.g., for grammar evolution, metamodel evolution, schema evolution, and API evolution. It is worth investigating their common properties, facilities, and restrictions.

Acknowledgement. The work of the first two authors is supported by grants from the BMBF (Federal Ministry of Education and Research, Innovationsallianz SPES 2020), and the work of the third author is supported by grants from the DFG (German Research Foundation, Graduiertenkolleg METRIK).
References

1. Favre, J.M.: Languages evolve too! Changing the software time scale. In: IWPSE 2005: 8th Int. Workshop on Principles of Software Evolution, pp. 33–44. IEEE, Los Alamitos (2005)
2. Favre, J.M.: Meta-model and model co-evolution within the 3D software space. In: ELISA: Workshop on Evolution of Large-scale Industrial Software Applications, pp. 98–109 (2003)
3. Klint, P., Lämmel, R., Verhoef, C.: Toward an engineering discipline for grammarware. ACM Trans. Softw. Eng. Methodol. 14(3), 331–380 (2005)
4. Bézivin, J., Heckel, R.: Guest editorial to the special issue on language engineering for model-driven software development. Software and Systems Modeling 5(3), 231–232 (2006)
5. Kurtev, I., Bézivin, J., Aksit, M.: Technological spaces: An initial appraisal. In: CoopIS, DOA 2002 Federated Conferences, Industrial track (2002)
6. Lämmel, R., Zaytsev, V.: Recovering grammar relationships for the Java Language Specification. In: 9th Int. Working Conference on Source Code Analysis and Manipulation. IEEE, Los Alamitos (2009)
7. Lämmel, R., Lohmann, W.: Format evolution. In: RETIS 2001: 7th Int. Conference on Reverse Engineering for Information Systems. [email protected], OCG, vol. 155, pp. 113–134 (2001)
8. Meyer, B.: Schema evolution: Concepts, terminology, and solutions. IEEE Computer 29(10), 119–121 (1996)
9. Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G.: Ontology change: Classification and survey. Knowl. Eng. Rev. 23(2), 117–152 (2008)
10. Dig, D., Johnson, R.: How do APIs evolve? A story of refactoring. J. Softw. Maint. Evol. 18(2), 83–107 (2006)
11. Kleppe, A.G., Warmer, J., Bast, W.: MDA Explained: The Model Driven Architecture: Practice and Promise. Addison-Wesley, Reading (2003)
12. Object Management Group: Meta Object Facility, Core Spec., v2.0 (2006)
13. Object Management Group: Object Constraint Language, Spec., v2.0 (2006)
14. Wachsmuth, G.: Metamodel adaptation and model co-adaptation. In: Ernst, E. (ed.) ECOOP 2007. LNCS, vol. 4609, pp. 600–624. Springer, Heidelberg (2007)
15. Herrmannsdoerfer, M., Benz, S., Juergens, E.: COPE - automating coupled evolution of metamodels and models. In: Drossopoulou, S. (ed.) ECOOP 2009. LNCS, vol. 5653, pp. 52–76. Springer, Heidelberg (2009)
16. Lämmel, R.: Grammar adaptation. In: Oliveira, J.N., Zave, P. (eds.) FME 2001. LNCS, vol. 2021, pp. 550–570. Springer, Heidelberg (2001)
17. Herrmannsdoerfer, M., Benz, S., Juergens, E.: Automatability of coupled evolution of metamodels and models in practice. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 645–659. Springer, Heidelberg (2008)
18. Lientz, B.P., Swanson, E.B.: Software Maintenance Management. Addison-Wesley, Reading (1980)
19. Herrmannsdoerfer, M.: Operation-based versioning of metamodels with COPE. In: CVSM 2009: Int. Workshop on Comparison and Versioning of Software Models, pp. 49–54. IEEE, Los Alamitos (2009)
20. Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading (1999)
21. Lämmel, R., Verhoef, C.: Semi-automatic grammar recovery. Softw. Pract. Exper. 31(15), 1395–1448 (2001)
22. Lämmel, R., Zaytsev, V.: An introduction to grammar convergence. In: Leuschel, M., Wehrheim, H. (eds.) IFM 2009. LNCS, vol. 5423, pp. 246–260. Springer, Heidelberg (2009)
23. Sprinkle, J.M.: Metamodel driven model migration. PhD thesis, Vanderbilt University, Nashville, TN, USA (2003)
24. Sprinkle, J., Karsai, G.: A domain-specific visual language for domain model evolution. J. Vis. Lang. Comput. 15(3-4), 291–307 (2004)
25. Becker, S., Goldschmidt, T., Gruschko, B., Koziolek, H.: A process model and classification scheme for semi-automatic meta-model evolution. In: MSI 2007: 1st Workshop MDD, SOA und IT-Management, pp. 35–46. GiTO-Verlag (2007)
26. Gruschko, B., Kolovos, D., Paige, R.: Towards synchronizing models with evolving metamodels. In: Int. Workshop on Model-Driven Software Evolution (2007)
27. Cicchetti, A., Ruscio, D.D., Eramo, R., Pierantonio, A.: Automating co-evolution in model-driven engineering. In: EDOC 2008: 12th Int. IEEE Enterprise Distributed Object Computing Conference, pp. 222–231. IEEE, Los Alamitos (2008)
28. Sprinkle, J., Gray, J., Mernik, M.: Fundamental limitations in domain-specific language evolution (2009), http://www.ece.arizona.edu/~sprinkjm/wiki/uploads/Publications/sprinkle-tse2009-domainevolution-submitted.pdf
29. Li, X.: A survey of schema evolution in object-oriented databases. In: TOOLS 1999: 31st Int. Conference on Technology of Object-Oriented Language and Systems, p. 362. IEEE, Los Alamitos (1999)
30. Rahm, E., Bernstein, P.A.: An online bibliography on schema evolution. SIGMOD Rec. 35(4), 30–31 (2006)
31. Banerjee, J., Kim, W., Kim, H.J., Korth, H.F.: Semantics and implementation of schema evolution in object-oriented databases. In: SIGMOD 1987: ACM SIGMOD Int. Conference on Management of Data, pp. 311–322. ACM, New York (1987)
32. Ferrandina, F., Meyer, T., Zicari, R., Ferran, G., Madec, J.: Schema and database evolution in the O2 object database system. In: VLDB 1995: 21st Int. Conference on Very Large Data Bases, pp. 170–181. Morgan Kaufmann, San Francisco (1995)
33. Claypool, K.T., Jin, J., Rundensteiner, E.A.: SERF: Schema evolution through an extensible, re-usable and flexible framework. In: CIKM 1998: 7th Int. Conference on Information and Knowledge Management, pp. 314–321. ACM, New York (1998)
A Novel Approach to Semi-automated Evolution of DSML Model Transformation

Tihamer Levendovszky, Daniel Balasubramanian, Anantha Narayanan, and Gabor Karsai

Vanderbilt University, Nashville, TN 37203, USA
{tihamer,daniel,ananth,gabor}@isis.vanderbilt.edu
Abstract. In the industrial applications of Model-Based Development, the evolution of modeling languages is an inevitable issue. The migration to the new language involves the reuse of the existing artifacts created for the original language, such as models and model transformations. This paper is devoted to an evolution method for model transformations as well as the related algorithms. The change description is assumed to be available in a modeling language specific to the evolution. Based on the change description, our method is able to automate certain parts of the evolution. When automation is not possible, our algorithms automatically alert the user about the missing semantic information, which can then be provided manually after the automatic part of the interpreter evolution. The algorithms have been implemented and tested in an industrial environment. The results indicate that the semi-automated evolution of model transformations decreases the time and effort required compared with a manual approach.
1
Introduction
The use of model-based software development techniques has expanded to a degree where it may now be applied to the development of large heterogeneous systems. Due to their high complexity, it often becomes necessary to work with a number of different modeling paradigms in conjunction. Model-based development tools, to a large extent, meet this challenge. However, short turnover times mean that only a limited time can be spent defining meta-models for these modeling paradigms before users begin creating domain-specific models. Deficiencies, inconsistencies and errors are often identified in the meta-models after the development is well underway and a large number of domain models have already been created. Changes may also result from an improved understanding of the domain over time, along with other modifications in the domain itself. Newer versions of meta-models must therefore be created, and these may no longer be compatible with the large number of existing models. The existing models must then be recreated or manually evolved using primitive methods, adding a significant cost to the development process. The problem is especially acute in the case of multi-paradigm approaches [MV04], where multiple modeling languages are used and evolved, often concurrently.
2
Problem Statement
The general solution for model migration is to allow the migrator to specify a general model transformation to perform the necessary migration operations. A general method has been contributed in [Spr03]. Creating a general model transformation is not an easy task; it is often quite challenging even for a domain expert. Thus, our objective is to provide an evolution method usable by domain experts and more specific to the evolution than the general approach.

Our migration method is based on the following observation motivated by our experience. In most practical cases, a modeling language does not evolve through abrupt changes, but in small steps. This also holds for UML: apart from adding completely new languages to the standard, the language has been changing in rather small steps since its first release. This assumption facilitates further automation of the model evolution by tools for metamodeled visual languages [BvKK+08]. The main concepts of a step-by-step evolution method are depicted in Fig. 1.
Fig. 1. Step-By-Step Evolution Concepts
The backbone of the diagram is a well-known DSL scenario depicted in the upper half of the figure. When a domain-specific environment is created, it consists of a meta-model (MMsrc), which may have an arbitrary number of instance models (SM1, SM2, ..., SMn). The models need to be processed or transformed ("interpreted"); therefore, an interpreter is built. The interpreter expects that its input models comply with MMsrc. In parallel, the output models of the interpreter must comply with the target meta-model MMdst. The inputs to the interpreter are MMsrc, MMdst and an input model SMi, and the interpreter produces an output model DMi. The objective is to migrate the existing models and interpreters to the evolved language. The evolved counterparts are denoted by adding a prime to the original notation. In the evolution process, we create the new (evolved) meta-model (MM'src). We assume that the changes are minor enough both in size and nature, such that they are worth being modeled and processed by a tool, rather
than writing a transformation from scratch to convert the models in the old language to models in the evolved language. This is a key point in the approach. Having created the new language by means of the evolved meta-model, we describe the changes in a separate migration DSL (Model Change Language, MCL). The MCL model is denoted by Δsrc, and it represents the differences between MMsrc and MM'src. Besides the changes, this model contains the actual mappings from the old models to the evolved ones, providing more information that describes how to evolve the models of the old language to models of the new language. Given MMsrc, MM'src and the MCL model, a tool can automatically migrate the models of the old language to models of the evolved language. The concepts are similar on the destination side. Evolving the models with MCL is described in [BvKK+08, NLBK09]. Based on MMsrc, MM'src, MMdst, and the MCL model, it is possible to evolve the model interpreter, which is the main focus of this paper. Practically, this means evolving the model transformation under the following set of assumptions.

(i) The change description is available and specific to evolution. In our implementation, this is an MCL model, but it could be any model or textual representation with at least the same information content about the changes.

(ii) The model elements left intact by the evolution should be interpreted in the same way as they were by the original interpreter. If the intent is different, manual correction is required. In our experience, this occurs rarely. Furthermore, we treat unambiguously changed elements (such as renamed classes) in the same way where possible.

(iii) The handling of missing semantic information is inevitable. It cannot be expected that methods to process the new concepts added by the evolution can be invented without human interaction. Therefore, a tool cannot achieve more than producing an initial version of the evolved interpreter and showing the missing semantic information.

(iv) We assume that the interpreter is specified by graph rewriting rules. Our implementation is based on GReAT [AKNK+06], but the algorithms can be used with any tool or theoretical framework specifying the transformation by rewriting rules, such as the AGG [Tae04], FUJABA [NNZ00], ViATRA [BV06], or VMTS [AALK+09] tools, or frameworks of the single- or double-pushout (SPO, DPO) approaches [Roz97] or High-Level Replacement Systems [EEPT06].
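As an illustration of these concepts only (and not of the actual MCL/GReAT tool chain), the following Java sketch names the artifacts involved and the two automated steps; all type names and method signatures are hypothetical.

// Hypothetical names only; this is not the MCL/GReAT API.
interface Metamodel {}                      // stands for MMsrc, MM'src, MMdst
interface Model {}                          // stands for SM1..SMn and DM1..DMn
interface ChangeModel {}                    // stands for the MCL model (Delta_src)

interface Interpreter {
    // The transformation: reads a model conforming to MMsrc and
    // produces a model conforming to MMdst (SMi -> DMi).
    Model interpret(Model source);
}

interface EvolutionToolchain {
    // Fully automatic step: adapt an existing model to the evolved meta-model MM'src.
    Model migrateModel(Model oldModel, Metamodel mmSrc, Metamodel mmSrcEvolved, ChangeModel mcl);

    // Semi-automatic step: produce an initial evolved interpreter and report
    // warnings wherever semantic information is missing.
    Interpreter evolveInterpreter(Interpreter oldInterpreter, Metamodel mmSrc,
                                  Metamodel mmSrcEvolved, Metamodel mmDst, ChangeModel mcl);
}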
3
Related Work
Providing methods for semi-automated evolution of graph-rewriting-based model transformations for DSLs is a fairly new area. Existing solutions to this problem are more or less ad-hoc techniques that often resort to directly specifying the alterations in terms of the storage format of the models. One such approach is the use of XSL transformations to evolve models stored in XML. Database schema migration techniques have been applied to the migration of models stored as relational data. These approaches are often nothing more than pattern-based replacement of specific character strings, and they do not capture
the intent driving a meta-model change [Kar00]. When dealing with complex meta-models covering multiple paradigms, comprehension is quickly lost when capturing meta-model changes using these methods.

Although semi-automated evolution of model transformations is a novel approach, it incorporates the transformation of graph rewriting rules. In [EE08], the authors assume the case in which model transformations preserve the behavior of the original models. In this framework, the behavior of the source system and the target system can be defined by transformation rules. In translating from a source to a target model, one wants to preserve the behavior of the original model. A model transformation is semantically correct if for each simulation of the source, a corresponding target simulation is obtained, and the transformation is semantically complete if the opposite holds. The authors use graphs to represent models, and graph transformation rules to represent the operational behavior and the transformation from source to target model. The operational rules of the source are also input to the transformation from source to target, and conditions are defined for model and rule transformation correctness. Our approach makes it possible to handle semantic evolution, where this constraint does not hold, and most of our evolution case studies fell into this category. The paper gives a formal description of transforming the DPO transformation rules in an exhaustive manner. Our approach does not enforce DPO rules or exhaustive rule control.

The paper [PP96] deals with multilevel graphs, where some parts of the graphs are hidden, but the hidden information can be restored by rules from another, known graph. The author claims that in many applications, it is useful to be able to represent a graph in terms of particular subgraphs and hide the details of other structures that are only needed in certain conditions. If one repeats this hiding of details, it leads to representations on more than one level of visibility. A graph representation consists of a graph and productions to restore hidden information. If suitable restrictions are placed on the restoring productions, then such a representation can produce several graphs and thus acts as a graph grammar. The paper defines morphisms between graph grammars, and shows that graph grammars and their morphisms form a finitely cocomplete category, along with other properties. The paper makes a distinction between two cases: (i) global grammar transformation, when a subgrammar is replaced with another grammar, and (ii) local transformation, when the rules are modified. Our interpreter evolution method takes the latter approach. Using the DPO approach as a theoretical framework, the author defines the rewriting of a rule by another rule. The actual rewriting takes place on the interface graph, and only these changes are "propagated" to the left-hand side and the right-hand side of the rules to make them consistent. The main results of the paper deal with the applicability and the satisfaction of the DPO gluing conditions. In our approach, GReAT and the underlying UDM framework [MBL+03] do not allow dangling edges and non-injective matches, and constantly validate the graphs and the rules at run-time.
4
Case Study
Our case study is based on a hierarchical signal flow paradigm. An example model is depicted in Fig. 2.
Fig. 2. An example of a hierarchical signal flow model
A signal flow may contain the following elements. An InputSignal represents a signal that is processed by a signal processing unit. An OutputSignal is the result of the processing operation. Signal processing components can be organized into hierarchies, which reduces the complexity of the model. A signal processing unit can be either Primitive or Compound. A Primitive can contain only elementary elements, while a Compound can also contain Primitive processing units. In our example model, Preprocessing and Controller are compound processing units, whereas the Filter1, Filter2, ControlAlgorithm, and DAC elements are primitive signal processing components. The input signals and the output signals cannot be connected directly: they need an intermediate LocalPort.

Our case study begins with a hierarchical signal flow modeling language and defines a transformation targeting a non-hierarchical signal flow language. This transformation may be useful for several reasons, but the main motivation is usually implementation-related: if one wants to generate a low-level implementation for a signal flow, some of the simulation engines do not support the concept of hierarchies. Having invested in a set of hierarchical signal flow models, we realize certain weak points in our language, and additional features and clarifications require modifications to the original language. We then modify the original hierarchical language in several ways typical of meta-model changes, including class renamings, attribute renamings, and the introduction of new meta-classes. We would like to preserve our investment; therefore, we would like to transfer the existing models to the new, evolved language. In order to migrate the now invalidated models and transformations, we define MCL rules that describe the relationships between elements in the old and new meta-models. Using these rules, our MCL language is able to migrate models,
and our interpreter evolver is able to create a new version of the transformation that translates from models conforming to the new meta-model to the same target meta-model (MM'dst = MMdst).

We begin by describing the original hierarchical language and the target non-hierarchical language, along with the transformation between the two. We then describe the updated version of the hierarchical language and the MCL rules used to migrate models corresponding to the old meta-model so that they conform to the updated meta-model. We then give details about the updated interpreter that is automatically produced using our interpreter evolver tool, including the number of rules requiring hand-modification.

4.1
Hierarchical Signal Flow
Fig. 3 shows the meta-model of the original signal flow language.
Fig. 3. The original meta-model
The Component class represents some unit of functionality performed on an input signal and contains a single integer attribute named SignalGain. The CompoundComponent class does not represent any functionality performed on signals; rather, it is used to hierarchically organize both types of components. Signals are passed between components using ports; the Port class has a single
Boolean attribute that is set to true if an instance is an input port and false if it is an output port. The LocalPort class is contained only in CompoundComponents and is used to buffer signals between Components (i.e., the LocalPort buffers between the units of functionality). Because the ports share no common base class, four types of connections are defined to represent the possible connections between each type. This is an inefficient design typically made by beginner domain experts. The evolved meta-model can improve upon this.

Fig. 2 shows an example model that represents a simple controller. The top of the figure represents a high-level view of the system. The Preprocessing and Controller elements are both CompoundComponents; the internals of both are shown in the bottom of the figure. The Preprocessing block contains two Components that represent filters that are applied to the input signal, while the Controller block contains one Component for implementing the control algorithm and another Component to convert the digital signal back to an analog signal, which is then passed out of the system through output ports. All of the ports named Forwarder are LocalPort elements representing a buffering element in between functional elements.

4.2
Original Transformation
The target meta-model of the transformation is a “flat” actor-based language without hierarchy, shown in Fig. 4.
Fig. 4. Target meta-model
The Actor class represents basic units of functionality and corresponds to the Components in the hierarchical signal flow language. The Receiver and
Transmitter classes are used to send signals to and from an Actor, respectively. The Queue class corresponds to the LocalPort class in the hierarchical language, and acts as a local buffering element between Actors. The overall goal of the transformation is to create an Actor in the target model for each Component in the input model. Receivers and Transmitters should be created inside each Actor for each Port inside the corresponding Component. The CompoundComponents in the input model are present only for organizational purposes, so their effect will be removed in the output model.

Fig. 5 shows the full transformation, with two hierarchical blocks expanded to show their full contents. The first two transformation rules (shown at the top of Fig. 5) create a RootContainer element and top-level Queues for Ports. The block that is recursively called to flatten the hierarchy is expanded on the second line of rules in Fig. 5. The first rule on the second line creates top-level Queues for each LocalPort in the input model. The third line of rules in Fig. 5 is responsible for creating temporary associations so that the hierarchy can be flattened. The transformation rule named FilterPrimitives is a conditional block that sends nested CompoundComponents back through the recursive rule and sends all of the regular Components to the final row of rules. This final row of rules is responsible for creating the Actors in the output model, along with their Receivers, Transmitters and the connections between them. Note that because of the several types of connection classes in the original meta-model, four rules are needed to translate these into the target model; these are the first four rules in the third row of Fig. 5. The transformation contains a total of twelve transformation rules, two test cases, and one recursive rule.

Fig. 6 shows the transformation rule that creates a Queue in the output model for each Port in the top-level CompoundComponents. This rule indicates that for each Port contained inside the CompoundComponent, a Queue should be created in the RootContainer of the output model (the check mark on the lower right-hand corner of the Queue indicates that it will be newly created), along with a temporary association between the Port and its corresponding Queue. The temporary association is created so that later in the transformation, other rules can find the Queue that was created in correspondence with a given Port. Also note that this transformation rule has an AttributeMapping block, which contains imperative code to set the name attribute of the newly created Queue. This imperative code uses the IsInput attribute of the Port class, which will be deleted in the evolved meta-model.

Fig. 7 shows the transformation rule that creates an Actor in the output model. The rule indicates that for each Component, an Actor should be created (again, the small check mark on the Actor indicates it should be newly created). This rule also contains an AttributeMapping block, which allows imperative code to be written for querying and setting an element's attribute values. The code inside this block is also shown in the figure. Note that this code uses
the SignalGain attribute on Component; this will be referenced later during the evolution.

Fig. 5. Entire Transformation
Fig. 6. Transformation rule to create a Queue for each Port
Fig. 7. Transformation rule to create Actor

4.3
MCL Rules and Evolved Transformation
The evolved meta-model, shown in Fig. 8, contains several changes typical of meta-model evolutions, including the following migration operations.

1. Component has been renamed to PrimitiveComponent.
2. The IsInput attribute of Port has been removed from InputPort and OutputPort.
3. The attribute SignalGain on Component has been renamed to Gain on PrimitiveComponent.
4. Port has been subtyped into InputPort and OutputPort.
5. InputPort, OutputPort and LocalPort all now share a common base class.
6. All of the connection classes have been replaced with a single connection class named Signal.

Fig. 9 shows the MCL rules to accomplish the first four points above. Component is connected to PrimitiveComponent with a MapsTo connection, which deals with the first point above. The second point above is addressed by setting the IsInput attribute to "Delete" (the delete option is not visible in the figure). Similarly, the SignalGain attribute on Component is connected to the Gain attribute on PrimitiveComponent via a MapsTo connection, which accomplishes the third point above. The Port class is connected to both InputPort
and OutputPort with two separate MapsTo connections. A Port should become an InputPort if its IsInput attribute is true, and should become an OutputPort otherwise. This conditional mapping is accomplished by including mapping conditions on the connections (not visible in the figure). The fifth item above, the introduction of a common base class, is accomplished implicitly. The last point is accomplished with four MCL rules that are all similar to the one shown in Fig. 10. This rule migrates PortToLocal connections to Signal connections. For each PortToLocal connection found in the input model, its source and destination are located, as well as the elements in the destination model to which they were mapped. Then, a Signal connection is created between these two elements.

Fig. 8. Evolved meta-model
Fig. 9. Migration rules for ports and components
Fig. 10. Migration rule for local ports
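To make the conditional Port mapping described above concrete outside of any particular tool, the following Java sketch shows the effect of the migration on a single instance: the value of IsInput selects the evolved subtype, and the attribute itself disappears. The classes are illustrative stand-ins, not the generated model API of the tool chain, and the sketch is not MCL syntax.

// Illustrative stand-ins for the model classes; not actual tool output.
class Port { boolean isInput; String name; }        // original meta-model
abstract class PortBase { String name; }            // new common base class
class InputPort extends PortBase {}                 // evolved meta-model
class OutputPort extends PortBase {}

class PortMigrationSketch {
    // A Port becomes an InputPort if IsInput is true and an OutputPort otherwise;
    // the IsInput attribute itself is dropped in the evolved language.
    static PortBase migrate(Port old) {
        PortBase evolved = old.isInput ? new InputPort() : new OutputPort();
        evolved.name = old.name;
        return evolved;
    }
}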
5
Contributions
In addition to existing models, we have also invested time and effort in the transformation described above, and we would like to save as much from the original transformation as possible. However, the solution is not as straightforward as in the case of model migration, since the MCL rules have been designed for model migration, and in most cases they do not hold all the information necessary to migrate the interpreter. Accordingly, we use three distinct categories to describe the availability of information.

Some operations, such as renaming a meta-model element or an attribute, are fully automated transformation operations. For example, in Fig. 9, SignalGain is renamed to Gain. This means that we must redirect all references to the original meta-model attribute SignalGain to the evolved attribute Gain in the transformation, and we must tokenize the attribute mappings and substitute the symbol name SignalGain with Gain (a sketch of such a substitution is given below).

If we would like to delete an attribute, information is missing. If the attribute appears in a rule, we do not know what the attribute computation involving the deleted attribute should be substituted with. We can mark the deleted attribute in the attribute mapping code of the transformation, but it is still necessary to have some corrections from the transformation developer. This category is referred to as partially automated transformation operations.

Among the transformation operations, additions pose the greatest problems. The original transformation does not include any cues as to how the added elements should be processed, and while the MCL rules sometimes contain attribute mappings to set the values of new attributes, this still does not describe how these should be introduced in the evolved transformation. Whereas in the case of partially automated operations the transformation developer needs to contribute only the part of the migration based on the semantic information he has about the new model, if additions are performed, the full semantic description of the added elements is required. Without that, these operations cannot be automated. We call these operations fully semantic transformation operations. Currently, we do not treat fully semantic operations.
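The following Java sketch illustrates the kind of identifier-aware substitution referred to above. It is our own simplification, not the code of the evolver tool, and it assumes that a whole-word textual replacement is sufficient for the attribute-mapping language at hand.

import java.util.regex.Pattern;

// Our own simplification of the symbol substitution; not the evolver tool's code.
class AttributeMappingRenamer {
    // Replaces whole-word occurrences only, so a symbol such as "SignalGainMax"
    // would be left untouched.
    static String renameSymbol(String mappingCode, String oldName, String newName) {
        return mappingCode.replaceAll("\\b" + Pattern.quote(oldName) + "\\b", newName);
    }
}

// Example: renameSymbol("actor.gain = component.SignalGain;", "SignalGain", "Gain")
// yields "actor.gain = component.Gain;".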
Accordingly, an automated pass is performed first, which is completely automatic. Second, a manual pass is required, in which the migrator performs the manual tasks that involve completing the transformation with the code and other DSML constructs for the new elements and adjusting it for the modified elements.

5.1
Automated Pass
The MCL rules discussed in Section 4.3 are given as input to the interpreter migration tool, which creates an updated version of the interpreter according to the algorithm in Section 5.3. This updated interpreter automatically reflects the first meta-model change described above: references to the Component class are now references to the PrimitiveComponent class in the new meta-model. The second meta-model change is handled semi-automatically: the IsInput attribute of Port has been removed from InputPort and OutputPort. This attribute was used in the attribute mapping code shown in Fig. 6 to set the values of attributes in the output model, and this imperative code cannot be migrated without user input because the attribute was deleted. Therefore, all uses of this attribute in the imperative code are commented out, and a warning is emitted to the user. The third change (SignalGain renamed to Gain) is handled automatically because it involves only renaming an attribute. The tool can automatically migrate any imperative attribute mapping code that uses this attribute.

Another example of how the transformation is evolved in response to the migration rules is shown in Fig. 11. This is the evolved version of the original transformation rule shown in Fig. 7. Note that this rule reflects two changes: (i) Component now has type PrimitiveComponent, and (ii) the imperative attribute mapping code now uses the Gain attribute of PrimitiveComponent, which was previously named SignalGain. The fifth change is handled implicitly, and the final migration rule (Fig. 10), which maps all connections in the original meta-model to a single type of connection in the new meta-model, is handled automatically.

5.2
Handling Missing Semantic Information
As mentioned, a typical source of missing semantic information is addition. In MCL, one can specify the addition of (i) classes, (ii) attributes, and (iii) associations. The detection of these elements is simple: they can be identified either by comparing the original and the evolved meta-models or by analyzing the MCL models. From the interpreter evolution's point of view, this means that interpreter rules or rule parts for these elements must be added in the manual pass phase.

The nodes and edges in a transformation rule reference the meta-model elements. When the transformation rules are migrated, these references must be adapted to the evolved meta-models (MM'src and MM'dst). Referenced but deleted elements mean missing semantic information for the rules. The simplest solution is to delete these nodes and edges from the rules. Our experience has shown that the topology (structure) of the rules is lost in this case, which is not the desired behavior, since the topology is usually preserved or only modified subtly.
Fig. 11. Evolved migration rule for creating actors
Therefore, such nodes are set to a null reference, which preserves the rule structure but loses the type information.

Fig. 12 shows an example of how different parts of a rule can be evolved to varying degrees. This rule is the evolved version of the original transformation rule shown in Fig. 6. There are two things to note. First, the use of the IsInput attribute of Port is automatically commented out of the attribute mapping and a warning is issued to the user. Second, the Port class from the original meta-model is still present. This is because the mapping from Port to either InputPort or OutputPort is a conditional MCL rule, and thus there is no way to automate this part of the transformation rule.

The main strength of MCL is that it not only specifies primitive operations, such as deletion, addition, and modification, but also mappings to express causal dependencies. We can use these mappings to replace certain elements with their evolved counterparts. Frequently, these mappings are split: depending on an attribute value, a concept evolves into two or more distinct concepts. This implies an ambiguous mapping. In this case it cannot be assumed that the evolved elements can be processed the same way as their predecessors, meaning that the interpretation logic must be added manually. In our case study, mapping a Port to InputPort and OutputPort is such a situation (Fig. 9). Therefore, the fourth meta-model change, the sub-typing of Port into InputPort and OutputPort, is a fully semantic change and cannot be handled by the algorithm. This is because the MCL rules describe how a given instance of a Port will be migrated to either an InputPort or an OutputPort in an instance model, but do not give enough information to decide how the meta-class Port should be evolved in the transformation. In general, this cannot be decided without user intervention.
Fig. 12. Evolved migration rule for creating queues
The warnings emitted by the evolver tool reflect the treatment of the missing semantic information well. The most important warning categories are as follows. If a model element or an attribute has been removed, then the user has to substitute the elements by hand, since automatic deletion might lead to unexpected behavior either in the pattern matching or in the actual rewriting process. The other important warning group is generated by ambiguous situations. When the evolver tool cannot make a decision, typically in the case of multiple migration mappings decided by conditions, a warning is emitted.

In the case study, the evolved transformation consisted of the same number of rewriting rules. Four pairs of rules were then manually combined due to the newly introduced common base class for InputPort and OutputPort. Another rule was split into two rules to deal with the introduction of InputPort and OutputPort. The deletion of the IsInput attribute of Port required changing the imperative attribute mapping code of one rule. The introduction of a common base class for InputPort, OutputPort and LocalPort required modifying four rules to use the new base class. Overall, three of the rules and both of the test blocks were migrated entirely automatically with no manual changes. A warning was issued about a deleted attribute in one block, which required a manual change because imperative code had been written that referenced the deleted attribute. The rest of the rules were evolved semi-automatically. Manual changes were required in all rules that used the Port class because of the conditional nature of its mapping in the MCL rules, as described above.
5.3
Implementation and Algorithm
The high-level outline of the algorithm for evolving the transformation is as follows.

ProcessRule(Rule r)
  for all (PatternClass p in r) do
    if (p.ref() is in removeClassCache) then
      DeleteAttributeReferences(p)
      p.ref() = null
    else if (p.ref() is in migrateCache and has an outgoing mapsTo) then
      MigratePatternClass(p)
    else if (Class c = evolvedMetamodelCache.find(p.ref())) then
      p.ref() = c
    else
      DeleteAttributeReferences(p)
      p.ref() = null
    end if
    if (r has changed) then
      MarkForChanges(r)
    end if
  end for

In order to accelerate the algorithm, the migration model, the evolved meta-model, the target meta-model of the transformation and the source meta-model are cached, along with the references to temporary model elements in the transformation. Moreover, the elements that are not in the target model and/or are denoted as to be deleted in the migration model are also cached. After the caching, a traversal of the transformation is performed, which takes each rule and executes the ProcessRule algorithm.

The structural part of a rule is composed of (i) pattern classes that are references to meta-model classes in the input and output meta-models of the transformation, (ii) connections referencing the associations in the input and output meta-models, and (iii) references to temporary classes and associations that store non-persistent information during the transformation. Moreover, the rules can contain attribute transformations, which query and set the attributes of the transformed model. The attributes and their types are determined by the meta-model classes referenced by the pattern classes.

The algorithm takes each pattern class and distinguishes four cases. (i) If the meta-model class referenced by the pattern class is to be deleted, then the attribute transformations are scanned, and if they reference the attributes provided by the removed class, they are commented out and a warning is emitted. (ii) If the referenced class is in the migration model, the class must be migrated as described in Section 3. If there is only one mapsTo relationship, we redirect the references to the new class, and we update the attribute transformations according to the migration rule. If there are multiple mapsTo relationships originating from the class to be migrated, we cannot resolve this ambiguous situation in the rule; thus, we emit a warning. If there are only wasMappedTo relationships, we fall back on the next case. (iii) If we can transfer the reference to the new
model with a name-based match, we do so, emitting a warning that the assignment should be described in the migration model. (iv) If none of the cases above solves the migration, we treat the referenced class as if it were to be deleted, emitting a warning that this should also be a rule in the migration model. Note that we never delete a pattern class, because that would lose the structure of the original rule. On deletion of the referenced class, the referencing pattern class is made to point to null.

Because the transformation references the meta-model elements, the references into the source meta-model should be changed to point to the elements of the evolved meta-model. This is also the simplest scenario: if the source meta-model and the evolved meta-model are models with different locations, but containing the same model elements, the references are redirected to the evolved meta-models. This redirection is performed by matching the names of the model elements. Because the algorithm traverses the rules, if a meta-model element that is not referenced by the rules is added, we will not give a warning that it should be included in the evolved transformation.
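The name-based matching used for this redirection can be pictured as a lookup table over the evolved meta-model, as in the following Java sketch. The types are hypothetical simplifications; the actual implementation operates on the UDM/GReAT object network.

import java.util.HashMap;
import java.util.Map;

// Hypothetical types; the real implementation works on the UDM/GReAT object network.
interface MetaClass { String getName(); }

class EvolvedMetamodelCache {
    private final Map<String, MetaClass> byName = new HashMap<>();

    EvolvedMetamodelCache(Iterable<MetaClass> evolvedClasses) {
        for (MetaClass c : evolvedClasses) byName.put(c.getName(), c);
    }

    // Returns the same-named class in the evolved meta-model, or null if no such
    // class exists; the caller then falls back to the "treat as deleted" case
    // and emits a warning.
    MetaClass find(MetaClass original) { return byName.get(original.getName()); }
}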
6
Conclusion
There are several reasons why DSMLs evolve. With the evolution of the language, the infrastructure must also evolve. We have developed a method for cases in which the modeling language evolves in small steps, as opposed to sudden, fundamental changes. Interpreters are huge investments when creating a DSML-based environment. In this paper, we contributed a method for interpreter evolution under certain circumstances. The discussed transformation operations and their categories are depicted in Table 1.

Table 1. Summary of the Evolved Transformation Steps

Fully Automated      Partially Automated   Fully Semantic
Rename an element    Delete class          Add new element
Change stereotype    Delete connection     Add attributes
Rename attribute     Subtyping             Change attribute type
                     Delete attribute
We investigated avionics software applications, and we found that these circumstances hold for the industrial use cases. The algorithms have been implemented in the GME/GReAT toolset, and have been tested in an industrial environment.

The drawbacks of the method include the following. Sometimes the changes might be too abrupt for MCL. In this case, our tool set still provides the fallback
to the general model transformation method. If the interpretation semantics of the existing elements change, the transformation created by the automatic pass must be modified. When many new elements must be added to the transformation, a significant amount of manual work is required.

Future work is devoted to providing tool support for the addition of the missing semantic information. First, we will identify the most prevalent scenarios and collect them into a pattern catalog. Second, we will create a tool that detects the applicability of a pattern and offers its application. Obviously, human interaction is always needed in the discussed cases, but the effort can be minimized by offering complete alternatives for the most frequent use cases.
References

[AALK+09] Angyal, L., Asztalos, M., Lengyel, L., Levendovszky, T., Madari, I., Mezei, G., Mészáros, T., Siroki, L., Vajk, T.: Towards a fast, efficient and customizable domain-specific modeling framework. In: Proceedings of the IASTED International Conference, Innsbruck, Austria, February 2009, vol. 31, pp. 11–16 (2009)
[AKNK+06] Agrawal, A., Karsai, G., Neema, S., Shi, F., Vizhanyo, A.: The design of a language for model transformations. Software and Systems Modeling 5(3), 261–288 (2006)
[BV06] Balogh, A., Varró, D.: Advanced model transformation language constructs in the VIATRA2 framework. In: ACM Symposium on Applied Computing — Model Transformation Track (SAC 2006), pp. 1280–1287. ACM Press, New York (2006)
[BvKK+08] Balasubramanian, D., van Buskirk, C., Karsai, G., Narayanan, A., Neema, S., Ness, B., Shi, F.: Evolving paradigms and models in multi-paradigm modeling. Technical Report ISIS-08-912, Institute for Software Integrated Systems (December 2008)
[EE08] Ehrig, H., Ermel, C.: Semantical correctness and completeness of model transformations using graph and rule transformation. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) ICGT 2008. LNCS, vol. 5214, pp. 194–210. Springer, Heidelberg (2008)
[EEPT06] Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic Graph Transformation. Monographs in Theoretical Computer Science. An EATCS Series. Springer, Heidelberg (2006)
[Kar00] Karsai, G.: Why is XML not suitable for semantic translation. Research Note, Nashville, TN (April 2000)
[MBL+03] Magyari, E., Bakay, A., Lang, A., Paka, T., Vizhanyo, A., Agrawal, A., Karsai, G.: UDM: An infrastructure for implementing domain-specific modeling languages. In: The 3rd OOPSLA Workshop on Domain-Specific Modeling, OOPSLA 2003, Anaheim, California (October 2003)
[MV04] Mosterman, P.J., Vangheluwe, H.: Computer automated multi-paradigm modeling: An introduction. Simulation: Transactions of the Society for Modeling and Simulation International 80(9), 433–450 (2004); Special Issue: Grand Challenges for Modeling and Simulation
[NLBK09] Narayanan, A., Levendovszky, T., Balasubramanian, D., Karsai, G.: Automatic domain model migration to manage metamodel evolution. In: Schürr, A., Selic, B. (eds.) MODELS 2009. LNCS, vol. 5795, pp. 706–711. Springer, Heidelberg (2009)
[NNZ00] Nickel, U., Niere, J., Zündorf, A.: The Fujaba environment. In: ICSE 2000: Proceedings of the 22nd International Conference on Software Engineering, pp. 742–745. ACM, New York (2000)
[PP96] Parisi-Presicce, F.: Transformation of graph grammars. In: 5th Int. Workshop on Graph Grammars and their Application to Computer Science, pp. 428–492 (1996)
[Roz97] Rozenberg, G. (ed.): Handbook of Graph Grammars and Computing by Graph Transformation. Foundations, vol. I. World Scientific Publishing Co., Inc., River Edge (1997)
[Spr03] Sprinkle, J.: Metamodel Driven Model Migration. PhD thesis, Vanderbilt University, Nashville, TN 37203 (August 2003)
[Tae04] Taentzer, G.: AGG: A graph transformation environment for modeling and validation of software. In: Pfaltz, J.L., Nagl, M., Böhlen, B. (eds.) AGTIVE 2003. LNCS, vol. 3062, pp. 446–453. Springer, Heidelberg (2004)
Study of an API Migration for Two XML APIs

Thiago Tonelli Bartolomei(1), Krzysztof Czarnecki(1), Ralf Lämmel(2), and Tijs van der Storm(3)

(1) Generative Software Development Lab, Department of Electrical and Computer Engineering, University of Waterloo, Canada
(2) Software Languages Team, Universität Koblenz-Landau, Germany
(3) Software Analysis and Transformation Team, Centrum Wiskunde & Informatica, The Netherlands
Abstract. API migration refers to adapting an application such that its dependence on a given API (the source API) is eliminated in favor of depending on an alternative API (the target API) with the source and target APIs serving the same domain. One may attempt to automate API migration by code transformation or wrapping of some sort. API migration is relatively well understood for the special case where source and target APIs are essentially different versions of the same API. API migration is much less understood for the general case where the two APIs have been developed more or less independently of each other. The present paper exercises a simple instance of the general case and develops engineering techniques towards the mastery of API migration. That is, we study wrapper-based migration between two prominent XML APIs for the Java platform. The migration follows an iterative and test-driven approach and allows us to identify, classify, and measure various differences between the studied APIs in a systematic way.
1

Introduction

APIs are both a blessing and a curse. They are a blessing because they enable domain-specific reuse. They are a curse because they lock our software into concrete APIs. Each API is quite specific, if not idiosyncratic, and accounts effectively for a form of 'software asbestos' [KLV05]. That is, it is difficult to adapt an application with regard to the APIs it uses. We use the term API migration for the kind of software adaptation where an application's dependence on a given API (the source API) is eliminated in favor of depending on an alternative API (the target API), with the source and target APIs serving the same domain. API migration may be automated, in principle, by (i) some form of source- or bytecode transformation that directly replaces uses of the source API in the application by corresponding uses of the target API or (ii) some sort of wrapping, i.e., objects of the target API's implementation are wrapped as objects that comply with the source API's interface. In the former case, the dependence on the source API is eliminated entirely. In the latter case, the migrated application still depends on the source API but no longer on its original implementation.
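To give a first intuition of the wrapping option, the following Java sketch shows a made-up source-API interface implemented on top of a made-up target API. Neither interface corresponds to a real XML API; the example merely illustrates the adapter structure referred to above.

// Both interfaces are invented for illustration; they do not denote real XML APIs.
interface SourceElement {                        // what the application programs against
    String getTagName();
    String getChildText(String childTag);
}

interface TargetNode {                           // what the target API's implementation offers
    String name();
    TargetNode child(String name);
    String text();
}

// Wrapper: exposes the source API's interface, delegates to the target API.
class WrappedElement implements SourceElement {
    private final TargetNode node;
    WrappedElement(TargetNode node) { this.node = node; }

    public String getTagName() { return node.name(); }

    public String getChildText(String childTag) {
        TargetNode child = node.child(childTag);
        return child == null ? null : child.text();
    }
}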
Incentives for API Migration
One incentive for API migration is to replace an aged (less usable, less powerful) API by a modern (more usable, more powerful) API. The modern API may in fact be a more recent version of the aged API, or both APIs may be different developments. For instance, a C# 3.0+ (or VB 9.0+) developer may be keen to replace the hard-to-use DOM API for XML programming by the state-of-the-art API ‘LINQ to XML’. The above-mentioned transformation option is needed in this particular example; the wrapping option would not eradicate DOM style in the application code. Another incentive is to replace an in-house or project-specific API by an API of greater scope. For instance, the code bases of several versions of SQL Server and Microsoft Word contain a number of ‘clones’ of APIs that had to be snapshotted at some point in time due to alignment conflicts between development and release schedules. As the ‘live’ APIs grow away from the snapshots, maintenance efforts are doubled (think of bug fixes). Hence one would want to migrate to the live APIs at some possible synchronization point—either by transformation or by wrapping. The latter option may be attractive if the application should be shielded against evolution of the live API. Yet another incentive concerns the reduction of API diversity in a given project. For instance, consider a project that uses a number of XML APIs. Such diversity implies development costs (since developers need to master these different APIs). Also, it may imply performance costs (when XML trees need to be converted back and forth between the different object models of the APIs). Wrapping may mitigate the latter problem whereas transformation mitigates both problems. There are yet more incentives. For instance, API migration may also be triggered by license, copyright and standardization issues. As an example, consider a project where the license cost of a particular API must be saved. If the license is restricted to the specific implementation, then a wrapper may be used to reimplement the API (possibly on top of another similar API), and ideally, the application’s code will not be disturbed.
The ‘Difficulty Scale’ of API Migration
Consider API evolution of the kind where the target API is a backwards-compatible upgrade of the source API. In this case, API migration boils down to the plain replacement of the API itself (e.g., its JAR in the case of Java projects); no code will be broken. When an API evolves, one may want to obsolete some of its methods (or even entire types). If the removal of obsolete methods should be enforced, then API migration must replace calls to the obsoleted methods by suitable substitutes. In the case of obsoletion, the transformation option of API migration boils down to a kind of inlining [Per05]. The wrapping option would maintain the obsolete methods and implement them in terms of the ‘thinner’ API. Now consider API evolution of the kind where the target API can be derived from the source API by refactorings that were accumulated on an ongoing basis or automatically inferred or manually devised after the fact. The refactorings immediately feed into the transformation option of API migration, whereby they are replayed on the application [HD05, TDX07]. The refactorings may also be used to generate adapter layers (wrappers) such that legacy applications may continue to use the source API’s interface implemented in terms of the target API [ŞRGA08, DNMJ08].
Representing the evolution of an API as a proper refactoring may be hard or impossible, however. The available or conceivable refactoring operators may be insufficient. The involved adaptations may be too invasive, and they may violate semantics preservation in borderline situations in a hard-to-understand manner. Still, there may be a systematic way of co-adapting applications to match API evolution. For instance, there is work [PLHM08, BDH+09] that uses control-flow analysis, temporal logic-based matching, and rewriting in support of evolving Linux device drivers. Ultimately, we may consider couples of APIs that have been developed more or less independently of each other. Of course, the APIs still serve the same domain. Also, the APIs may agree, more or less, on features and the overall semantic model at some level of abstraction. The APIs will differ in many details however. We use the term API mismatch to refer to the resulting API migration challenge—akin to the impedance mismatch in object/relational/XML mapping [Amb06, Tho03, LM07]. Conceptually, an API migration can indeed be thought of as a mapping problem with transformation or wrapping as possible implementation strategies.
The ‘Risk’ of API Migration
The attempted transformations or wrappers for API migration may become prohibitively complex and expensive (say in terms of code size and development effort)—compared to, for example, the complexity and costs of reimplementing the source API from scratch. Hence, API migration must balance complexity, costs, and generality of the solution in a way that is driven by the actual needs of ‘applications under migration’.
Vision
API migration for more or less independently developed APIs is a hard problem. Consider again the aforementioned API migration challenge of the .NET platform. The ‘LINQ to XML’ API is strategically meant to revamp the platform by drastically improving the productivity of XML programmers. Microsoft has all reason to help developers with the transition from DOM to ‘LINQ to XML’, but no tool support for API migration has ever been provided despite strong incentive. Our work is a call to arms for making complex API migrations more manageable and amenable to tool support.
Contributions
1. We compile a diverse list of differences between several APIs in the XML domain. This list should be instrumental in understanding the hardness of API migration and sketching benchmarks for technical solutions.
2. We describe a study on wrapper-based API migration for two prominent XML APIs of the Java platform. This migration is unique and scientifically relevant in so far that the various differences between the chosen APIs are identified, classified, and measured in a systematic way. The described process allows us to develop a reasonably compliant wrapper implementation in an incremental and test-driven manner.1
1
We provide access to some generally useful parts of the study on the paper’s website: http://www.uni-koblenz.de/~laemmel/xomjdom/
Limitations
We commit to the specifics of API migration by wrapping, without discussing several complications of wrapping and hardly any specifics of transformation-based migration. We commit to the specifics of XML, particular XML APIs, and Java. We only use one application to validate the wrapper at hand. Much more research and validation is needed to come up with a general process for API migration, including guarantees for the correctness of migrated applications. Nevertheless, we are confident that our insights and results are substantial enough to serve as a useful call to arms.
Road-Map
§2 takes an inventory of illustrative API differences within the XML domain. §3 introduces the two XML APIs of the paper’s study and limits the extent of the source API to what has been covered by the reported study on API migration. §4 develops a simple and systematic form of wrapper-based API migration. §5 discusses the compliance between source API and wrapper-based reimplementation, and it provides some engineering methods for understanding and improving compliance. §6 describes related work, and §7 concludes the paper.
2 Illustrative Differences between XML APIs
We identify various differences between three major APIs for in-memory XML processing on the Java platform: DOM, JDOM and XOM. The list of differences is by no means exhaustive, but it clarifies that APIs may differ considerably with regard to sets of available features, interface and contracts for shared features, and design choices. API migration requires different techniques for the listed differences; we allude to those techniques in passing only. In the following illustrations, we will be constructing, mutating and querying a simple XML tree for a (purchase) order such as this:
<order>
  <product>4711</product>
  <customer>1234</customer>
</order>
2.1 This-Returning vs. Void Setters
Using the JDOM API, we can construct the XML tree for the order by a nested expression (following the nesting structure of the XML tree):
// JDOM -- nested construction by method chaining
Element order = new Element("order").
    addContent(new Element("product").
        addContent("4711")).
    addContent(new Element("customer").
        addContent("1234"));
This is possible because setters of the JDOM API, e.g., the addContent method, return this, and hence, one can engage in method chaining. Other XML APIs, e.g., XOM, use void setters instead, which rule out method chaining. As a result, the construction of nested XML trees has to be rendered as a sequence of statements. Here is the XOM counterpart for the above code.
// XOM -- sequential construction
Element order = new Element("order");
Element product = new Element("product");
product.appendChild("4711");
order.appendChild(product);
Element customer = new Element("customer");
customer.appendChild("1234");
order.appendChild(customer);
It is straightforward to transform XOM-based construction code to JDOM because this-returning methods can be used wherever otherwise equivalent void methods were used originally. In the inverse direction, the transformation would require a flattening phase—including the declaration of auxiliary variables. A wrapper with JDOM as the source API could easily mitigate XOM’s lack of returning this.
2.2 Constructors vs. Factory Methods
The previous section illustrated that the XOM and JDOM APIs provide ordinary constructor methods for XML-node construction. Alternatively, XML-node construction may be based on factory methods. This is indeed the case for the DOM API. The document object serves as factory. Here is the DOM counterpart for the above code; it assumes that doc is bound to an instance of type Document.
// DOM -- sequential construction with factory methods
Element order = doc.createElement("order");
Element product = doc.createElement("product");
product.appendChild(doc.createTextNode("4711"));
order.appendChild(product);
Element customer = doc.createElement("customer");
customer.appendChild(doc.createTextNode("1234"));
order.appendChild(customer);
It is straightforward to transform factory-based code into constructor-based code because the extra object for the factory could be simply omitted in the constructor calls. In the inverse direction, the transformation would be challenged by the need to identify a suitable factory object as such. A wrapper could not reasonably map constructor calls to factory calls because the latter comprise an additional argument: the factory, i.e., the document.
2.3 Identity-Based vs. Position-Based Replacement
All XML APIs have slightly differing features for data manipulation (setters, replacement, removal, etc.). For instance, suppose we want to replace the product child of an order. The XOM API provides the replaceChild method that directly takes the old and the new product:
// XOM -- replace product of order
order.replaceChild(oldProduct, newProduct);
The JDOM API favors index-based replacement, and hence the above functionality has to be composed by first looking up the index of the old product, and then setting the content at this index to the new product. Thus:
// JDOM -- replace product of order
int index = order.indexOf(oldProduct);
order.setContent(index, newProduct);
It is not difficult to provide both styles of replacements with both APIs. (Hence, a wrapper can easily serve both directions of API migration.) However, if we expect a transformation to result in idiomatic code, then the direction of going from position-oriented to identity-oriented code is nontrivial because we would need to match multiple, possibly distant method calls simultaneously as opposed to single method calls.
2.4 Eager vs. Lazy Queries
Query execution returns some sort of collection that may differ—depending on the API—with regard to typing and the assumed style of iteration. Another issue is whether queries are eager or lazy. Consider the following XOM code that queries all children of a given order element and detaches (i.e., removes) them one-by-one in a loop:
// XOM -- detach all children of the order element
Elements es = order.getChildElements();
for (int i=0; i<es.size(); i++)
    es.get(i).detach();
The above XOM code is operational because XOM’s queries are eager, and hence the query results are fully materialized before the corresponding collection can be processed. Here is the apparent JDOM counterpart:
// JDOM -- illegal detachment loop
for (Object k : order.getChildren())
    ((Element)k).detach();
Alas, the execution of this code will throw an exception because getChildren returns essentially a lazy iterator on the actual content list of order; changing that list invalidates the iterator. Hence, an operational JDOM counterpart must explicitly ‘snapshot’ the query result, say, in an extra object array as follows:
// JDOM -- detachment loop with up-front snapshot
Object[] es = order.getChildren().toArray();
for (Object k : es)
    ((Element)k).detach();
Arguably, this difference can be mitigated either by a transformation or in a wrapper. Of course, such semantic differences may go unnoticed for some time, and schemes of snapshotting may lead to noteworthy performance penalties.
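To make the wrapping option concrete, the following sketch shows how a JDOM-backed reimplementation of XOM’s getChildElements could perform the snapshot internally, so that XOM-style detachment loops keep working. This is a minimal sketch, not taken from the study’s wrapper; the wrapping constructor and the internals of the reimplemented Elements type are assumptions, not part of either API.
// Sketch: eager query on top of JDOM's lazy child list (assumed wrapper internals)
public Elements getChildElements() {
    // copy JDOM's live list up front, so later mutations cannot invalidate the result
    java.util.List snapshot = new java.util.ArrayList(wrappee.getChildren());
    Elements result = new Elements();                      // assumed internals of the reimplemented Elements type
    for (Object child : snapshot) {
        result.add(new Element((org.jdom.Element) child)); // assumed wrapping constructor and add method
    }
    return result;
}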
2.5 Un-/Availability of API Capabilities
When XML is used as a model in an MVC/GUI application, then an event system is likely needed. For instance, the DOM API allows us to register event listeners with different kinds of events. The following code fragment registers a listener with the order element, which invokes its handler for any sort of node insertion:
// DOM -- register a listener for node insertion
((EventTarget)order).addEventListener(
    "DOMNodeInserted", // mutation type
    new EventListener() {
        public void handleEvent(Event evt) {
            // ... handle event ...
        }
    },
    false);
Neither JDOM nor XOM provides an event system. More generally, we may face API couples where the target API misses some (nontrivial) capability of the source API. In some cases, the capability may be added by extension techniques (e.g., subclasses). In other cases, conservative extension techniques may be insufficient. For instance, the addition of an event system to an XML API would crosscut a considerable part of the API.
2.6 Less vs. More Strict Pre-conditions
Typically, XML APIs make an effort to quietly handle exceptional situations as long as well-formedness of XML trees is not jeopardized and no other blatant programming error would go unnoticed. Still the APIs differ as to where to draw the line. Consider the following JDOM code fragment, which attempts to remove the product child of order twice:
// JDOM -- exercise borderline case for node removal
order.removeContent(product); // properly removes.
order.removeContent(product); // quietly completes.
The above code will execute quietly because JDOM’s pre-condition is weak here: it does not insist that the argument node must be in the container on which removal is performed. In contrast, the following XOM code throws an (unchecked) exception:
// XOM -- exercise borderline case for node removal
order.removeChild(product); // properly removes.
order.removeChild(product); // throws!
Such differences in pre-conditions (likewise for post-conditions) are challenging in API migration. If these differences are simply addressed by defensive programming techniques, then code bloat and inefficiency may be the result. In particular, in the case of the transformation option of API migration, it is not straightforward to produce idiomatic (concise) code.
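For the wrapping option, such a pre-condition gap can be closed by an explicit check in the wrapper. The sketch below strengthens JDOM’s permissive removal to match XOM’s stricter contract; the unwrapping accessor getWrappee(), the use of XOM’s NoSuchChildException, and the exact signature are assumptions and simplifications of this sketch.
// Sketch: strengthening a weak pre-condition of the target API in the wrapper
public Node removeChild(Node child) {
    // JDOM quietly reports failure via its boolean result ...
    boolean removed = wrappee.removeContent((org.jdom.Content) child.getWrappee());
    if (!removed) {
        // ... whereas XOM clients expect an unchecked exception in this borderline case
        throw new NoSuchChildException("child is not contained in this parent");
    }
    return child;
}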
3 The API Couple of the Study
The reported study on API migration concerns the XOM and JDOM APIs, with the goal of reimplementing XOM in terms of JDOM.2 That is, JDOM is wrapped as XOM,
We use the current versions of those APIs: XOM 1.2.1 and JDOM 1.1.
meaning that types with the original XOM interfaces are implemented as wrappers with JDOM objects as wrappees. XOM and JDOM are two prominent XML APIs for the Java platform. They have been developed independently, say, by different software architects, in different code bases, and based on different design rationales.3 The main reason why our study considers migrating from XOM to JDOM, rather than v.v., is the availability of a comprehensive API test suite for XOM. Although wrapping an older API (JDOM) as a newer one (XOM) might appear counter-intuitive at first, such a scenario is plausible in practice since migration drivers such as legal issues do not necessarily follow technical criteria. In the sequel, we present some basic metrics and architectural details about the two APIs. We also describe the scope and some limitations of the migration and the available means for test-driven development.
3.1 API Package Structure
Table 1 lists XOM’s and JDOM’s packages. For each package, the second column gives the total number of declared types (i.e., classes and interfaces) except any descendants of Throwable. The third column is concerned with the latter, i.e., it gives the number of exception classes. The last column lists NCLOC (‘Non-Comment Lines of Code’) per package as an indication of the size (code complexity) of the packages and the APIs. Let us look at XOM’s packages first. The nu.xom package is XOM’s core package (the core API). All the other packages cover specialized feature themes: canonical XML, DOM and SAX interoperability, XInclude support, and XSLT integration. Our study only covers the core API; we omit the discussion of all other themes (packages) in the present paper. JDOM’s core resides in the org.jdom package; it matches roughly the types and features of XOM’s core, but we will discuss the correspondence more precisely below. The remaining packages cover, again, specialized feature themes: DOM interoperability, content filters for query functionality, advanced de-/serialization support, and XSLT and XPath integration.
3
See http://www.artima.com/intv/jdom.html for background on the design rationales.
Table 2. Metrics on the core XOM/JDOM classes
nu.xom                   #Implementations
Attribute                20
Attribute.Type           4
Builder                  15
Comment                  9
DocType                  18
Document                 15
Element                  38
Elements                 2
Namespace                9
Node                     8
NodeFactory              11
Nodes                    8
ParentNode               8
ProcessingInstruction    11
Serializer               35
Text                     9
XPathContext             5
Core Total               225

org.jdom                 #Implementations
Attribute                29
CDATA                    6
Comment                  6
Content                  9
Document                 41
Element                  76
JDOMFactory              25
Namespace                7
ProcessingInstruction    15
Text                     12
input.SAXBuilder         39
output.XMLOutputter      47
Core Total               312
3.2 Core API Features
Table 2 lists all types of XOM’s core and the corresponding JDOM types that were needed for XOM’s reimplementation. XOM’s core is mainly matched by JDOM’s core, but two additional types from the packages org.jdom.input and org.jdom.output are needed; c.f., the right-hand side of Table 2. This is mainly because de-/serialization is part of XOM’s core, whereas JDOM has designated packages for these functions. We omit exception types as well as package-private types in the table entirely. For each type (row), we show the number of methods that the type explicitly implements. This metric can be seen as a proxy for the effort needed in API migration. In our study, for example, each such implementation required roughly one corresponding method implementation in the wrapper. In some situations, we may want to consider additional metrics, however. One such example is an interface complexity metric, defined as the number of methods a type understands (possibly including inherited or abstract methods). The inclusion of abstract methods is of particular interest to framework APIs, which may declare operations with no framework-provided implementations. Yet other metrics could take into account the fact that polymorphic implementations of the source API may need to be migrated differently depending on the specific receiver type. For instance, a given method implementation of the source API may have different pre- and post-conditions for different receiver types. Also, a given method declaration of the source API may be implemented on a base type, whereas the target API’s class hierarchy requires implementations on derived types. Such issues break the regularity of a wrapper’s implementation. In the study, the impact of these issues was limited. The #Implementations numbers of Table 2 give an idea of the feature complexity of the core API and the relative contribution of the different API types. It is immediately obvious that XOM has fewer methods than JDOM. In fact, JDOM is known to
provide many ‘convenience methods’, which explains this difference. Interestingly, the NCLOC numbers of the core packages in Table 1 clarify that XOM is substantially more complex than JDOM (in terms of code size). This difference involves several factors—also including incidental ones such as programming style. Most importantly, however, XOM is known to make a considerable effort to guarantee XML well-formedness. It pursues this goal by means of heavy checking, which directly affects the NCLOC metric.
3.3 XOM's Test Suite
The study uses test-driven development to push for compliance of the wrapper-based reimplementation of XOM with the original XOM API. We use the excellent XOM test suite to this end. JDOM’s test suite does not have any role in this effort. Table 3 describes XOM’s test suite in more detail.
The list of test classes maps roughly to the core API classes. There are 685 additional test cases for the omitted themes of the XOM API. The TestCases are JUnit test classes with the shown number of test methods. Each test method tends to involve a small number of tests as evident from the number of assertions. Finally, we should mention that XOM also comes with a separate harness of basic benchmarks to test the speed and memory footprint of XOM programs. We have not used these benchmarks in any manner, but it would be interesting to systematically compare XOM’s performance with the one of a wrapper-based reimplementation.
4 Wrapper-Based API Migration
We will describe a simple and systematic form of wrapper-based API migration. In particular, we reimplement XOM in terms of JDOM. Hence, application code can be completely preserved because it may continue to depend on the interface of XOM.
4.1 API Mapping
We begin a wrapper-based API migration by mapping each source type and method to a suitable target type and method. Such mapping requires domain knowledge; types and methods are compared at the level of domain concepts and their operations.
Table 4. Metrics on the XOM/JDOM mapping
The table misses one core type; see Table 2 for the full list. That is, Namespace is omitted because it is only used by the original XOM implementation.
When mapping source types, we distinguish regular vs. irregular types. We say that a type is regular if it corresponds to a single target type; otherwise, the type is irregular. Indeed, some source types may need to be associated with multiple target types; yet other source types may lack a counterpart. When mapping source methods, again, we distinguish regular vs. irregular methods. We say that a method is regular if it corresponds to a single target method provided by (one of) the target type(s); otherwise, the method is irregular. Table 4 summarizes the API mapping for the XOM/JDOM study. We obtained the mapping a posteriori by inspecting the wrapper types and methods. 75% of all source methods provided by the wrapper are regular. There are 4 irregular source types. For instance, JDOM does not provide a common base class like XOM’s Node; some of its polymorphic methods have their counterparts implemented in multiple JDOM types instead. Please note that the number of source methods per type in Table 4 slightly deviates from Table 2 because the wrapper places some of the method implementations at different levels in the class hierarchy when compared to the original XOM implementation.
4.2 Wrapper Implementation
We begin with an ‘empty’ reimplementation of the source API as follows. Each interface of the source API is reused as is by the reimplementation. Each class of the source API is reimplemented with the same interface, but with ‘empty’ (exception-throwing) method implementations. This empty reimplementation is compilable by construction, and any application of the API’s original implementation remains compilable. Applications can be redirected to the new implementation by replacement of the API’s JAR, by aspect-oriented programming, or by (manually) changing package references.
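As an illustration, such an ‘empty’ class might start out as sketched below. The choice of UnsupportedOperationException and the placement of methods in the class hierarchy are assumptions of this sketch, not details given by the study.
// Sketch of the 'empty' (exception-throwing) starting point of the reimplementation
package nu.xom;

public class Element {
    public Element(String name) {
        throw new UnsupportedOperationException("nu.xom.Element: not yet reimplemented");
    }
    public void appendChild(String text) {
        throw new UnsupportedOperationException("nu.xom.Element: not yet reimplemented");
    }
    // ... one such stub per constructor and method of the original interface
}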
Study of an API Migration for Two XML APIs
53
The next step is to turn the empty types into proper wrapper types. Here we systematically apply the design pattern for object adapters, where we implement the API mapping (c.f., §4.1) as follows. Each wrapper class (i.e., each class of the reimplementation of the source API) is set up, if possible, as an object adapter with an object of the target API as the adaptee (also called the wrappee). For instance, the different Element types of XOM and JDOM would engage in a corresponding wrapper class as follows:
package nu.xom;
public class Element {
    private org.jdom.Element wrappee;
    // implement interface of wrapper in terms of wrappee
}
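One way to flesh out this skeleton is sketched below. The internal wrapping constructor and the getWrappee() accessor are assumptions of the sketch, and the placement of methods in XOM’s class hierarchy is simplified; the JDOM calls used for delegation (addContent, getName, getChild) are standard JDOM 1.1 methods.
// Sketch: filling in the object adapter (wrapping constructor and accessor are assumed internals)
package nu.xom;

public class Element {
    private org.jdom.Element wrappee;

    public Element(String name) {                       // constructor-to-constructor mapping
        this.wrappee = new org.jdom.Element(name);
    }

    Element(org.jdom.Element wrappee) {                 // internal wrapping constructor (assumed)
        this.wrappee = wrappee;
    }

    org.jdom.Element getWrappee() {                     // internal unwrapping accessor (assumed)
        return wrappee;
    }

    public String getLocalName() {                      // plain delegation
        return wrappee.getName();
    }

    public void appendChild(String text) {              // void setter on top of a this-returning setter (cf. §2.1)
        wrappee.addContent(text);
    }

    public Element getFirstChildElement(String name) {  // results of the target API need to be wrapped again
        org.jdom.Element child = wrappee.getChild(name);
        // (the sketch ignores wrapper identity: each call creates a fresh wrapper object)
        return child == null ? null : new Element(child);
    }
}
Application code in the style of the XOM listings of Section 2 would then compile against this reimplementation unchanged, since only the package-internal wrappee plumbing differs from the original.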
A few special cases should be mentioned in passing. First, abstract wrapper types may not need any wrappee type. Second, when we implement the wrapper class for a source type with multiple associated target types, the wrappee type might need to be an imprecise upper bound, such as Object, and methods may need to perform type dispatch (e.g., via instanceof) to invoke methods on the wrappee. We speak of a minor wrapping disorder if a single wrappee object per wrapper object is fundamentally insufficient for reimplementation. This could happen, for example, if the source API intrinsically assumes a richer state than the target API. For instance, a reimplementation of DOM in terms of XOM or JDOM would need to maintain extra state in order to provide an event system; c.f., §2.5. Such disorders may be encountered late during implementation efforts, and they may trigger amendments of the API mapping; c.f., §4.1. We speak of a major wrapping disorder if method invocations on the source API (handled by the wrapper) may need to be deferred or even rejected because there is yet state missing for the corresponding invocations on the target API. For instance, a reimplementation of XOM or JDOM in terms of DOM is challenging because XOM/JDOM’s constructors are not implementable in terms of DOM’s factory methods; c.f., §2.2. The XOM/JDOM study involves only one minor wrapping disorder. The type nu.xom.Serializer receives a writer through a constructor argument, whereas the associated type org.jdom.output.XMLOutputter receives the writer through method calls. Hence, the XOM type must store the writer throughout. 4.3 Levels of Adaptation Ir-/regularity of a source method is based solely on the number of its associated target methods. There is a richer scale of adaptation levels that usefully classifies reimplemented methods, however. In the following, we define the different adaptation levels for a given source method m. Adaptation level 1. m is a regular method with m as the associated target method. The reimplementation of m only performs basic delegation of m to m on the wrappee (including wrapping and unwrapping). Argument positions may also be filled in by defaults. this-returning may be turned into void methods and v.v.; c.f., §2.1. Adaptation level 2. Additional adaptations are involved in comparison to level 1. That is, arguments may be pre-processed (converted or checked); results may be
post-processed (c.f., §2.4); exceptions may be translated; error codes may be converted into exceptions and v.v.; the delegation may also be conditional, subject to simple tests of the arguments; c.f., §2.6. Adaptation level 3. m is an irregular method. Its implementation may invoke any number of target methods, but without reimplementing any functionality of the target API. In informal terms, a level 3 method is one that is effectively missing in the target API but which can be recomposed from other methods of the target API. Adaptation level 4. The level 3 condition of ‘not reimplementing any methods of the target API’ must be violated. In informal terms, level 4 methods violate the ‘intention of reuse’ for reimplementing the source API in terms of the target API. Table 5 shows the methods per type and adaptation level for the study. We have assigned these levels manually (by categorizing the implementation) and recorded them through method annotations on the wrapper types. The shown numbers depend on a ‘judgement call’ for the required compliance of the wrapper as discussed in the next section. The more one pushes for full compliance, the more methods would be pushed upwards on the level scale; also, the more complex some method implementations would get.
Table 5. Adaptations per level for XOM/JDOM
Basic delegation (level 1) suffices for a bit less than half of all methods; more than a quarter requires some pre-/post-processing (level 2); the remainder needs to be composed from other methods (level 3) or developed from scratch (level 4). It turns out, however, that all level 4 methods were not at all complex and could be implemented without problems. There are a few methods of the Serializer class that are not associated with an adaptation level. These methods were not implemented because there was no straightforward way of doing so, and the sample application used in the study did not exercise these methods.
We would like to generally avoid method implementations at the adaptation level 4. That is, any substantial violation of the ‘intention of reusing’ the target API runs fundamentally counter to the motivation of API migration. Likewise, we would like to avoid complicated or inefficient method implementations at the adaptation levels 2–3.
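As a concrete illustration of the middle of this scale, the identity-based replacement of §2.3 is effectively missing in JDOM and would presumably end up at adaptation level 3 in such a wrapper: it can be recomposed from JDOM’s indexOf and setContent. The unwrapping accessor, the exception choice, and the exact signature below are assumptions of this sketch, not code from the study.
// Sketch: recomposing XOM-style identity-based replacement from JDOM methods (cf. §2.3)
public void replaceChild(Node oldChild, Node newChild) {
    int index = wrappee.indexOf((org.jdom.Content) oldChild.getWrappee()); // position-based lookup
    if (index < 0) {
        throw new NoSuchChildException("oldChild is not a child of this parent");
    }
    wrappee.setContent(index, (org.jdom.Content) newChild.getWrappee());   // position-based replacement
}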
5 API Compliance
In simple terms, the wrapper-based reimplementation of the source API should be ‘fully compliant’ with the original (implementation of the) source API. Compliance could
be interpreted in the sense of contract-based equivalence for the original implementation and the wrapper. In practice, APIs often lack comprehensive contracts (pre-/post-conditions and invariants). Hence, test-based methods are needed. Using such test-based methods, ‘compliance issues’ are gradually discovered, and possibly resolved. In the following, we clarify the process for discovering compliance issues; we categorize these issues; and we defend the idea that some issues may remain unresolved. The XOM/JDOM study continues to serve as the running example.
5.1 Test Suite-Based Compliance
A strong test suite for the source API appears to be a reasonable tool in establishing compliance of the original API and the wrapper-based reimplementation. However, an important insight of our work is that it may be prohibitively expensive to achieve full compliance with regard to such a test suite (because it may approximate contract-based compliance at a very detailed, idiosyncratic level). Indeed, in the study, we have ultimately accepted partial compliance with approx. 40 % of all test cases not producing the expected result with the wrapper:
– # XOM test suite – all test cases: 697
– # XOM test suite – compliant test cases: 417
– # XOM test suite – non-compliant test cases: 280
In general, a strong test suite for the source API may be the initial driver in pushing the wrapper towards some basic compliance. Such a test suite is even more useful if it clearly identifies mainstream API-usage scenarios that must not be disturbed by non-compliance. To limit effort, one would initially concentrate on a smaller core API and important API-usage scenarios, indeed. In the study, initially, we used a considerably smaller core of XOM. For instance, we left out Serializer because XOM has already a serialization capability through its toXML method. Also, we left out DocType (i.e., DTD) support because it seemed difficult to provide such support in the view of JDOM’s lack of comprehensive DocType support. Ultimately, API migration is driven by the actual ‘application under migration’. The application may call for an extension of the initially covered API and for the inclusion of more API-usage scenarios. In the study, we picked an application under migration by searching the SourceForge repository for an application that both makes substantial use of XOM and references XOM in (say, JUnit-based) test cases. The best fit was CDK.4 In general, one needs to push the wrapper towards full compliance with the application’s test suite—potentially balancing the wrapper development effort and the degree of automation of migration. In the study, we reached full compliance without any need for manual adaptations of the application except for 3 test cases whose dependence on the order of XML attributes had to be relaxed. The following numbers only cover CDK’s test cases that use XOM.
4
Chemistry Development Kit (CDK) is a Java library for structural chemo- and bioinformatics; c.f., http://sourceforge.net/apps/mediawiki/cdk/. The used checkout of CDK does not pass all of its test suite even with the original XOM implementation. We have only looked into compliance for test cases that passed with the original XOM implementation.
– # CDK test suite – all test cases: 752
– # CDK test suite – compliant test cases: 752
– # CDK test suite – non-compliant test cases: 0
One of the reasons of compliance with the application’s test suite vs. non-compliance with the API’s test suite is of course that any given application will exercise the source API only in a limited manner. However, this may be even true for a reasonable test suite of an API. Consider the following numbers that we determined in the study:
– # all implementations of the wrapper: 277
– # XOM test suite – exercised method implementations: 156
– # CDK test suite – exercised method implementations: 35
Hence, about 3/5 of all method implementations were exercised by the API’s test suite, and only about 1/10 were exercised by the application’s test suite. Inspection reveals that the API’s test suite specifically misses many of the more trivial methods (such as getters and setters and diversely overloaded constructors).
5.2 Compliance Levels
It is now a central question whether or not the application runs into any of the compliance issues manifested by the API’s test suite. The following method can be applied in this context. Each API method can be associated with a compliance level relative to any test suite as follows:
– always: it is exercised in compliant test cases only.
– sometimes: it is exercised in both compliant and non-compliant test cases.
– never: it was exercised but never in compliant test cases.
– unused: it is not exercised at all in any test cases.
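The four levels can be read as a simple classification over two per-method counters, namely how many compliant and how many non-compliant test cases exercise the method. The following helper is hypothetical (the study records levels as annotations rather than computing them this way), but it transcribes the definitions directly.
// Hypothetical helper: classifying a method by its test-case statistics
enum ComplianceLevel { ALWAYS, SOMETIMES, NEVER, UNUSED }

static ComplianceLevel classify(int compliantCases, int nonCompliantCases) {
    if (compliantCases == 0 && nonCompliantCases == 0) return ComplianceLevel.UNUSED;  // not exercised at all
    if (nonCompliantCases == 0)                        return ComplianceLevel.ALWAYS;  // compliant cases only
    if (compliantCases == 0)                           return ComplianceLevel.NEVER;   // never in a compliant case
    return ComplianceLevel.SOMETIMES;                                                  // both kinds of cases
}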
The status of each method with regard to the application’s test suite can now be compared with its status with regard to the API’s test suite. This comparison is visualized for the study in Table 6.
Table 6. Compliance levels in the XOM/JDOM study
nu.xom  #always  #sometimes  #never  #unused
Attribute  13 / 3 [ ,11]  4 [1, 3]  11 / 25
Attribute.Type  3 [ ,3]  1/4
Builder  1 / 2 [ ,1]  7 [2, 5]  7 / 13
Comment  7 [ ,7]  2 [0, 2]  4 / 13
DocType  8 [ ,8]  5 [0, 5]  9 / 22
Document  7 / 1 [ ,7]  12 [1, 11]  8 / 26
Element  15 / 21 [ ,9]  28 [13, 15]  -  8 / 30 [2, ]
Elements  0 / 2 [ ,0]  2 [2, 0]
Node  2/2
NodeFactory  4 [0, 4]  1 [0, 1]  6 / 11
Nodes  2 / 2 [ ,2]  3 [2, 1]  4/7
ParentNode
ProcessingInstruction  9 [ ,9]  1 [0, 1]  7 / 17
Serializer  3 / 3 [ ,3]  8 [3, 5]  3 [0, 3]  2 / 13
Text  7 [ ,7]  1 [0, 1]  5 / 13
XPathContext  0 / 1 [ ,0]  5 / 4 [1, ]
Total  75 / 35 [ ,67]  77 [24, 53]  4 [0, 4]  79 / 200 [3, ]
XOM/CDK: The first number in each cell shows the compliance level for XOM’s test suite. The number after the slash (if any) shows the compliance level for CDK’s test suite. Note that all CDK test cases succeed; hence there are no methods at levels #sometimes or #never. [moves to #always, moves to #unused]: The numbers in square brackets (if any) describe the moves between the levels with the ‘initial’ position defined by XOM’s test suite and the ‘final’ position defined by CDK’s test suite. For example, Attribute had 11 methods moved from #always to #unused, 1 from #sometimes to #always, and 3 from #sometimes to #unused.
Table 7. Samples of compliance issues in the XOM/JDOM study
Type | Methods | Issue type | Domain | Status | Comment
Attribute | toXML() | Post | Serialization | resolved | JDOM's escaping is different from XOM's
Attribute | Attribute(String,String) | Pre | | resolved | XOM allows colonized names in the first argument whereas JDOM does not
Element | detach() | Invariant | | resolved | A root element must always remain attached.
Element | addAttribute(Attribute) | Throws | | resolved | XOM throws MultipleParentException if argument is parented whereas JDOM throws IllegalAddException
Element | setBaseURI(String) | Pre | BaseURI | unresolved | XOM aggressively checks URI for well-formedness and throws accordingly
Element | getBaseURI() | Post | BaseURI | unresolved | In XOM the result is absolutized and converted from IRI to URI if needed
The table illustrates that several methods with compliance issues with regard to the API’s test suite are used without problems in the application. Incidentally, there are even implementations that were not exercised by the API’s test suite but are exercised (and found compliant) by the application’s test suite. (See the numbers in bold face in the table for both of these effects.)
5.3 Discovery of Compliance Issues
In the test-driven process of pushing the wrapper towards compliance, one could simply focus on the number of compliant test cases. However, such plain focus would provide little insight into the underlying causes for failing test cases and the actual API mismatch. Also, it would provide no guidance with regard to the prioritization of non-compliant test cases. Instead, test-driven development is to be refined such that non-compliant test cases are incrementally examined and some API method is to be ‘blamed’ to have a compliance issue. Table 7 shows a few samples of documented compliance issues in the study. The format of these entries will be clarified gradually. All discovered issues are recorded by means of method annotations on the wrapper types. As an issue is discovered, a decision must be made whether or not effort is to be spent (immediately) on its resolution. If the issue was discovered through an ambitious test suite for an API, then it may be reasonable to refuse resolution—because the issue is considered either a) less relevant for actual applications, or b) too complicated for an automated approach, calling for a case-by-case migration instead. Table 8 summarizes all resolved and unresolved issues in the study. This relatively small number of issues was indeed discovered incrementally, and about half of the issues remained unresolved, while the ‘application under migration’ is still fully compliant.
5.4 Generic Compliance Issues
Compliance issues can be caused by differences in pre-/post-conditions, invariants, and throwing behavior. We call these issues generic in the sense that they are meaningful for APIs of any domain. The following definitions assume two APIs α and α′ with identical
Table 8. Number of resolved and unresolved XOM/JDOM issues
(a) #resolved
Type  #Pre  #Post  #Inv  #Throws
Attribute  3  1  4
Attribute.Type
Builder
Comment  2
DocType  1
Document  6  4
Element  5  1  8
Elements
Node
NodeFactory
Nodes
ParentNode
ProcessingInstruction
Serializer
Text  2
XPathContext
16  1  2  18
(b) #unresolved
Type  #Pre  #Post  #Inv  #Throws
Attribute
Attribute.Type
Builder  5  1  7
Comment  1
DocType  7  1  1
Document  1
Element  3  4
Elements
Node
NodeFactory  1
Nodes
ParentNode
ProcessingInstruction
Serializer
Text  1
XPathContext  1
15  10  1  8
interface. In the wrapping context, α is the original implementation of the source API, whereas α′ is the wrapper (at a given stage of development). We say that method m has a PRE issue if its pre-condition is stronger in α′ than in α. If we think of α′ as the intended replacement of α, then such an issue violates design-by-contract rules. The opposite situation also needs to be considered: we also say that m has a PRE issue if its pre-condition is weaker in α′ than in α. In this case, no violation of design-by-contract rules is present, but α′ is more (too) permissive than α. In the latter case, the issue can be addressed by adding extra checked assertions to the too permissive implementation. In the former case, a more complex implementation may be needed. Table 7 shows two examples of PRE issues in the study. In fact, the one on Attribute is about a too strong pre-condition (because JDOM rejects colonized names where XOM does not); the one on Element is about a too weak pre-condition (because JDOM checks less for well-formedness than XOM). As it is clear from the table, one of the issues was not resolved—well-formedness checking is particularly difficult to add to JDOM without leading to code bloat and possibly adaptation level 4. Likewise, we say that m has a POST issue if its post-condition in α′ is weaker than the one in α. Further, we say that class c has an INV issue if the invariant of c in α′ does not imply the one in α. Both kinds of issues violate design-by-contract rules. Yet another kind of generic compliance issue concerns exceptions. We say that m has a THROWS issue if for the case that the implementations α and α′ agree on whether or not to throw, the thrown exceptions are different (in terms of their types or observable content). This kind of issue happens when source and target APIs use API-specific exception types or differ in the use of reusable exception types.
5.5 Domain-Specific Compliance Issues
The generic categories are designed to fully cover all possible compliance issues. In any given API migration project, one may be able to categorize the nature of an issue at the
domain level. This categorization might help in stating arguments in favor of or against resolving certain issues, based on the given category’s relevance to the application being migrated. In the sequel, we sketch two of the categories of domain-specific issues that we discovered in the study; c.f., Table 7 for illustrations. Serialization. XML can be serialized in different, semantically equivalent ways. In particular, XOM and JDOM may produce serialization results that are equivalent under XML’s infoset semantics but different in terms of string-based comparison. These differences in serialization behavior are hard to neutralize by a wrapper or a transformation, but it is often easy to make applications (and their test cases) robust to such details by applying a sort of canonicalization or refraining from string-based comparison. BaseURI. XOM’s ‘base URI’ handling is considerably more advanced than JDOM’s handling. A full reproduction of XOM’s semantics on top of JDOM would account for complex method implementations. However, base URI handling is rarely used in XML processing code.5
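To connect these categories back to the wrapper code, the sketch below shows how a THROWS issue like the addAttribute sample of Table 7 might be resolved and recorded. The @ComplianceIssue annotation is hypothetical (the paper only states that issues are recorded through method annotations), and the unwrapping accessor and exception constructors are assumptions of the sketch.
// Sketch: resolving and recording a THROWS issue (cf. the addAttribute sample in Table 7)
@interface ComplianceIssue { String type(); String status(); }   // hypothetical annotation type

@ComplianceIssue(type = "Throws", status = "resolved")
public void addAttribute(Attribute attribute) {
    try {
        wrappee.setAttribute(attribute.getWrappee());             // delegate to JDOM (assumed accessor)
    } catch (org.jdom.IllegalAddException e) {
        // translate the target API's exception into the one XOM clients expect
        throw new MultipleParentException(e.getMessage());
    }
}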
6 Related Work
Wrapping is an established technique, in software re-engineering in particular [SM98]; legacy software is often wrapped for use in a new architecture, such as SOA [CFFT08]. We make a contribution to wrapping in so far that we leverage an API-type mapping and classification schemes for method implementations and compliance issues. In the introduction, we already referred to related work on API migration, and our discussion was meant to reveal that all such previous work focused on API evolution in the sense of migrating from one version of an API to the next version. There has been effort to facilitate refactoring in API evolution [HD05, Per05, TDX07, ŞRGA08, DNMJ08]. Some of these approaches use wrapping (adapters) as an implementation technique [ŞRGA08, DNMJ08]. Those wrappers are straightforwardly derived from refactorings; in contrast, our wrappers are the actual representations of relatively heterogeneous API mappings. Several approaches go beyond the limits of refactoring by providing some general means of transformation [CN96, KH98, BTF05, PLHM08]. Again, the showcases for all these approaches concern API evolution or migration between very much similar APIs. For instance, [BTF05] describes a rewriting-based approach for API migration that has been applied to the types Vector and ArrayList of the Java Core API, where the latter type is essentially a ‘careful redesign’ of the former. Nevertheless, the transformation techniques from such previous work are important ingredients of a general approach to API migration. Our efforts to gather metadata about APIs, such as API-type mappings or compliance issues, are well in line with other recent efforts on understanding APIs at an ontology level [RJ08]. We are also inspired by other related uses of metadata in program comprehension, reverse engineering and re-engineering [BCPS05, BGGN08].
5
Among all of the 43 SourceForge projects that use Subversion as repository and that use XOM, there is apparently only a single project that performs nontrivial base URI handling.
7 Conclusion We have researched API migration with specific interest in couples of source and target APIs that were developed independently of each other. We have engineered the process of API migration in this context and reported on one study concerning two popular XML APIs of the Java platform. The various differences between the chosen APIs were identified, classified, and measured in a systematic way. Our work shows that API migration for independently developed APIs may be manageable. Despite the many semantical and contractual differences, despite different features and designs, one can construct a reasonably compliant wrapper for API migration in a systematic, incremental, and test-driven manner. The use of a strong test suite for the API and a useful test suite for the application under migration are indeed critical. Our experiments substantiate that a wrapper-based reimplementation of an API may lack full compliance with the API’s test suite, while it can be still fully compliant with the test suite of the application under migration. One area of future work concerns the provision of a more general wrapping technique that can deal with all forms of subtyping, callbacks, and extensions points in APIs (and frameworks). We also need to generalize the described approach by applying it to other domains such as GUI or database programming. Further, we would like to abstract from the low-level approach of specifying API migrations as metadata-annotated wrapper implementations. That is, we seek an appropriate transformation language that can perhaps even be executed in two manners: either as a source-code transformation or as a wrapper generator. Finally, any resolved issue, say for a given method m, adds complexity to the API migration. A wrapper seems to hide that complexity ‘inside’, except perhaps for the implied performance penalty. Worse, the transformation option of API migration incurs the added complexity for every call to m. Hence, it is important to find an effective way of deciding on whether or not a given compliance issue needs to be dealt with for a given source location that calls m. Acknowledgements. This work is partially supported by IBM Centers for Advanced Studies, Toronto.
References
[Amb06] Ambler, S.W.: The Object-Relational Impedance Mismatch (2006), http://www.agiledata.org/essays/impedanceMismatch.html
[BCPS05] Bruno, M., Canfora, G., Di Penta, M., Scognamiglio, R.: An Approach to support Web Service Classification and Annotation. In: 2005 IEEE International Conference on e-Technology, e-Commerce, and e-Services (EEE 2005), Proceedings, pp. 138–143. IEEE Computer Society, Los Alamitos (2005)
[BDH+09] Brunel, J., Doligez, D., Hansen, R.R., Lawall, J.L., Muller, G.: A foundation for flow-based program matching: using temporal logic and model checking. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, pp. 114–126. ACM, New York (2009)
[BGGN08] Brühlmann, A., Gîrba, T., Greevy, O., Nierstrasz, O.: Enriching Reverse Engineering with Annotations. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 660–674. Springer, Heidelberg (2008)
[BTF05] Balaban, I., Tip, F., Fuhrer, R.: Refactoring support for class library migration. In: OOPSLA 2005: Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pp. 265–279. ACM, New York (2005)
[CFFT08] Canfora, G., Fasolino, A.R., Frattolillo, G., Tramontana, P.: A wrapping approach for migrating legacy system interactive functionalities to Service Oriented Architectures. Journal of Systems and Software 81(4), 463–480 (2008)
[CN96] Chow, K., Notkin, D.: Semi-automatic update of applications in response to library changes. In: ICSM 1996: Proceedings of the 1996 International Conference on Software Maintenance, p. 359. IEEE Computer Society, Los Alamitos (1996)
[DNMJ08] Dig, D., Negara, S., Mohindra, V., Johnson, R.: Reba: refactoring-aware binary adaptation of evolving libraries. In: ICSE 2008: Proceedings of the 30th International Conference on Software Engineering, pp. 441–450. ACM, New York (2008)
[HD05] Henkel, J., Diwan, A.: CatchUp!: capturing and replaying refactorings to support API evolution. In: ICSE 2005: Proceedings of the 27th International Conference on Software Engineering, pp. 274–283. ACM, New York (2005)
[KH98] Keller, R., Hölzle, U.: Binary component adaptation. In: Jul, E. (ed.) ECOOP 1998. LNCS, vol. 1445, pp. 307–329. Springer, Heidelberg (1998)
[KLV05] Klusener, A.S., Lämmel, R., Verhoef, C.: Architectural modifications to deployed software. Science of Computer Programming 54(2-3), 143–211 (2005)
[LM07] Lämmel, R., Meijer, E.: Revealing the X/O impedance mismatch (Changing lead into gold). In: Backhouse, R., Gibbons, J., Hinze, R., Jeuring, J. (eds.) SSDGP 2006. LNCS, vol. 4719, pp. 285–367. Springer, Heidelberg (2007)
[Per05] Perkins, J.H.: Automatically generating refactorings to support API evolution. In: PASTE 2005: Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program Analysis for Software Tools and Engineering, pp. 111–114. ACM, New York (2005)
[PLHM08] Padioleau, Y., Lawall, J.L., Hansen, R.R., Muller, G.: Documenting and automating collateral evolutions in linux device drivers. In: Proceedings of the 2008 EuroSys Conference, pp. 247–260. ACM, New York (2008)
[RJ08] Ratiu, D., Juerjens, J.: Evaluating the Reference and Representation of Domain Concepts in APIs. In: 16th International Conference on Program Comprehension (ICPC 2008), pp. 242–247. IEEE Computer Society, Los Alamitos (2008)
[SM98] Sneed, H.M., Majnar, R.: A case study in software wrapping. In: International Conference on Software Maintenance (ICSM 1998), Proceedings, pp. 86–93. IEEE Computer Society, Los Alamitos (1998)
[ŞRGA08] Şavga, I., Rudolf, M., Götz, S., Aßmann, U.: Practical refactoring-based framework upgrade. In: GPCE 2008: Proceedings of the 7th international conference on Generative Programming and Component Engineering, pp. 171–180. ACM, New York (2008)
[TDX07] Taneja, K., Dig, D., Xie, T.: Automated detection of API refactorings in libraries. In: ASE 2007: Proceedings of the twenty-second IEEE/ACM international conference on Automated Software Engineering, pp. 377–380. ACM, New York (2007)
[Tho03] Thomas, D.: The Impedance Imperative: Tuples + Objects + Infosets = Too Much Stuff! Journal of Object Technology 2(5), 7–12 (2003)
Composing Feature Models Mathieu Acher1 , Philippe Collet1 , Philippe Lahire1 , and Robert France2 1
University of Nice Sophia Antipolis, I3S Laboratory (CNRS UMR 6070), 06903 Sophia Antipolis Cedex, France {acher,collet,lahire}@i3s.unice.fr 2 Computer Science Department, Colorado State University, Fort Collins, CO 80523, USA [email protected] Abstract. Feature modeling is a widely used technique in Software Product Line development. Feature models allow stakeholders to describe domain concepts in terms of commonalities and differences within a family of software systems. Developing a complex monolithic feature model can require significant effort and restrict the reusability of a set of features already modeled. We advocate using modeling techniques that support separating and composing concerns to better manage the complexity of developing large feature models. In this paper, we propose a set of composition operators dedicated to feature models. These composition operators enable the development of large feature models by composing smaller feature models which address well-defined concerns. The operators are notably distinguished by their documented capabilities to preserve some significant properties.
1
Introduction
Clements et al. define a software product line (SPL) as "a set of softwareintensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way" [1]. SPL engineering involves managing common and variable features of the family during different development phases (requirements, architecture, implementation), to ensure that family instances are correctly configured and derived [2]. In this context, Model-Driven Engineering is gaining more attention as a provider of techniques and tools that can be used to manage the complexity of SPL development. In model-based development of SPLs, feature models (FMs) [3, 4] are widely used to capture SPL requirements in terms of common and variable features. From an early stage (e.g. requirements elicitation) to components and platform modeling, FMs can be applied to any kind of artefacts (code, documentation, models) and at any level of abstraction. As a result, FMs can play a central role in managing variability and product derivation of SPLs (e.g., see [5, 6, 7]).
This work was partially funded by the French ANR TL FAROS project.
Like other model-based approaches, SPL engineering now faces major scalability problems and FMs with thousands of features are not uncommon [8, 9]. Creating and maintaining such large FMs can then be a very complex activity [10, 11, 12, 13, 14, 15]. This problem indicates a need for tools that developers can use to better manage complexity. One way that this can be done is to provide the means to separate the concerns or the business domains in an SPL. Our work focuses on an approach that puts FMs at the center of SPL management. The separation of concerns approach we propose enables stakeholders to manage and maintain FMs that are specific to a business domain, a technological platform or a crosscutting concern. In this paper, we propose generic composition operators to compose FMs in order to produce a new FM. The proposed operators have been determined through a classification of possible manipulations when composing elements of two FMs. This classification is inspired by the similar distinctions made when composing models (introduction, merging, modification, extension) [16]. The proposed insert operator supports different ways of inserting features from a crosscutting FM into a base FM. Depending on the inserted and targeted feature nodes, we determine whether the insertion preserves the set of configurations determined by the input FMs. This preservation property is called the generalization property. We also propose a merge operator that is capable of combining matching features in two input FMs. This operator is defined using the insert operator and similar properties are also determined. The remainder of this paper is organized as follows. Section 2 describes the motivation for separating and composing FMs through an example. Section 3 sets out the rationale behind the design of the proposed composition operators and discusses properties that are used to characterize the provided operators. Section 4 and Section 5 detail the insert and merge operators and illustrate their use on the example presented in Section 2. Section 6 discusses related work. Section 7 describes future work and concludes this paper.
2 Motivation
The plethora of feature definitions [17] suggests that FMs can be used at different stages of the SPL development, from high-level requirements to code implementation. In this paper, FMs are considered from a general perspective in that FMs are not restricted to a specific development phase. As a result, a FM can just as well describe a family of software programs, a family of requirements or a family of models.
2.1 Feature Model
FMs organize a hierarchy of features while explicitly specifying the variability [18]. Features of a FM are nodes of a tree represented by strings and related by various types of edges [19]. The edges are used to progressively decompose features into more detailed subfeatures. (The tree structure starts from the root
feature, which is then the parent of its child features and so on.) Some mechanisms are also used to express variabilities in a FM. Hence, a group of child features can form an And-, Xor-, or Or-group. Features in an And-group can be either mandatory or optional subfeatures of the parent feature. There are some rules that determine whether a FM is well-formed or not. For example, there cannot be an And-, Or- or Xor-group with only a single child. In Fig. 1, the concept of person is represented as a FM, whose root feature is Person. Information associated with a person includes housing, transport and telephone, which are mandatory features. The transport feature consists of either a car or an other kind of transport. These child features are mutually exclusive and thus are organized in a Xor-group. The housing feature is composed of any combination of an address, a street name or a street number feature. Since their original definition by Kang et al. [3], several FM notations have been proposed [19]. The FM language used throughout this paper supports the standard structures previously described, but we do not consider directed acyclic graph structures and do not deal with constraints defined across features, whether they are internal to a FM or between several FMs. Nevertheless, taking constraints on FMs into account is part of our future work (see Section 7).
Fig. 1. A feature model representing the concept of person
A FM is a representation of a family and describes the set of valid feature combinations. Every member of a family is thus represented by a unique combination of features (a member of a family can be an "instance", a "product", a "program", etc.; these terms are equivalent, and their use depends on the kind of family represented). In the remainder of the paper, a combination of selected features is called a configuration of a FM. A configuration is valid if the selection of all features it contains, together with the deselection of all other features, is allowed by the FM. The validity of a configuration is determined by the semantics of the FM, which prevents the derivation of illegal configurations. A FM is thus a characterization of a set of valid configurations. The semantics of a FM can be expressed in terms of the following rules: i) if a feature is selected, its parent must also be selected; the root feature is thus
always included in any configuration; ii) if a parent is selected, all the mandatory features of its And-group are selected; iii) if a parent is selected, exactly one feature of its Xor-group must be selected; iv) if a parent is selected, at least one feature of its Or-group must be selected (it is also possible to select more than one feature of its Or-group). A valid configuration of the FM depicted in Fig. 1 follows:
{Person, housing, telephone, transport, address, streetName, areaCode, car}
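To make these rules concrete, the following small sketch (ours, not the authors' tooling) checks a configuration against them; the encoding of the Fig. 1 tree is a simplifying assumption, e.g. the children of housing are taken loosely from the text.

# A minimal illustrative encoding: an And-group maps children to "mand"/"opt",
# Xor/Or groups list their children; leaf features need no entry.
FM = {
    "Person":    ("and", {"housing": "mand", "transport": "mand", "telephone": "mand"}),
    "housing":   ("or",  ["address", "streetName", "areaCode"]),
    "transport": ("xor", ["car", "other"]),
}

def children(f):
    kind, spec = FM.get(f, ("and", {}))
    return list(spec)

def parent_of(f):
    return next((p for p in FM if f in children(p)), None)

def is_valid(config, root="Person"):
    """Check rules i)-iv) for a set of selected feature names."""
    if root not in config:                                   # rule i): the root is always selected
        return False
    for f in config:
        p = parent_of(f)
        if p is not None and p not in config:                # rule i): parent of a selected feature
            return False
        kind, spec = FM.get(f, ("and", {}))
        selected = [c for c in children(f) if c in config]
        if kind == "and" and any(spec[c] == "mand" and c not in config for c in spec):
            return False                                     # rule ii): mandatory children selected
        if kind == "xor" and len(selected) != 1:
            return False                                     # rule iii): exactly one child of a Xor-group
        if kind == "or" and len(selected) < 1:
            return False                                     # rule iv): at least one child of an Or-group
    return True

print(is_valid({"Person", "housing", "telephone", "transport", "address", "streetName", "areaCode", "car"}))  # True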
2.2 A Running Example
We use the following example to illustrate the FM composition operators described in this paper. The example is complex enough to illustrate composition needs. In Fig. 2, the concept of person is designed from a general perspective and described as a FM. It acts as a base or primary model that may not provide all the elements required by an application or system concerned with a person, that is, it may be augmented with other features describing different aspects of a person. We explain how this base model can be composed incrementally with other FMs describing different aspects of features in the base model. These other FMs are called aspects. Let us take a first aspect called Service Provided, which deals with the services that may be offered to a person, and another aspect called Transport, which addresses the kinds of transport that may be used by a person. These two aspects are orthogonal to the concept of person. Furthermore, they are not specifically tied to the concept of person and can thus be composed with other base models, e.g., representing a hotel or a nursing home. Additionally, the concept of person is enriched using two other aspects. The first aspect describes features that provide information about the living
Fig. 2. Integrating several feature models
environment of a person, while the second aspect describes features that define its economic characteristics. These aspects may be considered as different viewpoints that represent the concept of person from the perspective of stakeholders' interests. Fig. 2 shows the four aspects to be composed with the base FM depicted in Fig. 1. The Service Provided and Transport aspects are orthogonal to the concept of person, whereas the Economical and Living Environment aspects are additional facets of the concept of person.
2.3 Requirements
The example presented above highlights the need for compositional operators that can i) add information (e.g. a subset of the features of a FM) to an existing feature, ii) refine some features with more detailed information, and iii) merge the contents of several features. The operators should work at the feature level to enable a modeler to compose only part of a FM with another FM. This should also enable reuse of part of an input FM when creating a larger composed FM. Additionally, one may need to reuse more than one part of a FM, or the same part several times. One should also be able to preselect some of the features of one aspect before the composition is performed. The running example shown in Fig. 2 illustrates a sequence of introductions and mergings of features. These requirements mean that composing two models can correspond to a wide range of situations, from the single use of one operator on the roots of two models to be merged, to multiple uses of one or several operators on various features of these aspects. In addition, taking into account the expressiveness of FMs, there are several ways to introduce one feature into another one or to merge them. Previous work has pointed out that dealing with large, monolithic FMs is problematic; in particular, FM maintenance is a difficult process [11, 12, 10]. As in our running example, an appealing approach is rather to use multiple FMs during SPL development. A first challenge is to allow different stakeholders or software suppliers, at different stages of the software development, to focus on their expertise and integrate their specific concerns. Another challenge is to manage the evolution of FMs [13, 14, 15]. In order to ensure that software products are well maintained, some relevant properties of the models have to be preserved over time. A primary issue in all these works is to define some compositional mechanisms. But, to the best of our knowledge, they do not i) provide a set of composition operators, ii) define the semantics of these operators according to the expressed configurations, or iii) propose a systematic technique to implement them.
3 Rationale
In order to meet the requirements above, we first identify some relevant semantic properties regarding composition operators. Then we discuss our main design choices regarding the proposed operators. These operators aim to compose two
concerns represented in two FMs. We then distinguish the aspect concern from the base concern. The result of the composition is described according to the set of configurations of the base concern.
3.1 Characterizing the Result of a Compositional Operator
Let f be a base FM and let f′ be the FM that results from applying an operator op to f using an aspect FM g. The semantics of the operator op is expressed in terms of the relationship between the configuration sets of the input models (f and g) and of the resulting model f′. In [14], the authors distinguish and classify four FM adaptations²: a refactoring adds no new configurations and removes no existing ones (the configuration sets of f and f′ are equal); a specialization removes some existing configurations and adds none (the configuration set of f′ is a strict subset of that of f); a generalization adds new configurations and removes none (the configuration set of f is a strict subset of that of f′); an arbitrary edit is a change that is neither a refactoring, a specialization nor a generalization. The classification proposed in [14] covers all the changes a designer can produce on a FM, and the formalization provided in [14] is a sound basis for reasoning about these changes. We rely on these four categories of FM adaptations in order to characterize the semantics of the insert and merge operators (see Sections 4 and 5).
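Expressed over configuration sets given extensionally, this classification is a direct comparison of sets; the following sketch is ours (the sets are illustrative, not taken from the paper).

# Classify an FM adaptation by comparing the configuration sets of f and f'.
def classify(base_configs, result_configs):
    if result_configs == base_configs:
        return "refactoring"
    if result_configs < base_configs:       # strict subset: configurations only removed
        return "specialization"
    if result_configs > base_configs:       # strict superset: configurations only added
        return "generalization"
    return "arbitrary edit"

base = {frozenset({"Person", "transport", "car"}),
        frozenset({"Person", "transport", "other"})}
extended = base | {frozenset({"Person", "transport", "car", "other"})}
print(classify(base, extended))             # generalization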
3.2 Main Design Choices
The composition of an aspect and a base concern may correspond either to the single use of the two proposed compositional operators (insert or merge), or to any combination of these two operators. Any of the two compositional operators ensure that the result of a successful composition is a well-formed FM (see Section 2). Scope of an operator. An operator specifies what feature(s) g of the aspect concern is to be composed with features in the base concern, and where (i.e. which feature in the base model f ) it is going to be inserted or merged with3 . All features of the aspect concern not included in the hierarchy starting with g are not involved in the composition process and are not included in its result. 2
3
The author use the term “edits” because the focus seems to be on local edits on FM. An example of edit given in the paper is “moving a feature from one branch to another”. To choose the root feature is equivalent to consider the whole FM.
An aspect concern is either strongly or loosely related to the base concern. It can participate in the description of the same concept while considering another facet of the information (another viewpoint), or its purpose is orthogonal to the concept described in the base concern. For example, the concern dealing with the economic information of a person corresponds to the first case, whereas the kind of transport that may be offered in general (i.e. not only to a person) corresponds to the second case. Let us now address how to compose a FM g with f, and let us emphasize why both insert and merge are needed. The insert operator makes it possible to specify any applicable FM operator (i.e. And-, Xor-, or Or-group) to compose g and f. It is more suited to the case of loosely connected aspects. Merge determines the FM operator to be used and corresponds to the composition of two views of the same concept. Merge is higher level, and we show that it may be implemented using the insert operator (see Sections 4 and 5).
Renaming. When two features are merged, two typical cases may occur: two features with the same name (resp. different names) in the base and aspect models may not address the same meaning (resp. may correspond to the same meaning). We provide an operator rename that allows the user to align the two FMs before composition. For the sake of brevity, the renaming operator is not detailed in this paper.
Limits. We might have included more operators, as proposed in several approaches coming from the Aspect-Oriented Modeling community [20]. Mainly, they deal with two other kinds of operators: replace and delete. We chose not to do so, but not for the same reasons. Instead of proposing a new operator for deleting features in the base model⁴, we propose that i) the semantics of merge may rely either on the semantics of the intersection (to keep only the common features) or of the union (to keep all features) and ii) more generally, an operator may perform some deletion according to its semantics and in order to guarantee that the resulting FM is well-formed. We consider replace only as a special case of merge with some possible renamings before composition.
4 Insert Operator
The insert operator aims at introducing newly created elements into any base element or at inserting elements from the aspect model into the base model. For example, a stakeholder can extend the transport feature associated with a Person (left part of Fig. 3(a)) by including the urban transport information, represented in an aspect FM (right part of Fig. 3(a)). The dotted arrow indicates that the feature urbanTransport is inserted below the feature transport; it does not indicate how the feature tree will be inserted (e.g. which variability information will be associated with the feature tree).
4 According to what has been said at the beginning of the section, there is no need for such operators for the aspect concern.
(a) Insertion of the Urban transport aspect
(b) A possible resulting FM
Fig. 3. Example of insertion of FM
The stakeholder needs syntactic mechanisms to define precisely how the insertion is achieved.
4.1 Syntactic Definition
The insert operator is syntactically defined as follows:
insert (aspectFeature: Feature, joinpointFeature: Feature, operator: Operator)
It takes three arguments: the feature to be inserted (a feature in the aspect model), the targeted feature (a feature in the base model) where the insertion needs to be done, and the operator (e.g. Xor-group) specified by the user. The precondition of the insert operator requires that the intersection between the set of features of the base FM and that of the aspect FM is empty. This condition preserves the well-formedness property of the composed FM, which states that each feature's name is unique. The insert's parameters allow the stakeholder to control the insertion by addressing the three following issues: Where will the aspect FM be inserted into the base FM? The joinpointFeature is a feature of the base FM and describes where the aspectFeature should be inserted into the base FM. What feature(s) of the aspect FM will be inserted into the base FM? The aspectFeature feature is inserted and comes with its child features. If the aspectFeature feature is the root of an aspect FM, the aspect FM is entirely inserted into the base FM. Otherwise, only the subtree starting at aspectFeature is inserted. How will the insertion be done? What are the effects on the properties of the composed model? According to the third argument operator (e.g. Xor-group) and the group (e.g. Or) of the joinpointFeature in the base FM, the insertion can change the group of the aspectFeature to be inserted. The remainder of this section defines the semantics and the rules to implement it.
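A minimal sketch of the disjointness precondition, on a toy (name, group, children) tuple encoding of FMs that is an assumption of ours, not the operator's actual implementation:

def names(fm):
    name, _group, children = fm
    result = {name}
    for child in children:
        result |= names(child)
    return result

def insert_precondition(aspect_fm, base_fm):
    # feature names must be unique in the composed FM, so the two name sets must be disjoint
    return names(aspect_fm).isdisjoint(names(base_fm))

base   = ("Person", "and", [("transport", "xor", [("car", "and", []), ("other", "and", [])])])
aspect = ("urbanTransport", "xor", [("bike", "and", []), ("twoWheeledVehicle", "and", [])])
print(insert_precondition(aspect, base))    # True: the insertion of Fig. 3 is allowed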
4.2 Semantics
The semantics of the insert operator is represented by the relationship that exists between the new composed model and the base/primary model, so that it refers to the properties preserved or not by the composed model according to its set of configurations. The insert operator should respect one (or more) of the properties defined in Section 3.1 (generalization, specialization, refactoring or none of these) considering the composed model and the base model. A stakeholder can thus anticipate the changes to the base model while applying the insertion. Intuitively, if an aspect model is added somewhere in a base model Base, the set of configurations of Base should grow. The new version of Base which results from applying the insert operation can produce a generalization: new configurations are added and no existing configurations are removed. But the situation corresponding to an arbitrary edit may also happen, depending on the operator that is passed as parameter of insert: some new configurations are added while some others are removed. The refinement of a FM can indeed alter the existing configurations such that they become deprecated. According to their definition (see Section 3.1), specialization and refactoring are not possible because they correspond to situations that are not compatible with the meaning of an insertion. This simply follows the rationale behind the insert operator, which is to add details and to populate the base model with additional information. In the remainder of this section, Base FM corresponds to the (sub-)tree of the base FM whose root is joinpointFeature, while Aspect FM corresponds to the (sub-)tree of the aspect FM whose root is aspectFeature. More formally, the semantics of insert is defined as follows:
– The set of configurations of the FM after insertion (Result) is at least the set of configurations of the Base FM. This can be expressed as follows:

Base ⊂ Result    (I1)

– or the set of configurations of Result is at least the set of configurations of the cross product of Base and Aspect. This can be expressed as follows:

Base ⊗ Aspect ⊆ Result    (I2)
where the cross product is defined as follows (A and B being sets of sets): A ⊗ B = {a ∪ b | a ∈ A, b ∈ B}. The two relations (I1) and (I2) define the semantics. The former states that the Result FM is a generalization of the Base FM. The latter ensures that each configuration of the Base FM is supplemented by the features of the Aspect FM. The insert operator may, in some situations, respect i) only one of the relations (i.e. (I1) or (I2)) or ii) both of them (i.e. (I1) and (I2)). A supporting tool can easily exploit this information to produce appropriate warnings when an insertion only preserves one relation and thus assist modelers in reasoning during composition. As an example, let us consider the set of configurations of the base FM included in the left part of Fig. 3(a), Base,
Base = {{Person, transport, car}, {Person, transport, other}}
the set of configurations of the aspect FM included in the right part of Fig. 3(a), Aspect,
Aspect = {{urbanTransport, bike}, {urbanTransport, twoWheeledVehicle}}
and the set of configurations of the composed FM, corresponding to an insertion using the Xor operator and described in Fig. 3(b), Result:
Result = {{Person, transport, car}, {Person, transport, other}, {Person, transport, urbanTransport, bike}, {Person, transport, urbanTransport, twoWheeledVehicle}}
The relationships between Base, Aspect and Result respect only the relation (I1). As a result, the composed FM of Fig. 3(b) is a generalization of the base FM from the left part of Fig. 3(a).
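These relationships can be checked mechanically over the configuration sets listed above; the helper below is ours (cross implements the ⊗ product defined earlier).

def cross(A, B):
    return {a | b for a in A for b in B}     # A ⊗ B = {a ∪ b | a ∈ A, b ∈ B}

Base = {frozenset({"Person", "transport", "car"}),
        frozenset({"Person", "transport", "other"})}
Aspect = {frozenset({"urbanTransport", "bike"}),
          frozenset({"urbanTransport", "twoWheeledVehicle"})}
Result = Base | {frozenset({"Person", "transport", "urbanTransport", "bike"}),
                 frozenset({"Person", "transport", "urbanTransport", "twoWheeledVehicle"})}

print(Base <= Result)                        # True: (I1) holds, the insertion is a generalization
print(cross(Base, Aspect) <= Result)         # False: (I2) does not hold for this Xor insertion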
4.3 Rules
In this subsection, we describe the rules associated with an insertion. They define when and how the operator passed as an argument preserves (or not) the previously described properties on the base FM. The rules are given on a base model called Base, which has a root feature B and one or several children B1, B2, ..., Bn. The model to be inserted is called Aspect; it has a root feature A whose child features are A1, A2, ..., An.
(a) Base FM
(b) Aspect FM
(c) One possible resulting FM
Fig. 4. Rule for insertion of FM
Let us consider the insertion of Aspect (Fig. 4(b)) into Base (Fig. 4(a)). If the operator passed to insert is "And with the mandatory status", the feature A is inserted as a child feature of B with the mandatory status (Fig. 4(c)). For this example, the sets of configurations of Base, Aspect, and Result are:
Base = {{B, B1, B2}, {B, B2}}
Aspect = {{A, A1}, {A}}
Result = {{B, A, B1, B2, A1}, {B, A, B2, A1}, {B, A, B1, B2}, {B, A, B2}}
Consequently, the relation (I1) does not hold. For instance, {B, B1, B2} is not a member of Result. Nevertheless, the relation (I2) is satisfied and the resulting FM is an arbitrary edit of the Base FM. On the contrary, if the stakeholder wants to preserve the (I1) property, the feature A should be inserted as a child feature of B with the optional status.
Overview of the table of rules. The result of an insertion of a given feature only depends on i) the operator passed as argument of insert and ii) the operator associated with the feature where the insertion is made. All combinations are given in Table 1. We distinguish the cases where no FM operator is associated with a feature of the base FM (it is a leaf) and those where there is either an And, Or or Xor operator. Insert may accept the following operators: And with mandatory (resp. optional) sub-features, Or and Xor. The table summarizes the properties that are verified by the Result FM for each combination. When "=" is set, this means that the set of configurations of the Result FM is strictly equal to Base ⊗ Aspect. Note that the insertion of one single feature with an Or or Xor operator into a leaf feature is forbidden, as it would generate badly-formed FMs. Nevertheless, this is possible when the insertion deals with a set of features of the aspect model (i.e. the parameter aspectFeature is a set and not a single feature).

Table 1. Insertion rules

Base / Operator   And-Mandatory   And-Optional   Xor   Or
Leaf              = I2            I1 and I2      I1    I1 and I2
And               = I2            I1 and I2      I1    I1 and I2
Xor               = I2            I1 and I2      I1    I1 and I2
Or                = I2            I1 and I2      I1    I1 and I2
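The same kind of check confirms the "= I2" entry of Table 1 for the And-Mandatory insertion of Fig. 4; the cross helper is repeated here so the sketch (ours) stands alone.

def cross(A, B):
    return {a | b for a in A for b in B}

Base = {frozenset({"B", "B1", "B2"}), frozenset({"B", "B2"})}
Aspect = {frozenset({"A", "A1"}), frozenset({"A"})}
Result = {frozenset({"B", "A", "B1", "B2", "A1"}), frozenset({"B", "A", "B2", "A1"}),
          frozenset({"B", "A", "B1", "B2"}), frozenset({"B", "A", "B2"})}

print(Base <= Result)                        # False: (I1) is violated ({B, B1, B2} is missing)
print(Result == cross(Base, Aspect))         # True: matches the "=" entry for And-Mandatory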
5 Merge Operator
When two FMs share several features and are different views of an aspect of a system, it is necessary to merge the overlapping parts of the two FMs to obtain a single model that presents an integrated view of the system. Let us consider the example of a base FM (left part of Fig. 5(a)). The root feature is the Person feature, which has a child feature transport with two alternative features car and other. The aspect FM (right part of Fig. 5(a)) describes the concept of Person from another perspective. In that case, a person also has the feature meansOfTransport, but the set of alternatives is structured in an Or-group and also includes additional features such as bike, publicService and twoWheeledVehicle. The merge operator can then be used to unify the two viewpoints from the FMs. A mapping can be specified by the stakeholder (e.g. to relate the feature transport of the base FM and the feature meansOfTransport of the aspect FM). More importantly, the merged FM should verify some properties such as the preservation of configurations. This requires solving some of the variability issues in each FM.
(a) Base and Aspect FMs to be merged
(b) Merged FM
Fig. 5. Merging of two FMs
For example, in Fig. 5(a), the features car and other cannot be concurrently selected in the Base FM, whereas the selection of both of them is allowed by the Aspect FM.
5.1 Syntactic Definition
The merge operator is syntactically defined as follows:
merge (aspectFeature: Feature, baseFeature: Feature, mode: Mode)
It takes three arguments: the feature to be merged (a feature of the aspect model), the feature in the base model where the merge is done, and the mode specified by the user. This mode indicates how the merge has to be done in terms of union or intersection of configurations (see below). As for the insert operator, the merge's parameters allow the stakeholder to answer the same three questions: Where are the features of the aspect FM and the base FM such that the two FMs match? To merge FMs we thus need to first identify match points (similar to joinpoints in aspect terminology). The stakeholder can thus specify the feature aspectFeature of the aspect FM and the feature baseFeature of the base FM. They are not necessarily the roots of the FMs.
What are the features of the aspect FM and base FM that will appear in the merged model? Two FMs are merged by applying the operator recursively to their subtrees, starting from the match points (aspectFeature and baseFeature). If two features have been merged, the whole process proceeds with their child features. If not, they are inserted as separate child features. The variability information associated with features in the merged model should also be set. How are features merged by the operator? It uses name-based matching: two features match if and only if they have the same name. If so, they are merged to form a new feature. Features with different names can be bound to each other thanks to an explicit renaming (see Section 3). Finally, a set of rules resolves possible variability mismatches between features of the two FMs according to the mode (i.e. the third argument of the merge operator).
5.2 Semantics
Like for the insert operator, the semantics of merge is defined according to the relationship which exists between the FM resulting from the merge and the two input FMs. It is based on the union or the intersection of the two configuration sets.
Union. When transport is merged with meansOfTransport (see Fig. 5), the original information from the base model must be preserved while adding information from the aspect model. The sets of configurations of the base and aspect FMs should then be preserved in the merged FM. The union of two FMs, Base and Aspect, is a new FM where each configuration that is valid either in Base or in Aspect is also valid. More formally, the result of a merge in the union mode has the following properties:
– The set of configurations of the FM after merging (Result) is at least the set of configurations of the Base FM (i.e. the Result FM is a generalization or a refactoring of the Base FM). This can be expressed as follows:

Base ⊆ Result    (M1)

– The set of configurations of Result is at least the set of configurations of the Aspect FM (i.e. the Result FM is a generalization or a refactoring of the Aspect FM). This can be expressed as follows:

Aspect ⊆ Result    (M2)

Note that if the relations (M1) and (M2) are met, the following relationship holds:

Base ∪ Aspect ⊆ Result
This means that the merged FM may allow some configurations that are included neither in the set of configurations of the base FM nor in that of the aspect FM. In order to restrict these configurations, we propose to reinforce the constraints on the merged FM with an additional property (see (M3)). It states that the set of configurations of Result is at least the set of configurations of the cross product of Base and Aspect. This can be expressed as follows:

Base ⊗ Aspect ⊆ Result    (M3)

(M3) can hold concurrently with (M1) and (M2), individually, or not at all.
Intersection. When transport is merged with meansOfTransport (see Fig. 5), only the common information of the base model and the aspect model is retained: the intersection of two FMs, Base and Aspect, is a new FM where each configuration that is valid both in Base and in Aspect is also valid. In the intersection mode, the relationship between the merged FM Result, the base FM Base and the aspect FM Aspect can be expressed as follows:

Base ∩ Aspect = Result    (M4)

Besides, if the following condition holds:

Base ∩ Aspect = ∅    (M5)

the FM Result defines no configuration at all and can be considered as an inconsistent or unsatisfiable FM [8].
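With configuration sets represented extensionally, the union- and intersection-mode properties become simple set checks that a supporting tool could run; the sets below are illustrative (ours), not those of Fig. 5.

def cross(A, B):
    return {a | b for a in A for b in B}

Base   = {frozenset({"T", "car"}), frozenset({"T", "other"})}
Aspect = {frozenset({"T", "car"}), frozenset({"T", "car", "other"})}

# hypothetical configuration set of a FM merged in union mode (an Or-group over car/other)
Union = {frozenset({"T", "car"}), frozenset({"T", "other"}), frozenset({"T", "car", "other"})}
print(Base <= Union, Aspect <= Union)        # (M1) and (M2) hold
print(cross(Base, Aspect) <= Union)          # (M3) holds

# intersection mode keeps only the configurations valid in both input FMs
Intersection = Base & Aspect
print(Intersection == {frozenset({"T", "car"})})   # (M4); an empty result would trigger (M5)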
5.3 Merging Rules
We now describe the rules for merging FMs. These rules aim at resolving variabilities in each FM such that the expected properties are met. For example, in Fig. 5, features car and other do not exhibit the same variability, as they belong to a Xor-group in the base FM whereas they belong to an Or-group in the aspect FM. Not surprisingly, the sets of configurations of the base FM and the aspect FM are not the same, and some configurations are valid in one FM but not in the other. For example, {Person, meansOfTransport, car, housing} is only valid in the aspect FM (since the feature housing is included in all its configurations). Yet, the merged FM should be able to express the set of configurations of both FMs. To tackle this issue, we propose i) to make an explicit difference between common and non-common features of the two FMs and ii) to (re-)use the insert operator at each step of the merge. As the common features of the two FMs can belong to different groups, a new variability operator has to be chosen in accordance with the intended semantic properties (i.e. merge in the union or
intersection mode). We thus propose to organize the rules to compute the variability operator into predominance tables. Tables 2 and 3 assume that the same set of features is shared by the base and aspect FMs.

Table 2. Merge in union mode - relations (M1) and (M2) are satisfied

Base / Aspect    And-Mandatory   And-Optional   Xor            Or
And-Mandatory    And-Mandatory   And-Optional   Or             Or
And-Optional     And-Optional    And-Optional   And-Optional   And-Optional
Xor              Or              And-Optional   Xor            Or
Or               Or              And-Optional   Or             Or

Table 3. Merge in intersection mode - relation (M4) is satisfied

Base / Aspect    And-Mandatory   And-Optional   Xor            Or
And-Mandatory    And-Mandatory   And-Mandatory  And-Mandatory  And-Mandatory
And-Optional     And-Mandatory   And-Optional   Xor            Or
Xor              And-Mandatory   Xor            Xor            Xor
Or               And-Mandatory   Or             Xor            Or

Fig. 6. Merging example

In Fig. 6, features car and other are child features of transport. They belong either to a Xor-group in the Base FM or to an Or-group in the Aspect FM. In this case, the predominant operator is an Or-group, that is, the features car and other can both be selected at the same time (i.e. (M1) is respected), or car and other can each be selected alone (i.e. (M2) is respected). As a result, the relations (M1) and (M2) indeed hold for the merged FM depicted in the bottom left part of Fig. 6. Moreover, the relation (M3) holds too.
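Tables 2 and 3 translate directly into lookup tables. The encoding below is ours (abbreviated operator names); it relies on the tables being symmetric in their two arguments.

UNION = {   # union mode, preserving (M1) and (M2)
    ("and-mand", "and-mand"): "and-mand",
    ("and-mand", "and-opt"):  "and-opt",
    ("and-mand", "xor"):      "or",
    ("and-mand", "or"):       "or",
    ("and-opt",  "and-opt"):  "and-opt",
    ("and-opt",  "xor"):      "and-opt",
    ("and-opt",  "or"):       "and-opt",
    ("xor",      "xor"):      "xor",
    ("xor",      "or"):       "or",
    ("or",       "or"):       "or",
}

INTERSECTION = {   # intersection mode, satisfying (M4)
    ("and-mand", "and-mand"): "and-mand",
    ("and-mand", "and-opt"):  "and-mand",
    ("and-mand", "xor"):      "and-mand",
    ("and-mand", "or"):       "and-mand",
    ("and-opt",  "and-opt"):  "and-opt",
    ("and-opt",  "xor"):      "xor",
    ("and-opt",  "or"):       "or",
    ("xor",      "xor"):      "xor",
    ("xor",      "or"):       "xor",
    ("or",       "or"):       "or",
}

def compute_operator(base_op, aspect_op, mode):
    table = UNION if mode == "union" else INTERSECTION
    return table.get((base_op, aspect_op)) or table[(aspect_op, base_op)]  # tables are symmetric

print(compute_operator("xor", "or", "union"))         # 'or'  (example of Fig. 6, union mode)
print(compute_operator("xor", "or", "intersection"))  # 'xor' (example of Fig. 6, intersection mode)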
Merging in the intersection mode the features car and other of the aspect FM (which belong to an Or-group) with the features car and other of the base FM (which belong to a Xor-group) gives the predominant operator Xor (see the bottom right part of Fig. 6). The relation (M4) indeed holds.

Algorithm 1. Merging algorithm

merge (aspectFeature: Feature, baseFeature: Feature, mode: Mode)
begin
  if ¬matching(aspectFeature, baseFeature) then "error" fi
  new := newFM(newFeature(baseFeature.getName()))
  predominanceOp := computeOperator(baseFeature, aspectFeature, mode)
  base := extractChild(baseFeature)
  aspect := extractChild(aspectFeature)
  foreach N ∈ (base ∩ aspect) do
    res := merge(aspectFeature::N, baseFeature::N, mode)   /* recursively */
    stackFeatures.push(res)                                /* pushes the merged feature */
  od
  /* insert the set of features of the stack */
  insertmulti(stackFeatures, new, predominanceOp)
  /* the following loops are not executed in the intersection mode */
  foreach N ∈ (base \ (base ∩ aspect)) do
    insert(N, new.getRoot(), predominanceOp)
  od
  foreach N ∈ (aspect \ (base ∩ aspect)) do
    insert(N, new.getRoot(), predominanceOp)
  od
  return new
end
We define an algorithm for the merge that implements the principles above (see Algorithm 1). As an illustration, let us consider the merge of the Base Model and the Aspect Model depicted at the top of Fig. 6. The merge operator is used with the first parameter being the transport feature of the base FM, the second parameter being the transport feature of the aspect FM, and the third parameter being the union mode.
Algorithm for the merge. First, a new FM is created with one single feature called "transport", which becomes its root and acts as a temporary FM where the features of the base and aspect FMs will be incrementally inserted. The predominant operator is computed using the predominance table corresponding to the mode. In the example, we obtain an Or-group with the union table (see the bottom left part of Fig. 6). The common features of the two FMs (i.e. car and other) are merged recursively. Then, they are inserted all together with the predominant operator. At this stage, the connection between the transport root feature of the temporary FM and its group of children car and other is an Or-group. The next step is to insert the non-common features urbanTransport and publicService with the Or-operator into the root feature of the temporary FM, transport. The insertion of a feature with an Or-operator into a feature which is
connected to its group of children by an Or-group respects (I1) and (I2). As a result, urbanTransport and publicService also belong to an Or-group. In the intersection mode, the algorithm is executed when the condition (M5) does not hold. Only the set of common features is considered. In the example, only the features car and other are merged. The result is depicted in the bottom right part of Fig. 6. The predominant operator is the Xor-group.
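A compact executable rendering of Algorithm 1 (ours, heavily simplified) on a toy (name, group, children) encoding; the predominance lookup below only covers the cases needed for Fig. 6, not the full Tables 2 and 3.

def compute_operator(base_op, aspect_op, mode):
    if base_op == aspect_op:
        return base_op
    return "or" if mode == "union" else "xor"   # e.g. Xor/Or gives Or (Table 2) or Xor (Table 3)

def merge(aspect, base, mode):
    a_name, a_op, a_children = aspect
    b_name, b_op, b_children = base
    assert a_name == b_name, "features must match (possibly after renaming)"
    op = compute_operator(b_op, a_op, mode)
    a_map = {c[0]: c for c in a_children}
    b_map = {c[0]: c for c in b_children}
    children = [merge(a_map[n], b_map[n], mode) for n in b_map if n in a_map]  # common features
    if mode == "union":                       # non-common features are only kept in union mode
        children += [b_map[n] for n in b_map if n not in a_map]
        children += [a_map[n] for n in a_map if n not in b_map]
    return (b_name, op, children)

base   = ("transport", "xor", [("car", "and", []), ("other", "and", [])])
aspect = ("transport", "or",  [("car", "and", []), ("other", "and", []),
                               ("urbanTransport", "and", []), ("publicService", "and", [])])
print(merge(aspect, base, "union"))           # Or-group over car, other, urbanTransport, publicService
print(merge(aspect, base, "intersection"))    # Xor-group over car, other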
6 Related Work
Several previous works consider some forms of composition for FMs. Alves et al. motivate the need to manage the evolution of FMs (or more generally of an SPL) and extend the notion of refactoring to FMs [13]. The authors provide a catalog of sound FM refactorings, which has been verified in Alloy by automatically checking properties of resulting FMs [21]. Although their work is focused on refactoring single FMs, they also suggest using these rules to merge FMs. Our proposal goes further in this direction by providing mechanisms to implement the merge and by clarifying the semantics (as in [14], our terminology is to consider a unidirectional refactoring as a generalization and a bidirectional refactoring as a refactoring). Segura et al. provide a catalogue of visual rules to describe how to merge FMs [15]. The authors emphasize the need to provide a formal semantics for their approach. To the best of our knowledge, their rules implement the merge in the union mode, while the merge in the intersection mode is not taken into account. Schobbens et al. identify three operations to merge FMs – intersection, union (a.k.a. disjunction) or reduced product of two FMs [19] – but do not provide mechanisms to implement the merging. Czarnecki et al. propose to construct FMs from propositional formulas and suggest using their algorithm to merge FMs, but without further detail [22]. Computing the intersection or union at the propositional logic level is not without problems. It is necessary to generate a FM from the new propositional formula, and a major issue is then to take additional structuring information into account. In [23], a feature is represented by a FST (Feature Structure Tree), roughly a stripped-down abstract syntax tree. The authors propose to use superimposition to compose features. A FM is a "hierarchy of features with variability" [18] and can be seen as a FST plus variability. As a result, the superimposition mechanism has to be adapted to resolve variability mismatches. In SPL engineering, reusable software assets must be composed to derive specific products according to a particular set of features. One approach is to use FMs to specify the variability and then to relate FMs to architectural or design models (e.g. UML models) [6, 24, 7, 5]. A configuration of the FM can correspond to the removal or the activation of some elements of a model [5, 6]. Another option is to associate each feature with some model artefacts which are then inserted in a primary design model [7] or composed together [25, 6, 24]. Our work focuses strictly on the composition of the variability models, i.e. FMs. Our proposal is not incompatible with the approaches described, as the composed FM can be related to other models and thus be used during the derivation process.
Aspect-Oriented Modeling (AOM) allows developers to isolate and address separately several aspects of a system by providing techniques to achieve separation and composition of concerns [20]. Existing AOM approaches notably focused on the composition of UML models such as UML class diagrams (e.g. [26]) or UML state and sequence diagrams (e.g. [27]). To the best of our knowledge, no existing approach proposes to compose FMs.
7 Conclusion and Future Work
In this paper, we proposed two main operators to compose feature models (FMs). Each operator is described by stating where it is applied, what features will be composed and how the composition is made. Each composition is defined by rules that formally describe the structure of the resulting FM. Depending on the composed and the targeted features, some properties regarding the expressed set of configurations are made explicit for each operator. A first insert operator enables developers to insert features from a crosscutting FM into a base FM. Each insertion can then be characterized by its ability to preserve or not the set of configurations expressed by the base FM. Building on this operator, the proposed merge operator makes it possible to put together features from two separate FMs, when none of the two clearly crosscuts the other. The result is also characterized through the set of expressed configurations, and is parameterized to enable developers to choose between the union or the intersection of the configurations. The two operators cover different use cases but always ensure the well-formedness of the resulting FM. When using the provided operators, developers can choose to perform insertions or merges while preserving the expression of the original set of configurations. This enables them to compose FMs at a large scale. On the contrary, when the need to make more important changes appears, developers can then use all the presented forms of insertion and merge, while being aware of whether the original semantics of the base FM is preserved or not. Future work aims at tackling current restrictions and at validating the scalability and usability of the proposed operators. These operators are currently under validation with the construction and usage of a large SPL which is dedicated to medical imaging services on the grid. The services are part of a service-oriented architecture in which data-intensive workflows are built to conduct numerous computations on very large sets of images [28, 29]. This SPL is decomposed into several FMs, which are then to be composed using the proposed operators. Moreover, some of the designed FMs are planned to be reused in another SPL that deals with video surveillance systems [30]. Some features related to QoS and imaging are likely to be common. The two case studies and SPLs are intended to be complementary and yet different, to determine in what sense the merging operators can actually help to scale feature modeling (from the users' perspective). They can also help to determine whether an arbitrarily decomposed FM can be relevant to all stakeholders or not. Another interest is to quantify the amount of information needed to apply the merging operators in order to assess their ease of use. To achieve these goals, we will raise the limitation
on the hierarchy regularity of the composed FMs. Currently, the considered FMs cannot include any constraints between features, e.g. a constraint stating that selecting a feature requires another one to be selected or excluded. Taking such constraints into account will oblige us to tackle issues on how to reuse consistency checking in a modular way. But as a result, this should also solve some of the scalability issues that FM checking techniques currently face [8, 9].
References 1. Clements, P., Northrop, L.M.: Software Product Lines: Practices and Patterns. Addison-Wesley Professional, Reading (2001) 2. Pohl, K., Böckle, G., van der Linden, F.J.: Software Product Line Engineering: Foundations, Principles and Techniques. Springer, Heidelberg (2005) 3. Kang, K., Cohen, S., Hess, J., Novak, W., Peterson, S.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, Software Engineering Institute (November 1990) 4. Czarnecki, K., Eisenecker, U.: Generative Programming: Methods, Tools, and Applications. Addison-Wesley Professional, Reading (2000) 5. Czarnecki, K., Antkiewicz, M.: Mapping features to models: A template approach based on superimposed variants. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 422–437. Springer, Heidelberg (2005) 6. Sanchez, P., Loughran, N., Fuentes, L., Garcia, A.: Engineering languages for specifying Product-Derivation processes in software product lines. In: Software Language Engineering (SLE), pp. 188–207 (2008) 7. Voelter, M., Groher, I.: Product line implementation using aspect-oriented and model-driven software development. In: SPLC 2007: Proceedings of the 11th International Software Product Line Conference, pp. 233–242. IEEE, Los Alamitos (2007) 8. Batory, D., Benavides, D., Ruiz-Cortés, A.: Automated analysis of feature models: Challenges ahead. Communications of the ACM (December 2006) 9. Mendonca, M., Wasowski, A., Czarnecki, K., Cowan, D.: Efficient compilation techniques for large scale feature models. In: GPCE 2008: Proceedings of the 7th international conference on Generative programming and component engineering, pp. 13–22. ACM, New York (2008) 10. Reiser, M.O., Weber, M.: Multi-level feature trees: A pragmatic approach to managing highly complex product families. Requir. Eng. 12(2), 57–75 (2007) 11. Czarnecki, K., Helsen, S., Eisenecker, U.: Staged Configuration through Specialization and Multilevel Configuration of Feature Models. Software Process: Improvement and Practice 10(2), 143–169 (2005) 12. Hartmann, H., Trew, T.: Using feature diagrams with context variability to model multiple product lines for software supply chains. In: SPLC 2008: Proceedings of the 2008 12th International Software Product Line Conference, pp. 12–21. IEEE, Los Alamitos (2008) 13. Alves, V., Gheyi, R., Massoni, T., Kulesza, U., Borba, P., Lucena, C.: Refactoring product lines. In: GPCE 2006: Proceedings of the 5th international conference on Generative programming and component engineering, pp. 201–210. ACM, New York (2006) 14. Thüm, T., Batory, D., Kästner, C.: Reasoning about edits to feature models. In: Proceedings of the 31th International Conference on Software Engineering (ICSE 2009). IEEE Computer Society, Los Alamitos (2009)
15. Segura, S., Benavides, D., Ruiz-Cortés, A., Trinidad, P.: Automated merging of feature models using graph transformations. In: Lämmel, R., Visser, J., Saraiva, J. (eds.) Generative and Transformational Techniques in Software Engineering II. LNCS, vol. 5235, pp. 489–505. Springer, Heidelberg (2008) 16. Lahire, P., Morin, B., Vanwormhoudt, G., Gaignard, A., Barais, O., Jézéquel, J.M.: Introducing Variability into Aspect-Oriented Modeling Approaches. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 498–513. Springer, Heidelberg (2007) 17. Classen, A., Heymans, P., Schobbens, P.: What’s in a Feature: A Requirements Engineering Perspective. In: Fiadeiro, J.L., Inverardi, P. (eds.) FASE 2008. LNCS, vol. 4961, pp. 16–30. Springer, Heidelberg (2008) 18. Czarnecki, K., Kim, C.H.P., Kalleberg, K.T.: Feature models are views on ontologies. In: SPLC 2006: Proceedings of the 10th International on Software Product Line Conference, pp. 41–51. IEEE Computer Society, Los Alamitos (2006) 19. Schobbens, P.Y., Heymans, P., Trigaux, J.C., Bontemps, Y.: Generic semantics of feature diagrams. Comput. Netw. 51(2), 456–479 (2007) 20. Aspect-Oriented Modeling Workshop Series, http://www.aspect-modeling.org/ 21. Gheyi, R., Massoni, T., Borba, P.: A theory for feature models in alloy. In: Proceedings of First Alloy Workshop, pp. 71–80 (2006) 22. Czarnecki, K., Wasowski, A.: Feature diagrams and logics: There and back again. In: SPLC 2007: Proceedings of the 11th International Software Product Line Conference, pp. 23–34 (2007) 23. Apel, S., Lengauer, C., Möller, B., Kästner, C.: An algebra for features and feature composition. In: Meseguer, J., Roşu, G. (eds.) AMAST 2008. LNCS, vol. 5140, pp. 36–50. Springer, Heidelberg (2008) 24. Perrouin, G., Klein, J., Guelfi, N., Jézéquel, J.M.: Reconciling automation and flexibility in product derivation. In: SPLC 2008: Proceedings of the 2008 12th International Software Product Line Conference, pp. 339–348. IEEE, Los Alamitos (2008) 25. Jayaraman, P.K., Whittle, J., Elkhodary, A.M., Gomaa, H.: Model composition in product lines and feature interaction detection using critical pair analysis. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 151–165. Springer, Heidelberg (2007) 26. Reddy, Y.R., Ghosh, S., France, R.B., Straw, G., Bieman, J.M., McEachen, N., Song, E., Georg, G.: Directives for composing aspect-oriented design class models. In: Rashid, A., Aksit, M. (eds.) Transactions on Aspect-Oriented Software Development I. LNCS, vol. 3880, pp. 75–105. Springer, Heidelberg (2006) 27. Kienzle, J., Al Abed, W., Jacques, K.: Aspect-oriented multi-view modeling. In: AOSD 2009: Proceedings of the 8th ACM international conference on Aspectoriented software development, pp. 87–98. ACM, New York (2009) 28. Acher, M., Collet, P., Lahire, P.: Issues in Managing Variability of Medical Imaging Grid Services. In: Olabarriaga, S., Lingrand, D., Montagnat, J. (eds.) MICCAIGrid Workshop (MICCAI-Grid), New York, NY, USA (September 2008) 29. Acher, M., Collet, P., Lahire, P., Montagnat, J.: Imaging Services on the Grid as a Product Line: Requirements and Architecture. In: Service-Oriented Architectures and Software Product Lines - Putting Both Together (SOAPL 2008), associated workshop issue of SPLC 2008. IEEE, Los Alamitos (2008) 30. Acher, M., Lahire, P., Moisan, S., Rigault, J.P.: Tackling High Variability in Video Surveillance Systems through a Model Transformation Approach. 
In: MiSE 2009: Proceedings of the International Workshop on Modeling in Software Engineering at ICSE 2009, Vancouver, Canada. IEEE Computer Society, Los Alamitos (2009)
VML* – A Family of Languages for Variability Management in Software Product Lines∗ Steffen Zschaler1, Pablo Sánchez2, João Santos3, Mauricio Alférez3, Awais Rashid1, Lidia Fuentes2, Ana Moreira3, João Araújo3, and Uirá Kulesza3 1
Computing Department, Lancaster University, Lancaster, United Kingdom {zschaler,awais}@comp.lancs.ac.uk 2 Dpto. de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain {pablo,lff}@lcc.uma.es 3 Computer Science Department, Universidade Nova de Lisboa, Lisbon, Portugal {jps,mauricio.alferez,amm,ja}@di.fct.unl.pt, [email protected]
Abstract. Managing variability is a challenging issue in software-product-line engineering. A key part of variability management is the ability to express explicitly the relationship between variability models (expressing the variability in the problem space, for example using feature models) and other artefacts of the product line, for example, requirements models and architecture models. Once these relations have been made explicit, they can be used for a number of purposes, most importantly for product derivation, but also for the generation of trace links or for checking the consistency of a product-line architecture. This paper bootstraps techniques from product-line engineering to produce a family of languages for variability management, easing the creation of new members of this language family. We show that developing such language families is feasible and demonstrate the flexibility of our language family by applying it to the development of two variability-management languages. Keywords: Software Product Lines, Family of Languages, Domain-specific Languages, Variability Management.
1 Introduction
Software Product Line Engineering (SPLE) is seen as a promising approach to increasing the productivity and quality of software, especially where essentially similar software needs to be provided for a variety of contexts and customers, each requiring customizations and variations for their specific conditions [1-2]. In SPLE, features [3] are used to capture commonalities or discriminate among products, i.e. capture variabilities, in an SPL. SPL features are often modelled using feature models [3-4]. Management of variability throughout the product line is a key challenge in SPLE.
∗ The work reported in this paper was supported by the EC FP7 STREP project AMPLE: Aspect-Oriented Model-Driven Product Line Engineering (www.ample-project.net).
An important part of variability management is to make explicit the relation between the variability model (e.g., the feature models referred to in the previous paragraph) and other models and artefacts of the SPL. Once this relation has been explicitly represented, it can be used for a number of purposes, most importantly to automatically derive product instances based on product-configuration specifications, but also for other purposes such as trace-link generation and consistency checking of SPL models. Due to its relevance, this topic is currently an area of intensive research and a number of approaches have been proposed [5-9]. Initial research focused on using general-purpose model transformations to encode product derivation [10-11]. Later it was argued that this placed too heavy a burden on SPL engineers, as they would now also have to learn the intricacies of model transformations. Consequently, a number of approaches that hide the model transformations from the SPL engineers have recently been developed [6-7, 12]. Czarnecki et al. and Heidenreich et al. [6-7] propose generic techniques that associate features with arbitrary combinations of model elements and generate a standard model transformation for product derivation from this. In contrast, we have argued before [12, 13] that transformation actions that are specific to the types of models used for describing the SPL are more useful, as they provide a terminology already known to SPL engineers, allow consideration of model semantics in the definition of transformations, and allow avoiding some inconsistencies (e.g., dangling references) in product models by design. This requires new languages to be developed for each type of model that may be used in describing an SPL—a costly and error-prone task. To make development of such languages feasible, this paper proposes VML*¹, a family of languages—or a language product line—for variability management, showing that developing such languages is a feasible goal. Individual members of the family are described using a domain-specific language (DSL). Based on such a specification, a generator produces the complete infrastructure for the specified language. Such a generative approach has the added benefit of making it easier to support other evaluations beyond product derivation: they can be implemented in additional code generators from the language specification. The key contribution of this paper is, thus, in the domain of software-language engineering, where it applies ideas from SPLE and model-driven development to the development of VML* languages. This enables us to efficiently build new VML* languages for new SPL contexts, and thus improves over our previous work [12], which was limited to copy-and-paste-based reuse, limiting efficiency and increasing error-proneness of language development. A secondary contribution is that this new approach to language development allows us to support additional evaluations for VML* languages, such as generation of trace links or SPL consistency checking. Section 2 further discusses the motivation for building custom languages instead of one generic language and derives a set of challenges to be overcome to enable efficient development of such languages. Section 3 then presents how we applied SPLE techniques to construct a family of languages for variability management and is followed by Sect. 4, which shows how concrete languages have been developed based on our approach. Section 5 reviews some related work and Sect. 6 concludes the paper and points out directions for future work.
1 For Variability Management Languages.
2 Motivation
This section describes the motivation that led to the creation of the VML* family of languages. First, we provide some background on VML languages and then we present the motivation of this paper.
2.1 Managing Variability Using Target-Model–Specific Languages
This section explains why we choose to model SPL variability using target-model–specific languages rather than a single generic language. We use as an example an architectural model of a lock control framework for a Smart Home Software Product Line (SPL) [1, 14]. Smart Home applications aim at automating and controlling houses and buildings in order to improve the comfort and security of their inhabitants. The lock control is placed on doors of rooms whose access must be controlled. Several options are available to end users acquiring a specific Smart Home software installation:
- Different authentication mechanisms can be used: identification cards, fingerprint scanners or a simple numeric keypad.
- Doors are opened manually and users have a time period to authenticate before triggering the alarms. Optionally, it is possible to select a computer-controlled door lock control (Automatic Lock), which will be released upon successful authentication.
- Automatic sliding doors can also be used (Door Opener). This option requires that the Automatic Lock control of the door lock be selected.
Fig. 1. A software architecture for the lock control framework
Figure 1 depicts a software architectural design for this lock control framework. This architectural design comprises three different parts, which are explained in the following.
Firstly, variability inherent to the domain is expressed using a feature model [4, 15] (Fig. 1 (top)). This feature model represents the variability specification or problem space. It specifies which features of the system are variable and the reasons why. For instance, the AuthenticationDevice to be used is a variable feature because there are several alternative devices available but only one must be selected. AutomaticLock and DoorOpener are variable features because they are options that may be included in a specific lock control application or not.
Secondly, once variability has been identified, the software architecture is designed using the component model of UML 2.0 (Fig. 1 (bottom)). This represents the variability realization or solution space. The mechanism selected for supporting variability in the architectural design is plugin components. The LockControlMng component is the central component of this architecture. Each alternative for authentication is designed as a pair of plugin components: one for controlling the physical device that serves to authenticate users (e.g. KeypadReader), and the other one encapsulating the logic of the authentication algorithm (e.g. KeypadAuth). These plugin components communicate with the LockControlMng through the IAccess interface, in the case of reader components, and the IVerify interface, in the case of authenticator ones. All plugin components must register in the LockControlMng component using the interface IRegister. The LockControlMng receives data from the reader components and, with the data received, it calls the authenticator component. The latter is in charge of checking whether the user has access to the room or not. If the user is authentic, the LockControlMng component invokes the LockControl component, which releases the lock. This invocation is performed only if the automatic lock control option has been selected. If the door is a sliding one, the LockControlMng should also invoke the DoorActuator component for automatic opening of the door.
Thirdly, we must specify the links between the variability specification and the variability design, or problem space and solution space, indicating how the components of the architectural model must be composed according to the selected features. In our case, for instance, when a specific authentication device is selected, the corresponding reader component must be connected to the LockControlMng through the IAccess interface. In the same way, the LockControlMng component must be connected to the corresponding authenticator component through the IVerify interface. Both the authenticator and the reader components must also be connected to LockControlMng through the IRegister interface. The components corresponding to non-selected alternatives must simply be removed. Similarly, the DoorActuator and LockControl components are adequately connected if the corresponding optional features are selected; otherwise, they should be removed.
These relationships can be expressed using general-purpose model transformation languages, as demonstrated in [10-11]. Nevertheless, as previously discussed in [10], these have the following shortcomings:
- Metamodel Burden. A model transformation language is often based on abstract syntax manipulations. According to Jayaraman et al. [16], "Most model
Thirdly, we must specify the links between variability specification and variability design, or problem space and solution space, indicating how the components of the architectural model must be composed according to the selected features. In our case, for instance, when a specific authentication device is selected, the corresponding reader component must be connected to LockControlMng through the IAccess interface. In the same way, LockControlMng must be connected to the corresponding authenticator component through the IVerify interface. Both the authenticator and the reader components must also be connected to LockControlMng through the IRegister interface. The components corresponding to non-selected alternatives must simply be removed. Similarly, the DoorActuator and LockControl components are connected if the corresponding optional features are selected; otherwise, they should be removed.

These relationships can be expressed using general-purpose model transformation languages, as demonstrated in [10-11]. Nevertheless, as previously discussed in [10], such languages have the following shortcomings:

- Metamodel Burden. A model transformation language is often based on abstract syntax manipulations. According to Jayaraman et al. [16], "Most model developers do not have this knowledge. Therefore, it would be inadvisable to force them to use the abstract syntax of the models".

- Language Overload and Abstraction Mismatch. There are different kinds of model transformation languages [16], and each of them is based on a specific computing model. They range from rule-based languages (e.g. ATL [17]) to expression-based languages (e.g. xTend [18]) and graph-based languages (e.g. AGG [19]). When employing a model transformation language, software product line engineers must also understand the underlying computing style (e.g. rule-based) and learn the language syntax. As a result, software product line engineers are forced to rely on abstractions that are not naturally part of the abstraction level at which they work.

To overcome these shortcomings, we proposed [12] to create dedicated languages for specifying product derivation processes, that is, for specifying how features map to software models. These dedicated languages follow a very basic computation style: based on a selection of features, a small sequence of simple commands is executed. Moreover, these commands use a syntax familiar to the modeler, building on the concrete syntax of the model rather than its abstract syntax. These user-friendly, high-level specifications are then translated into a set of low-level, general-purpose model transformations, which automate the product derivation process. The SPL engineer can thus enjoy the benefits of model-driven techniques without paying the associated cost, i.e. without needing to learn the intricacies of model transformation languages.

Table 1. Part of the VML4Arch Specification for Smart Home

    01 import features <"/SmartHome.fmp">;
    02 import core <"/SmartHome.uml">;
    03 ...

    06 variant for FingerprintScanner {
    07     connect("FingerprintReader", "LockControlMng", "IAccess");
    08     connect("FingerprintReader", "LockControlMng", "IRegister");
    09     connect("FingerprintAuth", "LockControlMng", "IRegister");
    10     connect("LockControlMng", "FingerprintAuth", "IVerify");
    11 } // Fingerprint scanner

    13 variant for not (FingerprintScanner) {
    14     remove('FingerprintReader');
    15     remove('FingerprintAuth');
    16 } // not FingerprintScanner

Table 1 provides an example of such a dedicated language for manipulating UML component models. This specification establishes that whenever the FingerprintScanner option is selected (lines 06-11), the FingerprintReader and FingerprintAuth components must be connected to the LockControlMng component through the corresponding interfaces, as previously described. The connect operator is an intuitive composition mechanism specifying that two components must be connected using the interface given as a parameter. The first parameter of the connect operator is the component that requires the interface, while the second parameter is the component that provides it. In the case where the FingerprintScanner variant is not selected (lines 13-16), the FingerprintReader and FingerprintAuth components are removed from the architecture using the remove operator.
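By way of contrast, the following self-contained Java sketch shows roughly the kind of low-level manipulation that a single connect(...) call is compiled into. It operates on a deliberately simplified in-memory component model defined in the snippet itself, not on the real UML 2.0 metamodel, so all class and method names are illustrative assumptions; the point is only that the generated transformation has to navigate abstract syntax, which is exactly what the VML4Arch specification hides from the SPL engineer.

    import java.util.*;

    // Deliberately simplified stand-in for a component model's abstract syntax;
    // a real implementation would manipulate the UML 2.0 metamodel instead.
    class Model {
        Map<String, Component> components = new HashMap<>();
        List<String[]> connectors = new ArrayList<>();   // {requirer, provider, interface}
    }

    class Component {
        String name;
        Set<String> provided = new HashSet<>();
        Set<String> required = new HashSet<>();
        Component(String name) { this.name = name; }
    }

    class ConnectTransformation {
        // Roughly what connect("FingerprintReader", "LockControlMng", "IAccess")
        // is translated into: look up both components by name, check the interface
        // on each side, and create the connector element.
        static void connect(Model m, String requirerName, String providerName, String iface) {
            Component requirer = m.components.get(requirerName);
            Component provider = m.components.get(providerName);
            if (requirer == null || provider == null)
                throw new IllegalArgumentException("unknown component");
            if (!requirer.required.contains(iface) || !provider.provided.contains(iface))
                throw new IllegalArgumentException(iface + " not required/provided as expected");
            m.connectors.add(new String[] { requirerName, providerName, iface });
        }
    }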
2.2 Automating the Generation of New VML Languages

Beyond the language illustrated in Table 1, a wide range of languages for managing variability in other kinds of target modeling languages needs to be constructed. For instance, we need dedicated languages with specific operators for managing variability in use case models, activity models, business process models, or any other kind of architectural description language. Developing such languages is cost-intensive and error-prone, especially as, so far, there is no support for reuse between different such languages beyond a copy-and-paste approach. This is a serious barrier to the adoption of our approach in SPL projects. To make developing such languages feasible, we need to solve the following three challenges:

1. Support reuse between different languages. The support infrastructure should be easily reused for new languages. Reuse should not be based on copying an existing language implementation and adjusting it, removing unneeded actions and adding new ones. Otherwise, if errors are found and fixed in the infrastructure for one language, these corrections would have to be manually transferred to all other language infrastructures. The same would be true for new features of the infrastructure, for example, evaluations of specifications other than product derivation.

2. Allow the type of variability model to vary. Different approaches to modelling variability have been proposed: very often, feature trees [4] or cardinality-based feature models [20] are used, but DSLs have also been used to represent variability [21]. Any variability management language should be easily adapted to any type of variability model.

3. Support easy customisation of target-model element access. Target-model elements need to be accessed from a specification based on a textual reference (e.g., their fully qualified name or some pattern matching a number of names). Depending on the target model, different forms of such textual references may be useful. The evaluation of such textual references should therefore be implemented separately from the individual actions, to allow for easy exchange and customisation of this feature.

In this work, we present a generative infrastructure for creating new VML languages for a concrete target model that tackles these issues.
3 The VML* Family of Languages In response to the challenges identified in the previous section, we propose to bootstrap SPLE techniques using a model-driven and generative approach for creating the infrastructure (e.g., parser, editor, evaluation engine) for a specific VML* language. To this end, we have developed the VML* family of languages, which consists of:
Fig. 2. Common metamodel for VML languages. Variation points have been highlighted in dark grey.
1. A common metamodel for VML* languages, including variation points that can be customised for describing specific VML* languages. This provides the concepts common to all VML* languages.
2. A DSL for specifying the choices a specific language makes for each variation point.
3. A generator-based infrastructure that can instantiate all custom elements of the process from [12] for any VML* language.

A working prototype of this system is available as a set of Eclipse plugins [22].

3.1 A Common Metamodel for VML* Languages

Figure 2 shows the general concepts required for expressing variability in product-line models. This metamodel has been developed as a generalisation of the metamodels of VML4Architecture (or simply VML4Arch) [12-13] and VML4Requirements (or simply VML4RE) [23-24], two variability management languages we have previously developed. VML4Arch is a language for relating feature models and UML 2.0 architectural models of an SPL. VML4RE is a language for relating feature models and UML 2.0 use case and activity models. These languages have been developed in parallel, but independently. They have a number of differences, but they also share a large number of commonalities, enabling us to derive a common metamodel for VML* languages.

The metamodel shown in Figure 2 is independent of both the specific models used for variability modelling (e.g., feature models, domain-specific languages) and the specific target models (e.g., UML, architecture description models, generation workflow models). Consequently, a number of concepts are abstract in this metamodel. To
apply the metamodel for a specific combination of target model and variability model, these concepts (highlighted in dark grey in Figure 2) need to be specialised (how to specify such specialisations will be discussed in the next section). In the following, we discuss each of the metamodel concepts in more detail.

VMLModel. A VML model relates a variability model and a target model, using a set of variants to describe how the target model needs to vary as each of the concerns of the variability model is selected or unselected.

VariabilityModel. A variability model is the central artefact in variability modelling. VariabilityModel and Variability Unit serve as adapters to the specific form of variability modelling employed in a specific scenario.

Variability Unit. These are the units of variability in variability modelling. A variability model describes what variability units a potential product may have and what constraints govern the selection of combinations of variability units for individual products. From the perspective of variability management, we are mainly interested in the name of a variability unit and whether it has been selected for a specific product configuration. Notice that for the purposes of our metamodel we do not care about how variability units are expressed in a variability model. They may be represented as explicit features in a feature model [4] or more implicitly in a DSL [21], or in any other form that is convenient for modelling variability in a specific project. To enable our metamodel to relate to all these different kinds of representations, we standardise on the common notion of Variability Unit and require adapters that extract these from any of the representations discussed above.

TargetModel. Target models describe a product line. There are a large number of potential target models—for example, requirements models, architecture models, or code-generation-workflow models.

ModelElement. Model elements represent arbitrary elements of the target model. This concept serves as an adapter to actual model elements and needs to be specialised for each kind of target model (thereby defining the concrete model elements available). The model elements are typed using metaclasses imported from the target metamodel.

Variant. A variant describes how the target models must be varied when a certain combination of variability units is selected or unselected. Notice that for product derivation it is sufficient to provide a variant for each non-mandatory variability unit, as we can assume the unvaried target model to represent the model for all the mandatory variability units. For some other evaluations (e.g., trace-link generation), however, a variant must be provided for each variability unit, including mandatory ones. Each variant defines two sets of actions for its variability units: a set of onSelect actions defines how to vary the target model when the variability units are selected; a set of onUnSelect actions defines what to do when the variability units are not selected.

ConcernExpression. For certain use cases it is not sufficient to map variability units directly onto modifications of the target model, as has also been previously discussed in the literature [6-7]. Therefore, we define variants for so-called concern expressions, logic expressions over variability units. We support And, Or, and Not expressions as well as atomic terms.

VariantOrdering.
Sometimes the order in which the actions of different variants are executed during product derivation is important, as actions for one variant may rely on model elements created by actions for another variant. VariantOrdering
provides SPL developers with a means of defining a partial order of execution over variants using pairs of variants. The infrastructure will guarantee that all actions of the first variant in a pair are executed before any action of the second variant of that pair is executed.

Action. Actions are used to describe modifications to the target model. These need to be customised for each kind of target model, depending on the kinds of variations that make sense at the level of abstraction the target model covers. For example, if the target model is a use case model, one particular action may be to connect an actor and a use case, while for an architectural model a possible action could be to connect two components. Actions may add, update or remove model elements in the target model, and may create, update or remove links between existing or newly added model elements.

PointcutExpression. A pointcut expression is an expression that identifies a model element or a set of model elements. It is constructed from atomic designators, pointcut references and combining operators (Not, And, and Or).

Pointcut. A pointcut identifies a model element or set of model elements. The model elements are denoted by a pointcut expression. The main purpose of the Pointcut concept is to allow particular pointcut expressions to be named. A named Pointcut can then be reused using a PointcutReference.

PCOperator. Operators enable the construction of pointcut expressions combining the sets of elements returned from more than one element pointcut. Here, we define only two operators, namely and and or, which represent intersection and union of the sets of model elements of their element expressions, respectively.

Designator. A designator is a piece of text that is used to identify a model element or a set of model elements. It may be a name (possibly qualified), a signature, a wildcard expression, or anything else that makes sense in the target model. As resolution of designator text into actual model elements is specific to the target model, the designator concept needs to be customised for each target model.

3.2 A DSL for Specifying Individual VML* Languages

To enable succinct description of the specificities of a certain VML* language, we have defined a metamodel and concrete syntax for language-instance description. Figure 3 shows the key concepts. Based on an instance of this metamodel—a VML* language description—we can then generate an appropriate infrastructure customised for that specific VML* language. The individual concepts in the language-description metamodel are:

LanguageInstanceModel. The central metaclass of VML* language descriptors, binding together the other parts of a VML* language descriptor.

VariabilityModelImport. This provides information about the type of variability model to be supported by the VML* language. The key interface between VML* and a variability model is the set of features defined. The language descriptor, therefore, contains a snippet of model-query code that serves as an adapter between the variability model and a VML* specification. (Our prototype uses openArchitectureWare's (oAW) xTend language to express model queries and model transformations. These xTend snippets can be kept as operations in a separate xTend file and referenced from the language instance descriptor, allowing language designers to take full advantage of oAW's checking capabilities.)
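As an illustration of what such an adapter has to achieve, the following sketch is written in plain Java against the EMF API rather than in xTend, and it assumes that the variability metamodel exposes its units as instances of a metaclass with a "name" attribute. Both choices are assumptions made for illustration and are not part of the VML* prototype.

    import java.util.ArrayList;
    import java.util.List;
    import org.eclipse.emf.common.util.TreeIterator;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.EStructuralFeature;

    // Illustrative adapter in the spirit of the 'getAllFeatures' function referenced
    // from a VML* language descriptor: it walks an arbitrary variability model and
    // collects the names of all instances of a given metaclass ("Feature" would be
    // one plausible choice for FMP-style models; DSL-based variability models would
    // plug in their own metaclass name).
    public class VariabilityModelAdapter {
        public static List<String> getAllVariabilityUnits(EObject variabilityModel,
                                                          String unitMetaclass) {
            List<String> names = new ArrayList<>();
            TreeIterator<EObject> it = variabilityModel.eAllContents();
            while (it.hasNext()) {
                EObject obj = it.next();
                if (unitMetaclass.equals(obj.eClass().getName())) {
                    EStructuralFeature nameAttr = obj.eClass().getEStructuralFeature("name");
                    if (nameAttr != null) {
                        names.add(String.valueOf(obj.eGet(nameAttr)));
                    }
                }
            }
            return names;
        }
    }

A ConfigurationImport adapter (the getAllSelectedFeatures function referenced later in Table 2) could be written in the same style, additionally filtering on whatever attribute the configuration model uses to mark a unit as selected.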
Fig. 3. Metamodel for VML* language instance descriptions
This snippet is the only place where knowledge about the variability-model metamodel is located in a VML* language descriptor.

TargetModelImport. This provides information about the type of target model to be supported by the VML* language. Mainly, this defines how pointcut designators should be evaluated for a specific target model. Depending on the specific kind of target model, different pointcut designators may be required. While, for example, use-case models require only simple qualified names (possibly using wildcards for quantification) to identify individual actors, use cases, or activities, architectural models may additionally require pointcut designators for operation signatures or for the provided or required interfaces of components. Therefore, both the syntax of pointcut designators and their interpretation are specific to the kind of target model. In all VML* languages, pointcut designators are syntactically represented as simple string values. They are then passed to a piece of model-query code that interprets them and returns a set of model elements from a given target model. This piece of code is defined for a specific VML* language using TargetModelImport.

ActionDescriptor. Each action descriptor provides general syntactic information about one action. This includes the name of the action and the number of parameters it takes. The concrete syntax for action invocation in the generated VML* language is '<action-name> (param1, ..., paramn)'. For each parameter, users of the VML* language can provide a pointcut expression.

EvaluationAspect. Every evaluation aspect describes one form of evaluation of a VML* specification. The VML* family can be extended with a number of these evaluation aspects (currently only one aspect—product derivation—has been implemented, but we are working on an implementation for trace-link generation and are planning to work on consistency evaluation), which can be supported for every
concrete VML* language, but not all VML* languages will need support for all evaluation aspects. A VML* language description can, therefore, include only those evaluation aspects that are actually required for this VML* language, providing an additional opportunity for optimisation. Notice that making such a selection manually, based on the architecture presented in the previous subsection, can be very difficult, as the different evaluation aspects actually overlap in some elements of the architecture (for example, in plugin configuration files). The model-driven approach not only allows a selection of one aspect or another, it additionally allows this selection to be changed flexibly, even experimentally.

TransformationAspect. If present, this enables product derivation for target models. For each ActionDescriptor it defines an ActionTransformation specifying the model transformation encapsulated by this action. Furthermore, a ConfigurationImport defines an adapter for configuration models.

ConfigurationImport. For the construction of models for specific products, the VML* infrastructure requires access to the set of features selected in a specific product configuration. To avoid polluting the VML* infrastructure with knowledge about the inner structure of product configurations, ConfigurationImport provides a snippet of model-query code that serves as an adapter to product-configuration specifications by extracting the set of selected features from a product configuration.

ActionTransformation. This provides additional information for an action pertaining to the transformation of target models by this action. For every ActionDescriptor there needs to be a corresponding ActionTransformation instance. In particular, this includes a snippet of model-transformation code that implements the action. In this code, the parameters can be referenced as 'param1' through 'paramn'. The type of each parameter is defined in the ActionTransformation.

TracingAspect. If present, this enables the generation of trace links from a VML* specification. Such trace links connect selected features and added or removed model elements of the target model. The tracing aspect is specified by naming the model-transformation operations that create or remove model elements; wildcards may be used to provide these names. VML* will then generate an aspect for the model transformation that advises these operations and creates appropriate trace links using the AMPLE Tracing Framework (ATF) [25].

3.3 Generation of VML* Language Infrastructure

Instances of this metamodel can be defined using a textual concrete syntax. Table 2 shows an excerpt of the language descriptor for VML4RE (cf. Sect. 4). Mapping this concrete syntax to the abstract syntax discussed above is rather straightforward, so we will not discuss it in any more detail here. It is worth noting, though, that this language descriptor does not contain complicated model-transformation code; all that is specified are the names of some functions. The functions with the actual model-transformation code are contained in an external file (an oAW xTend file, in the case of our prototype), allowing standard editors and error highlighting to be used when writing the code.
Table 2. Excerpt from the language descriptor for VML4RE
    vml instance vml4req {    // Define a new language called vml4req

        // This section defines the type of variability model and how to access it
        features {
            metamodel "/bin/fmp.ecore"
            // Extracts all variability units from a variability model
            function "getAllFeatures"
        }

        // This section defines the type of target model and how to access it
        target model {
            metamodel "UML2"
            type "uml::Package"      // Metamodel type of a model
            // Function to interpret pointcut designators
            function "dereferenceElement"
        }

        // Importing plugins and external specifications
        bundles: "unl.vml4req", "ca.uwaterloo.gp.fmp", ...
        extensions: "unl::vml4req::library::umlUtil"

        // Syntactical definition of available actions
        actions:
            createInclude { params "List[uml::UseCase]" "List[uml::UseCase]" }
            insertUseCase { params "String" "uml::Package" }
            ...

        // Definition of available evaluation aspects
        aspects:
            transformation {    // Evaluation for product derivation
                // Defines adapter for product-configuration access
                features { type "String" function "getAllSelectedFeatures" }
                // Definition of the semantics of actions as model transformations
                createInclude { function "createIncludes" }
                insertUseCase { function "createUseCase" }
                ...
            }
    }
Including the fully qualified name of the external file in the list after the "extensions" keyword ensures that the extension can be accessed from all relevant places in the generated code. Similarly, the "bundles" keyword lists other plugins that should be made available to any generated plugins. Here we include the plugin project containing our extension and the FMP plugin [26], which provides support for cardinality-based feature models.

Furthermore, we have developed a generator that takes language descriptors such as the one shown in Table 2 and generates a set of Eclipse plugins containing the infrastructure for this language. The operational prototype can be obtained from [27]. The code generated by this generator is based on the work previously presented in [12]. The generation is completely automatic; the only manual input provided by language developers is the language instance descriptor and the implementations of the actions, provided in a separate file. The complete infrastructure for editing, compiling, and executing specifications of the new VML language is encapsulated in the generator and can thus be reused for each new language.
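To give an idea of what a dereferenceElement function such as the one referenced in Table 2 has to do, the following Java sketch resolves a qualified-name designator with simple '*' wildcards against an arbitrary EMF-based target model. The use of a 'name' attribute and of '::' as the qualifier separator are assumptions for illustration; a real implementation for UML2 models would use the UML2 API instead.

    import java.util.ArrayList;
    import java.util.List;
    import org.eclipse.emf.common.util.TreeIterator;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.EStructuralFeature;

    // Illustrative pointcut-designator resolution: "Notification::Send*" would match
    // every element whose qualified name starts with "Notification::Send".
    public class DesignatorResolver {
        public static List<EObject> resolve(EObject targetModel, String designator) {
            // translate the simple wildcard syntax into a regular expression
            // (assumes the designator contains no other regex metacharacters)
            String regex = designator.replace("*", ".*");
            List<EObject> matches = new ArrayList<>();
            TreeIterator<EObject> it = targetModel.eAllContents();
            while (it.hasNext()) {
                EObject obj = it.next();
                String qualifiedName = qualifiedNameOf(obj);
                if (qualifiedName != null && qualifiedName.matches(regex)) {
                    matches.add(obj);
                }
            }
            return matches;
        }

        private static String qualifiedNameOf(EObject obj) {
            EStructuralFeature nameAttr = obj.eClass().getEStructuralFeature("name");
            if (nameAttr == null || obj.eGet(nameAttr) == null) return null;
            String name = String.valueOf(obj.eGet(nameAttr));
            EObject parent = obj.eContainer();
            String prefix = parent == null ? null : qualifiedNameOf(parent);
            return prefix == null ? name : prefix + "::" + name;
        }
    }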
4 Example Languages from the VML* Family

We have re-implemented both VML4Arch and VML4RE based on our new infrastructure. As VML4Arch has already been discussed extensively in [12], here we focus on VML4RE; for VML4Arch we only give a brief discussion of what needed to be changed to make it compatible with VML*. Both implementations can be downloaded from [24].

4.1 VML4RE

Requirements are most recurrently documented in a multi-view fashion [28-29]. Their description is typically based on considerably heterogeneous languages, such as use cases, activity diagrams, goal models, and natural language. Initial work on compositional approaches for early development artefacts does not clearly define composition operators for combining common and varying requirements based on different views or models. Therefore, a key problem in SPLE remains how to specify and apply the composition of elements defined in separate and heterogeneous requirements models. With the Variability Modelling Language for Requirements (VML4RE) [23] we propose an initial solution to this problem by introducing a new requirements composition language for SPLs.

VML4RE is a textual language with two main goals: (i) to support the definition of relations between SPL features expressed in feature models and requirements expressed in multiple views (based on a number of UML diagram types, such as use case diagrams and activity diagrams); and (ii) to specify the compositions of requirements models for specific products of an SPL. VML4RE supports composition operators for UML use case and activity models. It has been applied to case studies in domains such as home automation [23] and mobile applications [30], and has shown great flexibility in specifying composition rules and references to different kinds of elements in heterogeneous requirements models. The results of these experiments are encouraging and comparable with other approaches that support semi-automatic generation of trace-link relationships and composition between model elements in SPLs.
Table 3. Selected VML4RE actions for Use Case Models
- A new use case named name is inserted into package p.
- A new package named name is inserted into package p.
- A new connection is created between each of the actors and each of the use cases.
- A new <<include>> dependency is created between each of the source use cases and each of the target use cases.
Table 3 shows an overview of some of the available actions of the VML4RE language for use cases; a more complete list can be found in [23]. VML4RE provides another set of actions for activity models, which are not shown here due to space restrictions.

Table 2 shows an excerpt from the language descriptor for VML4RE. It has been defined to map from feature models expressed using the FMP metamodel [26] to UML2 use case and activity models. This is expressed in the two sections named 'features' and 'target model', respectively, which also reference the functions used to adapt to the feature model and to dereference pointcut designators in the target model. The actual dereferencing code is implemented in the extension referenced through the 'extensions' keyword. The full language descriptor also specifies a tracing aspect; this is not shown in Table 2 for lack of space.

Table 4. Part of the VML4RE Specification for Smart Home
Finally, Table 4 shows an excerpt of a VML4RE specification for the Smart Home case study [23]. Lines 7 to 20 show the additional use cases needed when the Security feature is selected in a product configuration. Notice the use of wildcards on Line 13 to select all use cases in a package. Whether, and which, wildcards are supported and how they are evaluated is defined in the dereferenceElement operation invoked from the language instance descriptor in Table 2. Further, notice the use of a slightly more complex pointcut expression on Lines 16 to 19 of Table 4. This pointcut expression results in a set of two use cases: Notification::SendSecurityNotification and WindowsManagement::OpenAndCloseWindowsAutomatically.

4.2 VML4Arch

Re-implementing VML4Arch on top of the VML* infrastructure proved surprisingly easy. However, as any product line requires a certain amount of streamlining between individual products to maximise reuse, there were some minor adjustments we had to make to fit VML4Arch into the family of languages. These adjustments, however, did not affect the functionality provided by VML4Arch. In detail, we had to:

• Adjust the syntax of some VML4Arch operators. VML4Arch originally had operators like connect c1, c2 using interface i, whose concrete syntax differed slightly from the standard concrete syntax for VML* operators. We had to adjust the concrete syntax of these operators to fit the standard scheme generated by VML*. For example, the connect operator from above is now expressed as connect (c1, c2, i).

• Extend some operator definitions to allow pointcut expressions as parameters. VML4Arch originally used direct references to model elements rather than pointcut expressions. This meant that we had to modify some of the operator definitions so that they can deal with receiving sets of model elements as parameters rather than individual model elements only.
5 Related Work

The work presented in this paper is related to work in two areas of research: systematic development of families of languages, and support for variability management in SPLE. As the main focus of this paper is on constructing a family of languages, we begin by discussing literature from this area.

A number of research projects—for example, CAFÉ, Families, or ESAPS—have explored the notion of software system families (or product lines). In this work, we extend these ideas to families of software languages, specifically for the case of VML languages. Families of languages have been presented in the research literature for a range of domains: Voelter presents an approach for a family of languages for architecture design at different levels of abstraction [31]; Akehurst et al. [32] present a redesign of the Object Constraint Language as a family of languages of different complexity; and Visser et al. [33] present WebDSL, a family of interoperating languages for the design
of web applications. All these approaches, including ours, use very different kinds of technologies for their specific case: Voelter uses conditional compilation to construct an appropriate infrastructure, Akehurst et al. use a special parser technology that enables modular language specification, Visser et al. use rewriting of abstract syntax trees, and our approach generates a monolithic infrastructure for each language. Equally, all approaches focus on different purposes of the language family: the different members of the family presented by Voelter are architectural languages at different levels of abstraction; the family presented by Akehurst et al. modularises different features of the OCL language, so that specific languages can be constructed as required for a project; WebDSL is a set of interoperating languages with purposes ranging from data modelling to workflow specification; and the family of languages presented in our paper consists of languages that share a common set of core concepts, but adapt these to the different languages with which they interface. At this point, an overview of the different potential uses of families of languages begins to emerge. What is needed next is research into the systematic development of such language families beyond individual examples.

Ziadi et al. [10] and Botterweck et al. [11] both propose implementing product derivation processes as model transformations. Their proposal relies on the realization of product derivations via a model transformation language. This strategy requires SPL engineers to deal with low-level details of model transformation languages. Our approach provides syntax and abstractions familiar to SPL engineers. This eliminates the burden of understanding the intricacies associated with model transformation languages and metamodels. A VML* specification is automatically compiled into an implementation of the product derivation process in a model transformation language, but SPL engineers need not be aware of this generation process.

In [12] we have presented a process for developing variability management languages. The structure of these languages has some similarities to the languages developed using VML*; in fact, VML4Arch was originally developed based on this process. However, focusing on process rather than infrastructure, [12] falls short of solving the issues discussed in the introduction. In particular, reuse between individual languages is only possible based on a copy-and-paste approach, and variability-model and target-model access are closely intertwined with the other infrastructure code, making it difficult to modify them independently. In contrast, in this paper we have presented an infrastructure that tackles all of these issues. The code generated for a specific VML* language is partially based on code developed for VML4Arch following the process from [12].

Czarnecki and Antkiewicz [6] present an approach similar to ours based on using feature models to model variability. They create a template model, which models all products in the product line. Elements of this model are annotated with so-called presence conditions. Given a specific configuration, each presence condition evaluates to true or false. If a presence condition evaluates to false, its associated model elements are removed from the model. Thus, such a template-based approach is specific to negative variability, which might be critical when a large number of variations affect a single diagram.
Our approach can also support positive variability by means of actions such as connect or merge. Moreover, presence conditions imply introducing
annotations into the SPL model. Therefore, the actions associated with a feature selection are scattered across the model, which could also lead to scalability problems. In our approach, they are well encapsulated in a VML* specification, where each variant specifies the actions to be executed.

FeatureMapper [7] is another approach, similar to that of Czarnecki and Antkiewicz and to ours, that avoids polluting the SPL model with variability annotations. FeatureMapper is generic for all EMF-based models and integrates generically into GMF-based editors. In contrast, our approach is based on languages that are specific to a kind of feature model and a kind of target model; genericity is achieved through a generative approach to creating the infrastructure for these languages from a set of common core concepts. The actual variability model in FeatureMapper is created implicitly by the designer selecting model elements in an editor and associating them with so-called feature expressions, which determine when a model element should be present in a product model. Negative variability is easily supported by this approach, as model elements can simply be removed if their feature expression is not satisfied by a specific configuration. Positive variability is more difficult to implement: instead of mapping features to target model elements, they need to be mapped to elements of a model transformation, again requiring SPL designers to have sufficiently detailed knowledge of that model-transformation language and the metamodels involved. In contrast, in our approach, designers of a specific VML* language can provide powerful actions that support both negative and positive variability (or any mixture of the two) in a systematic manner.

Finally, Haugen et al. [34] define the common variability language (CVL), a generic extension to DSLs for expressing variability. It provides three generic operators, but using these to express variability can lead to comparatively complex models. On the flip side, a VML* language is potentially less flexible than the two other approaches discussed in this paragraph, as it can only support the variability mechanisms for which a corresponding action has been defined.

A completely different approach to SPLE is followed in the feature-oriented software development community. Here, features are directly related to separate modules implementing each feature, and these feature modules can be understood as program or model transformations (e.g., [35]). This implies that no mapping from features to target models is required. Instead, the programming or modelling language must be sufficiently powerful to support modularising features as coherent, well-encapsulated units of composition. In another publication [36], we have presented a feature-oriented approach to SPL development. In this context, we also noted that a pure feature-oriented approach can lead to a large number of small feature modules, negatively impacting the scalability and comprehensibility of the approach, especially where features are associated with fine-grained changes to the architecture or implementation. Thus, for such cases, an approach with an explicit mapping may be beneficial.

Generally, all SPL approaches face the problem of ascertaining that only consistent and well-formed product models and implementations can be constructed.
This problem becomes even worse when several interconnected types of models representing different views of the system are used—for example, activity diagrams and class diagrams. As a consequence, there is a need to analyse the changes of each view and
the inconsistencies that these may cause with other views when instantiating a product model. In our work on VML*, we have not discussed this issue so far, but some previous work on this topic exists from other groups—for example, [37-38].
6 Conclusions

This paper presented a generative approach to building a family of languages for specifying the relationship between variability models and other models in software product line engineering. Our experience shows that the proposed infrastructure is powerful enough to support generating different language instances (in addition to the two languages presented here, we are currently developing VML* languages for mapping to openArchitectureWare workflows as well as a number of project-specific DSLs) and that it can reduce the effort required to learn about the support infrastructure for such languages.

Specifically, regarding the challenges we identified in Sect. 2.2, our generative approach to the family of VML* languages provides the following solutions: reuse is substantially improved over a copy-and-paste approach, as all reusable parts of the infrastructure are encoded in the generator and all variable parts are explicitly configured through language descriptors (Challenge 1). Because all dependencies on varying variability and target models have been made explicit in the language descriptor, model access code could be completely disentangled from the actual model manipulation code (Challenges 2 and 3).

In implementing our prototype, we identified a need for aspect-oriented code generation beyond what is offered by current code-generation engines. Our system is structured such that the code generators for the basic VML* infrastructure and for each evaluation aspect are kept in separate modules. This is sensible because evaluation aspects can be included or excluded from a specific VML* language as required. For some generated files (for example, the plugin descriptors contained in plugin.xml files) there is a conflict between the code generators for the evaluation aspects: each evaluation aspect needs to contribute to the final contents of the file. Using separate code-generation templates for each evaluation aspect would result in a file containing only the contributions from one evaluation aspect. Aspect-oriented code generation could provide a solution here: it effectively allows the results of two or more different generators to be merged into one output file.

However, all current aspect-oriented code generators [18, 39] support only asymmetric aspect orientation. This requires one template to be declared as the base template, while the other templates are aspect templates; these aspect templates can then manipulate generation rules in the base template, providing before, after, and around advice for code generation. For our purposes this is not appropriate: because evaluation aspects may be included or excluded as required, we cannot rely on any one of them being present. Consequently, no template defined for an evaluation aspect can be made the base template. As the basic VML* generator does not provide a template for plugin.xml, this cannot be designated as the base template either. For our prototype, this problem has been solved by breaking the encapsulation of the evaluation-aspect
code generators in a controlled way. However, a cleaner solution using a more symmetric approach to aspect-oriented code generation remains for future work.
References

[1] Pohl, K., et al.: Software Product Line Engineering: Foundations, Principles and Techniques. Springer, Berlin (2005)
[2] Clements, P., Northrop, L.M.: Software Product Lines: Practices and Patterns. Addison-Wesley, Boston (2002)
[3] Czarnecki, K., Eisenecker, U.W.: Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co. (2000)
[4] Kang, K., et al.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-021, Software Engineering Institute (1990)
[5] Alférez, M., et al.: A Model-Driven Approach for Software Product Lines Requirements Engineering. In: Proceedings of the 20th International Conference on Software Engineering and Knowledge Engineering, San Francisco Bay, USA, July 2008, pp. 779–784 (2008)
[6] Czarnecki, K., Antkiewicz, M.: Mapping Features to Models: A Template Approach Based on Superimposed Variants. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 422–437. Springer, Heidelberg (2005)
[7] Heidenreich, F., et al.: FeatureMapper: Mapping Features to Models. In: Companion of the 30th International Conference on Software Engineering, Leipzig, Germany (2008)
[8] Batory, D., Azanza, M., Saraiva, J.: The Objects and Arrows of Computational Design. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 1–20. Springer, Heidelberg (2008)
[9] Soares, S., et al.: Supporting Software Product Lines Development: FLiP – Product Line Derivation Tool. In: Companion to the 23rd ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, Nashville, TN, USA (2008)
[10] Ziadi, T., Jézéquel, J.M.: Software Product Line Engineering with the UML: Deriving Products. In: Software Product Lines 2006, pp. 557–588 (2006)
[11] Botterweck, G., et al.: Model-Driven Derivation of Product Architectures. In: Proceedings of the 22nd International Conference on Automated Software Engineering (ASE), Atlanta, Georgia, USA, November 2007, pp. 469–472 (2007)
[12] Sánchez, P., et al.: Engineering Languages for Specifying Product-Derivation Processes in Software Product Lines. In: Software Language Engineering 2008, Toulouse, France (2008)
[13] Loughran, N., Sánchez, P., Garcia, A., Fuentes, L.: Language Support for Managing Variability in Architectural Models. In: Pautasso, C., Tanter, É. (eds.) SC 2008. LNCS, vol. 4954, pp. 36–51. Springer, Heidelberg (2008)
[14] Voelter, M., Groher, I.: Product Line Implementation Using Aspect-Oriented and Model-Driven Software Development. In: Proceedings of the 11th International Software Product Line Conference (SPLC), Kyoto, Japan, September 2007, pp. 233–242 (2007)
[15] Czarnecki, K., Eisenecker, U.W.: Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Reading (2000)
[16] Jayaraman, P., Whittle, J., Elkhodary, A.M., Gomaa, H.: Model Composition in Product Lines and Feature Interaction Detection Using Critical Pair Analysis. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 151–165. Springer, Heidelberg (2007)
[17] Jouault, F., Kurtev, I.: Transforming Models with ATL. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, pp. 128–138. Springer, Heidelberg (2006)
[18] OpenArchitectureWare, http://www.openarchitectureware.org/
[19] Taentzer, G.: AGG: A Graph Transformation Environment for Modeling and Validation of Software. In: Pfaltz, J.L., Nagl, M., Böhlen, B. (eds.) AGTIVE 2003. LNCS, vol. 3062, pp. 446–453. Springer, Heidelberg (2004)
[20] Czarnecki, K., Helsen, S., Eisenecker, U.W.: Staged Configuration Using Feature Models. In: Nord, R.L. (ed.) SPLC 2004. LNCS, vol. 3154, pp. 266–283. Springer, Heidelberg (2004)
[21] Völter, M., Stahl, T.: Model-Driven Software Development. Wiley, Glasgow (2006)
[22] VML* Download, http://www.steffen-zschaler.de/publications/vmlstar/
[23] Alférez, M., et al.: A Metamodel for Aspectual Requirements Modelling and Composition. AMPLE D1.3 (2007), http://ample.holos.pt/gest_cnt_upload/editor/File/public/AMPLE_WP1_D13.pdf
[24] Alférez, M., et al.: Multi-View Composition Language for Software Product Line Requirements. In: Proceedings of the 2nd International Conference on Software Language Engineering (SLE), Denver, USA (2009)
[25] Sousa, A.: AMPLE Traceability Framework Frontend Manual (2008), http://ample.di.fct.unl.pt/Front-End_Framework/ATF%20Front-end%20Manual.pdf
[26] Generative Software Development Group, University of Waterloo: Feature Modelling Plugin (FMP) for Eclipse, http://gsd.uwaterloo.ca/projects/fmp-plugin/
[27] VML* Download (2009)
[28] Kotonya, G., Sommerville, I.: Requirements Engineering: Processes and Techniques. John Wiley, Chichester (1998)
[29] Sommerville, I., Sawyer, P.: Requirements Engineering: A Good Practice Guide. John Wiley and Sons, Chichester (1997)
[30] Young, T.: Using AspectJ to Build a Software Product Line for Mobile Devices. University of Waterloo (2005), http://www.cs.ubc.ca/grads/resources/thesis/Nov05/Trevor_Young.pdf
[31] Voelter, M.: A Family of Languages for Architecture Description. Presented at the Conference on Object-Oriented Programming, Systems, Languages, and Applications, Orlando, Florida (2008)
[32] Akehurst, D.H., et al.: Supporting OCL as Part of a Family of Languages. In: Proceedings of the MoDELS 2005 Conference Workshop on Tool Support for OCL and Related Formalisms – Needs and Trends (2005)
[33] Visser, E.: WebDSL: A Case Study in Domain-Specific Language Engineering. In: Lämmel, R., Visser, J., Saraiva, J. (eds.) Generative and Transformational Techniques in Software Engineering II. LNCS, vol. 5235, pp. 291–373. Springer, Heidelberg (2008)
[34] Haugen, Ø., et al.: Adding Standardized Variability to Domain Specific Languages. In: Proceedings of the Conference on Software Product Lines (SPLC 2008), pp. 139–148 (2008)
[35] Batory, D., et al.: Scaling Step-Wise Refinement. IEEE Transactions on Software Engineering, 355–371 (2003)
[36] Fuentes, L., et al.: Feature-Oriented Model-Driven Software Product Lines: The TENTE Approach. In: Proceedings of the Forum of the 21st International Conference on Advanced Information Systems Engineering (CAiSE), Amsterdam, The Netherlands (2009)
[37] Thaker, S., et al.: Safe Composition of Product Lines. In: Proceedings of the 6th International Conference on Generative Programming and Component Engineering (GPCE), Salzburg, Austria, pp. 95–104 (2007)
[38] Janota, M., Botterweck, G.: Formal Approach to Integrating Feature and Architecture Models. In: Fiadeiro, J.L., Inverardi, P. (eds.) FASE 2008. LNCS, vol. 4961, pp. 31–45. Springer, Heidelberg (2008)
[39] MOFScript, http://www.eclipse.org/gmt/mofscript/
Multi-view Composition Language for Software Product Line Requirements

Mauricio Alférez¹, João Santos¹, Ana Moreira¹, Alessandro Garcia², Uirá Kulesza¹, João Araújo¹, and Vasco Amaral¹

¹ New University of Lisbon, Caparica, Portugal
² Pontifical Catholic University of Rio de Janeiro, Brazil
{mauricio.alferez,joao.santos,amm,uira,ja,vasco.amaral}@di.fct.unl.pt
[email protected]
Abstract. Composition of requirements models in Software Product Line (SPL) development enables stakeholders to derive the requirements of target software products and, very importantly, to reason about them. Given the growing complexity of SPL development and the various stakeholders involved, their requirements are often specified from heterogeneous, partial views. However, existing requirements composition languages are very limited in their support for generating specific requirements views for SPL products. They do not provide specialized composition rules for referencing and composing elements in recurring requirements models, such as use cases and activity models. This paper presents a multi-view composition language for SPL requirements, the Variability Modeling Language for Requirements (VML4RE). This language describes how requirements elements expressed in different models should be composed to generate a specific SPL product. The use of VML4RE is illustrated with UML-based requirements models defined for a home automation SPL case study. The language is evaluated with additional case studies from different application domains, such as mobile phones and sales management.

Keywords: Requirements Engineering, Software Product Lines, Variability Management, Composition Languages, Requirements Reuse.
Model-based development methods for SPLs [2, 5, 6] support the construction of different models to provide a better understanding of each SPL feature. However, features, which are modeled separately in partial views, must be composed to show the requirements of the target applications. Composing variable and common requirements is a challenging task. Requirements are the early software artifacts most frequently documented in a multi-view fashion. Their description is typically based on significantly heterogeneous languages, such as use cases [7] (a coarse-grained operational view), interaction diagrams [8] (a fine-grained operational view), goal models [9, 10] (an intentional and quality view), and natural language. This varied list of requirements models is a direct consequence of requirements having to be understood by stakeholders with different backgrounds, from customers of specific products to SPL architects, programmers and testers.

However, initial work on compositional approaches [2, 5, 6, 11] for requirements artifacts is rather limited in language support. These approaches do not offer composition operators for combining common and varying requirements based on different partial views. They are also often of limited scope and expressiveness [11]. Therefore, a key problem in SPL remains to be addressed: how to compose elements defined in separated and heterogeneous requirements models using a simple set of operators?

This paper answers this question by proposing the Variability Modeling Language for Requirements (VML4RE), a requirements composition language for SPLs. VML4RE has two main goals: (i) to support the definition of relationships between SPL features expressed in feature models and requirements expressed in multiple models; and (ii) to specify the composition of requirements models for deriving specific SPL products using a simple set of operators. VML4RE provides a set of specialized operators for referencing and composing requirements elements of specific types, based on recognizable abstractions used in the domain of each requirements modeling notation or technique. Such operators can help SPL engineers to understand and choose the composition rules for requirements models. In contrast with conventional, general-purpose languages for model transformation, such as XTend [12], ATL [13] and AGG [14], VML4RE is tailored to requirements composition in a way that is accessible to requirements engineers. This is an important contribution of our work, as it addresses the problem of abstraction mismatch caused by such general-purpose model transformation languages [15, 16]. VML4RE spares SPL designers the burden of language intricacies that are not part of the abstraction level at which they are used to work.

The remainder of this paper is organized as follows. Section 2 presents a set of criteria used when creating the requirements variability composition language. Section 3 describes a case study that is later used to illustrate the VML4RE composition language and creates an example specification. Section 4 presents VML4RE and Section 5 discusses its application to the case study. Section 6 presents the evaluation of the language and discusses its benefits and limitations. Section 7 examines related work and compares it with ours. Finally, Section 8 concludes the paper and points out directions for future work.
2 Criteria to Design VML4RE

SPL requirements engineering handles both common and variable requirements that enable the derivation of customized products of the family. Feature models are used to specify SPL commonalities and variabilities, and feature model configurations are used as a driver during the process of deriving product-specific requirements models. Requirements variability composition is the ability to customize requirements models for specific family products. The customization of model-based requirements implies a composition process where some elements are added, others are removed, and possibly some are modified from the initial models. This section describes five criteria taken into account for the design of VML4RE. These criteria arose from the needs for requirements model specification and composition in the heterogeneous SPLs proposed by the industrial partners in the AMPLE project [17]:

C1: Support Multi-View Variability Composition. Requirements are the early software artifacts most recurrently documented in a multi-view fashion. In this context, variability manifests itself in different kinds of requirements (e.g., functional requirements and quality attributes) and design constraints (e.g., different databases, network types or operating systems) [2]. Modeling the requirements using multiple views facilitates the understanding of the SPL's variabilities and its specific products. This is particularly important in SPL development as it encompasses a number of stakeholders, from customers and managers of specific products to core SPL architects, programmers and testers.

C2: Provide Requirements-Specific Composition Operators. Requirements descriptions are typically based on significantly heterogeneous languages. Specific composition operators for combining common and varying requirements, based on the elements used in different views or models, facilitate the operators' adoption by SPL developers. General-purpose composition languages, such as XTend [12], ATL [13] and AGG [14], require a deep knowledge of the abstract syntax of the models to describe their composition. This highlights the problem of abstraction mismatch and the need for a composition language that does not require additional developer expertise. Requirements engineers should work at the same level of abstraction they are used to [15].

C3: Support Fine- and Coarse-Grained Composition. Requirements models can represent different levels of detail for a specific product. Coarse-grained modeling helps to define the scope of the system to be built by expressing the goals or the main functions of the product. Each coarse-grained element is often associated with a variety of fine-grained elements. The latter provide detailed requirements for what the system must do, or sub-goals of the different parts of the system. For instance, UML provides coarse-grained model elements, such as packages and use cases, to organize the main subsystems and functions of the system to be built. Other models, such as activity diagrams, then support further refinements of use cases. As a result, both fine-grained and coarse-grained composition is required to address the different levels of abstraction employed in SPL requirements engineering.

C4: Support Positive and Negative Variability. In general, there are three means to derive models for a specific SPL product: positive variability, negative variability
and a combination of both. Negative variability is the removal of optional elements from a given structure, whereas positive variability is the addition of optional parts to a given core [18]. Optional elements are related to optional and alternative features of the SPL, and the core part encompasses features that are common to all the products. Sanchez et al. [15] presented a positive-negative modeling technique for variability management, but its composition operators are specific to architectural models. The flexibility provided by a positive-negative approach to composition is also advisable for requirements models, for example in cases where the addition of a model element requires the removal of other elements, as often happens when modeling mutually exclusive features.

C5: Facilitate Trace Links Generation. Variability specification usually leaves implicit the information governing the relationships between each SPL feature and the respective requirements models. Composition methods could support explicit traceability of varying features through the generation of trace links from variability specifications. Hence, traceability information could be used to analyze system evolution properties, such as change impact analysis or requirements coverage.

The five criteria described above formed the basis for the VML4RE design. The use of the VML4RE language assumes a process workflow, which is described in Figure 1. Domain engineering encompasses the creation of a set of artifacts associated with the SPL. These artifacts are reused in application engineering to produce specific SPL products. VML4RE is useful at the first stage of domain engineering, called domain analysis. Variability identification and SPL requirements modeling are the most important activities, which are performed in parallel during domain analysis. During variability identification (Figure 1-A), a distinction is made between core (common) SPL features and the features of specific products. SPL requirements modeling (Figure 1-B) tackles the detailed specification of features using different requirements modeling techniques and notations (related to C1). Composition specification (Figure 1-C) relies on requirements-specific composition rules to specify how to customize requirements models (related to C2). These rules can be based on operators that address both fine- and coarse-grained compositions (related to C3).

The reusable artifacts created in domain engineering are used in application engineering to derive specific product models through the definition of configurations. Existing product derivation tools like pure::variants [19] and Gears [20] mainly allow deriving the complete or partial source code of a product. The input to this derivation is the existing code artifacts produced for an SPL architecture. However, these tools do not provide language support for the derivation of requirements models for a specific product (related to C2). In a VML4RE-centric process, variability resolution (Figure 1-D) implies selecting the variable features to be included in the product. Finally, model derivation (Figure 1-E) is the actual composition of the different models of a specific product. This supports the addition and removal of elements from the initial models (related to C4). Additionally, when deriving the models, appropriate tool support can generate the trace links (Figure 1-F) between the features chosen for the product and the different parts of the requirements models (related to C5).
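As a minimal illustration of the two variability styles, the following self-contained Java sketch uses a plain list of element names as a stand-in for a requirements model; the element and feature names are borrowed from the Smart Home case study introduced in the next section, and none of this is VML4RE syntax.

    import java.util.ArrayList;
    import java.util.List;

    // Negative variability removes optional elements from a superset model;
    // positive variability adds optional elements to a common core.
    public class VariabilityStyles {
        public static void main(String[] args) {
            // negative: start from a model containing every optional element
            List<String> allElements = new ArrayList<>(List.of(
                    "AdjustHeaterValue", "CameraSurveillance", "InternetUI"));
            allElements.removeIf(e -> e.equals("CameraSurveillance") || e.equals("InternetUI"));

            // positive: start from the common core and add what the selected feature needs
            List<String> core = new ArrayList<>(List.of("AdjustHeaterValue"));
            boolean securitySelected = true;          // taken from the product configuration
            if (securitySelected) {
                core.add("SendSecurityNotification"); // element contributed by the feature
            }

            System.out.println(allElements + " / " + core);
        }
    }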
This paper focuses on the Composition Specification activity highlighted in grey in Figure 1. The next section presents the case study and introduces VML4RE as a way of addressing the five criteria just discussed.
3 Case Study: Home Automation

Smart Home is a software product line for home automation being developed by Siemens AG [21]. For brevity and clarity, we rely on a subset of the Smart Home features. The left-hand side of Figure 2 shows the partial feature model of the product line, while the middle of the figure presents one of its possible configurations, the "Economical Smart Home" (to create the models we use the FMP tool [22]). Some optional features are not included in this Economical edition; for example, camera surveillance and the internet user interface are not part of the final product. Hence, these features are not ticked in the product feature model (middle). Figure 3 presents the use case model of the Economical Home as an exemplar of the set of models that we intend to obtain after the composition process. The elements highlighted in grey are related to variable features selected to be included in the Economical Home, while the rest of the elements are related to common features. Table 2 gives an example of the relationships between features and parts of the models. The following sections provide more details on how this model was composed.

Smart Home inhabitants must be able to adjust the heater of the house to their preferred value (Manual Heating feature). In addition, the Smart Heating feature might be activated in a house. If so, a heater control will adjust itself automatically to save energy. For instance, the daily schedule of each inhabitant is stored in the Smart Home gateway. When the house is empty, the heater is turned off and later turned back on in time to reach the desired temperature when the inhabitants return home. The Smart Home can also open or close windows automatically to regulate the temperature inside the house, as an option to save energy (Electronic Windows feature). As an alternative to the electronic windows, the inhabitants can always open and close the windows manually (Manual Windows feature).

There are different types of graphical user interfaces that allow monitoring and managing the different devices of the smart home, as well as receiving security notifications (GUI feature). The available GUI alternatives are touch screens inside the house (Touch Screen feature), or the internet, through a website and a notifier (Internet feature). As far as the Security feature is concerned, inhabitants can initiate the secure mode by activating the glass break sensors and/or camera surveillance devices (Glass Break Sensors and Cameras features). If an alarm signal is sent by any of these devices, then, according to the security configuration of the house, the Smart Home decides to (i) send a notification to the security company and the inhabitants via internet
and touch screens, (ii) secure the house by activating the alarms (Siren and Lights features), and/or (iii) close windows and doors (Electronic Windows feature). Next we introduce VML4RE and illustrate its use with this case study.
Fig. 2. (left) Smart Home Feature Model; (middle) Feature Model Configuration for the Economic Home; (right) Feature Model Notation
Fig. 3. Smart Home Economical Edition Use Case Model
4 VML4RE

This section outlines the VML4RE process, its main elements and its composition semantics.

4.1 VML4RE Process

The VML4RE process is described by instantiating the requirements composition process outlined in Figure 1. Figure 4 shows the specific artifacts employed in each of the activities. For variability identification (Figure 4-A), we employ a feature model that specifies the common and variable features of the SPL, as well as their dependencies. For requirements modeling, we employ various requirements models. In particular, we chose use cases whose detailed behavior is modeled using activity models. This mimics what often happens in mainstream UML-based methods, such as RUP [23]. The further elaboration of use cases with activity models, in contrast to free-format textual descriptions, facilitates the adoption of model-driven generation tools. This alternative provides models that conform to a metamodel (i.e., the metamodel of UML activity diagrams), thereby reducing the ambiguity in the specifications [2]. The detailed specification of use cases as activity models also enables customizations of use cases realizing specific SPL configurations.

During requirements modeling, other models, such as goal models [9, 10], can be used to specify interactions between functional and non-functional requirements. Such models also allow studying the actors and their dependencies, thus encouraging a deeper understanding of the business process. In addition, goal models can be used as a way to introduce intentionality in the elicitation and analysis of requirements. As a consequence, these goals allow the underlying rationale to be exploited in the selection of variants in the application development process [24].

The VML4RE specification (Figure 4-C) references the requirements models and specifies composition rules (also called actions). The VML4RE interpreter (Figure 4-E and F) receives as input the SPL requirements (RE) models (Figure 4-B), the feature model configuration (Figure 4-D) and the VML4RE specification (Figure 4-C). As output, the interpreter generates: (i) the use cases of a product; (ii) activity models that describe product usage scenarios; (iii) additional requirements models, such as goal models (Figure 4-E); and (iv) the trace links between features and specific elements in the requirements models (Figure 4-F).

4.2 VML4RE Main Elements

Each VML4RE specification is composed of three main kinds of elements:

1. Importing: imports the set of requirements models and the feature model that are used in the VML4RE specification. This is accomplished using import sentences.
2. Commonalities: defines the features that are mandatory in every product of an SPL. It is used to reference the parts of the requirements that are related to SPL common features.
3. Variabilities: defines the variable (optional, variation points and variants) features of the SPL. Optional features are not mandatory and might not be included in some of the products of the SPL. A variation point identifies a particular concept
within the SPL requirements specification as being variable, and it offers a number of variants. A variant describes a particular variability decision, such as a specific choice among alternative variants. The variability blocks are used to: (i) reference (through sentences initiated by the keyword ref) the requirements related to each variable feature, and (ii) enclose the operators used to compose the requirements related to each variable feature.
Fig. 4. Artifacts and Composition Workflow
The VML4RE specification outline (in Figure 4-C) contains separate blocks for import sentences, common features like X, and variable features like Y, Y1, Y2 and Z. Each optional, variationPoint and variant block can have select and unselect sub-blocks. These indicate the set of references and actions that are taken into account depending on whether the feature is selected in the feature model configuration. Thus, given that Y and Y1 are selected in the feature model configuration, the actions and references
inside the select block of feature Y1 are executed. The actions and references inside the unselect blocks of the Y2 and Z features are also executed.

4.3 References and Composition Operators

VML4RE provides references to indicate which elements in the requirements models are related to specific features. It also provides a set of specialized operators for composing requirements model elements like use cases, packages, activities or goals. The upper part of Table 1 summarizes the structure of the elements related to references. In VML4RE specifications, ref statements allow creating references between the different common, optional and alternative features and specific parts of the models. In ref statements, it is possible to use designators (e.g., ".", "equal") and quantification (e.g., "*", which denotes all the elements inside a model element). Logical operators like "and" and "or" can be used to create more complex query expressions over the models. Listing 1 provides examples of references to packages, activities and use cases that are explained in the Smart Home description below.

Table 1 also summarizes the structure of some composition operators. These include operators that are relevant to use case, activity and goal models (in particular, the strategic dependency model of the i* [10] goal-oriented approach). Analogous to the insert operators that add parts to the base model, we have replace and remove operators. The complete metamodel and grammar of the language can be found in [25].

The semantics of each VML4RE composition operator can be defined in terms of a model-to-model transformation. For instance, the "Insert Use Case Links" operator, used with the use case link type associatedWith, connects an actor and a use case through an association link (for example, insert(UCLinks_of_type: associatedWith {from actorD to useCaseModelA.PackageB.useCaseC});). The intended transformation of the use case model can be presented by the left-hand side (LHS) and right-hand side (RHS) graphs in Figure 5, where the inputs are a use case model, a use case, the use case's package, and an actor. If there is already an association between the actor and the use case in the same package, the transformation is not applied, to avoid duplicates. This is expressed by the crossed-out elements in the LHS graph, which act as negative application conditions (NACs): any match of the LHS graph must not contain a packageB with an existing association between actorD and useCaseC. In general, a graph transformation is given by a graph rule r: L → R from an LHS graph L to an RHS graph R. The process of applying r to a graph G involves finding a graph monomorphism, h, from L to G and replacing h(L) in G with h(R) [26]. The notation used to define this graph transformation is similar to the one used in [27], where the LHS and RHS patterns are denoted by a generalized form of object diagrams. However, for visual simplicity we added dashed lines between elements to represent any number of containments (in this case, package containments). We refer readers interested in the details of this notation to [27].

Figure 6 illustrates the replace operator with the example "Replace use case". A replace in this context removes a use case and then inserts a new use case, linked in the place of the old one (for example, replace(useCase useCaseModelA.useCaseB by useCase useCaseC);).
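To make the graph rules just described more concrete, the following Python sketch shows one way an "insert association" rule, including its negative application condition, could be applied programmatically. The model classes and the insert_association function are our own illustrative assumptions and are not part of the VML4RE implementation.

    class Association:
        def __init__(self, actor, use_case):
            self.actor, self.use_case = actor, use_case

    class Package:
        def __init__(self, name):
            self.name = name
            self.use_cases = {}      # use case name -> use case object
            self.associations = []   # Association instances

    def insert_association(package, actor, use_case_name):
        # LHS match: the target use case must exist inside the package.
        use_case = package.use_cases.get(use_case_name)
        if use_case is None:
            return False
        # NAC: do not apply the rule if the association already exists.
        for assoc in package.associations:
            if assoc.actor == actor and assoc.use_case == use_case:
                return False
        # RHS effect: add the new association between actor and use case.
        package.associations.append(Association(actor, use_case))
        return True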
Table 1. Some of the VML4RE elements

Description and structure of some elements related to references:

- Reference: identifies one or more requirements model elements. References are made to specific types of elements in the models, expressed using the designator ofType, which allows querying based on the type of model element (ElementType), e.g., UseCase, Activity, Actor, or Element when the referenced model elements are of different types.
  Reference : "ref" ref_name ofType ElementType "{" (RefExpression | ref_name2) WhereDeclaration? "}";
  RefExpression : elementName (("." RefExpression) | ".*")?;
- Where Declaration: an optional part of a reference expression that allows querying a set of model elements based on their name.
  WhereDeclaration : "Where" "(" Expression ")";
- Expression: some of the possible designators are equal, different, startsBy, finishesWith, and contains. They compare a literal with the names of model elements of a specific type (for instance, startsBy, finishesWith and contains match the first letters, the last letters, or any place in the name, respectively). Expressions can be combined with logical operators like and and or to create more complex queries.
  Expression : BooleanExpression (SubExpression)*;
  SubExpression : Operator BooleanExpression;
  BooleanExpression : "contains" literal | "equal" literal | "different" literal | "startsBy" literal | "finishesWith" literal;

Description and structure of some actions:

- Insert Package: insertion of a package in a use case model, or in another package.
  "package" package_name "into" RefExpression;
- Insert Use Case: insertion of a use case into a use case model or inside a package.
  "useCase" useCase_name "into" RefExpression;
- Insert Use Case Links: insertion of different relationships between elements in a use case model.
  "UClinks_of_type:" UseCaseLinkType "{" UCElementsLinkage+ "}";
- UC Elements Linkage: helps to factorize the insertion of relationships in a use case model (Insert Use Case Links) according to the UseCaseLinkType, for a better organization of the actions.
  "from" RefExpression "to" RefExpression ("," RefExpression)*;
- Use Case Link Type: available relationships between use cases (inherits, extends, includes) and between actors and use cases (associatedWith, and biAssociatedWith for bidirectional relationships).
  ("inherits" | "extends" | "includes" | "associatedWith" | "biAssociatedWith");
- Insert Actor: insertion of an actor into a use case model or package.
  "actor" actorName "into" RefExpression;
- Insert Activity: inserts an activity into an activity model.
  "activity" (newActivityName "into" RefExpression);
- Activity Elements Flow: helps to factorize the insertion of relationships in an activity model (InsertActivityLinks, not shown in this table) and, optionally, to add a guard condition.
  "from" RefExpression "to" RefExpression ("with guard" guardCondition)?;
- Replace Use Case: replaces a use case by a new one.
  "useCase" RefExpression "by" "useCase" newUseCaseName;
- Replace Activity: replaces an activity by a new activity or a complete activity model.
  "activity" RefExpression "by" (("activity" newActivityName) | ("activityModel" RefExpression));
- Insert iGoal: inserts an i* goal (indicated by the i in iGoal) in a strategic dependency model.
  "iGoal" goalName "into" RefExpression;
- Insert iGoal dependencies: insertion of different dependency relationships between elements in a strategic dependency model.
  "IGoalDependencies_of_type:" iGoalDependencyType "{" iGoalElementsLinkage+ "}";
- iGoal Elements Linkage: the links between the nodes in the strategic dependency diagram go from depender to dependee through dependum.
  "from" RefExpression "to" RefExpression "through" dependumName;
- iGoal Dependency Type:
  ("resourceDependency" | "taskDependency" | "goalDependency" | "SoftGoalDependency");
The advantage of specifying model compositions with a pure graph transformation approach is its expressiveness, since it gives access to all the elements of the metamodel. However, software modelers typically do not have the in-depth knowledge of the intricacies of the requirements metamodels required to specify a graph rule [28]. The actions in VML4RE do not require any knowledge about the details of the metamodels; they provide requirements-specific composition operators that facilitate the specification of model compositions.
Fig. 5. Graph Rule to Insert an Association between actorD and useCaseC in PackageB
Fig. 6. Graph Rule to Replace UseCaseB by UseCaseC
5 Applying VML4RE

This section illustrates the use of the references and some VML4RE actions for domain and application engineering.

5.1 VML4RE in Domain Engineering

The Smart Home requirements were modeled with use case and activity models created with the UML2Tools plug-in [29]. The FMP tool [22] was used to build a feature model to specify SPL commonalities and variabilities; this tool supports cardinality-based feature models. The relations between the models are specified with VML4RE. The VML4RE editor is implemented using Xtext [30], a framework for the development of textual DSLs. It is part of the VML4RE tool suite [25], implemented in the Eclipse platform as a set of extensible plug-ins. It is based on openArchitectureWare [12], a model-driven development infrastructure, and the Eclipse Modeling Framework (EMF) [31].

Listing 1 shows a partial view of this specification. Initially, the different requirements and feature models are imported to be used in the specification (lines 2-4). In the VML4RE specification, the modeler can create references to requirements models. For instance, it is possible to reference a specific element in a model, like an actor, as in "ref Heater ofType Actor {uc.Heater}" (line 10); or all the elements (e.g., use cases, packages, actors) inside one container element, e.g., "ref AllHeatingElementsInUCs ofType Element {uc.Heating.*}" (lines 13-15); or elements in different parts of the models according to a search condition, like "ref SurDev ofType
Activity {ams.* Where equal VerifyInstalledSurveillanceDevice}", which searches the set of activity models for activities named "VerifyInstalledSurveillanceDevice" (lines 51-52).
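As a side note, such a reference can be thought of as a query over the model's elements. The Python sketch below is our own illustration of how an ofType reference with a Where equal condition could be evaluated; the flat model representation and the eval_ref function are assumptions, not the VML4RE interpreter's actual data structures.

    # Illustrative evaluation of a reference such as
    #   ref SurDev ofType Activity {ams.* Where equal VerifyInstalledSurveillanceDevice}
    # The model is assumed to be a list of (qualified_name, element_type) pairs.

    def eval_ref(elements, element_type, scope, where_equal=None):
        result = []
        for qualified_name, etype in elements:
            if element_type != "Element" and etype != element_type:
                continue                                  # ofType filter
            if not qualified_name.startswith(scope):
                continue                                  # scope such as "ams."
            simple_name = qualified_name.split(".")[-1]
            if where_equal is not None and simple_name != where_equal:
                continue                                  # Where equal condition
            result.append(qualified_name)
        return result

    # Hypothetical model contents:
    model = [("ams.ActivateSecureMode.VerifyInstalledSurveillanceDevice", "Activity"),
             ("ams.ActivateSecureMode.WaitForAlarmSignal", "Activity")]
    print(eval_ref(model, "Activity", "ams.", "VerifyInstalledSurveillanceDevice"))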
The VML4RE specification also employs actions to specify how variable requirements model elements are composed with common requirements model elements. Listing 1 presents several actions to be applied to activity and use case models, for example the insertion of the Security package into the use case model uc (line 28), the insertion of the SecureTheHouse use case into the Security package (lines 31-32), and the insertion of an association between the GlassbreakSensor actor and the use case SecureTheHouse (lines 49-50).

5.2 VML4RE in Application Engineering

In application engineering, the feature model configuration is used as a driver during the process of deriving product-specific requirements models. Figure 2 (middle) shows
the feature model configuration of an Economical Smart Home. The Economical Smart Home does not have camera surveillance and does not use the internet to send security notifications. The VML4RE interpreter processes the SPL requirements models and the feature model configuration to derive product-specific requirements models. During this process, we can use a positive, a negative, or a mixed positive-negative variability transformation strategy. Our interpreter first includes all the requirements model elements related to mandatory features by processing the respective ref statements specified inside the common feature blocks. These elements are also called the core model in our approach, since they are included in every SPL instance. After that, the interpreter processes the ref statements and actions of the variabilities. In this Smart Home example, we illustrate the use of VML4RE in conjunction with a positive variability approach, since we mostly use actions that add optional parts to the base model. Finally, product-specific requirements models are produced by processing the VML4RE specification (Listing 1). Given the possibility of defining, in a single VML4RE specification, the relationships between a feature model and several requirements models (e.g., use case and activity models), our interpreter produces the different product-specific requirements models in just one step. Our current implementation [25] supports the derivation of use case and activity models, and we are working to address other models (i* [10] and KAOS [32], for example).

During the Economical Smart Home derivation process, the actions and references related to Internet and Cameras were not included; for instance, the reference and action related to the Cameras actor (Listing 1, lines 58-59). The result of the composition of the use case model is shown in Figure 3, where the elements added to the core model are highlighted in grey. In addition to the use case model, other requirements models were transformed according to the execution of VML4RE actions. Figure 7 shows the ActivateSecureMode activity model, related to the use case with the same name. When the Security optional feature is chosen, the actions contained in the select block of the Security variation point are performed (Listing 1, lines 28-38), and the ActivateSecureMode activity model (Figure 7, left) is included in the final product requirements models.

During the derivation of product-specific requirements models, some of the generic activities in an activity model can be replaced with others that are more specific to the product being configured. This happens in the Economical Home, where the GlassBreakSensors are the only surveillance devices selected in the configuration. Hence, we could create a simple replacement of the VerifyInstalledSurveillanceDevices activity by the VerifyInstalledGlassBreakSensors activity, as shown in Listing 1 (lines 47-48). As there are probably other activities for the verification of surveillance devices in the requirements models, we use the Where operator. The result of the replacement is shown in Figure 7 (right). If two (or more) variants of an OR feature are selected, such as the Intrusion Detection, our interpreter produces two (or more) different activity models, one for each instance. This strategy was developed to avoid conflicts in the transformation of the same activity model.
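The derivation order just described, core elements first and then the variable features selected in the configuration, can be pictured with the following Python sketch. The data structures and the derive_product function are simplified assumptions of ours, not the implementation available at [25].

    # Sketch of positive-variability product derivation: mandatory (core) elements
    # are included first, then the actions of the selected variable features run.

    def derive_product(common_blocks, variability_blocks, selected_features):
        # common_blocks: list of (feature, [element]) pairs from ref statements.
        # variability_blocks: list of (feature, select_actions, unselect_actions),
        # where each action is a callable that edits the product model in place.
        product_model = set()
        trace_links = []

        for feature, elements in common_blocks:        # 1) build the core model
            for element in elements:
                product_model.add(element)
                trace_links.append((feature, element))

        for feature, select_actions, unselect_actions in variability_blocks:
            chosen = select_actions if feature in selected_features else unselect_actions
            for action in chosen:                      # 2) insert/replace/remove actions
                action(product_model)

        return product_model, trace_links

    # Hypothetical usage:
    core = [("Heating Management", ["Heating", "Heater"])]
    variable = [("Security", [lambda m: m.add("SecureTheHouse")], [])]
    model, links = derive_product(core, variable, {"Security"})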
Fig. 7. Simplified Smart Home ActivateSecureMode Before and After a Replace Activity Action
VML4RE supports, as part of product derivation, the generation of trace links between features and elements in other models, such as use cases and activity diagrams. This derivation is based on the ref sentences inside each of the common and variable blocks. Each ref in the VML4RE specification can determine several references between model elements from the feature model and the SPL requirements models. There may also be cases where an element in a requirements model is referenced by more than one feature. VML4RE specifications are processed automatically by our tool [25, 33] to generate the full set of links involving SPL requirements models (see Section 2, C5). Table 2 presents a partial set of the trace links relevant to the feature Heating Management. These links are created based on references such as the ones in Listing 1, lines 9-10 and 13-14: lines 9-10 refer to the package Heating and the actor Heater, and lines 13-14 refer to any kind of element inside the Heating use case package, like the use cases Control Temperature Automatically and Adjust Heater Value.

Table 2. Part of the Trace Links Generated by the References in the Heating Management Feature

Feature             | Element Name                       | Type
Heating Management  | Heating                            | Package
                    | Heater                             | Actor
                    | Control Temperature Automatically  | UseCase
                    | Adjust Heater Value                | UseCase
                    | Smart Heating                      | ActivityModel
…                   | …                                  | …
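Trace links like those in Table 2 lend themselves to simple automated checks, anticipating the consistency analysis discussed in Section 6. The Python sketch below is our own illustration, not the traceability framework of [33], and the link format is an assumption.

    # Elementary consistency checks over feature-to-requirements trace links.

    def check_consistency(trace_links, all_features, all_elements):
        # trace_links: set of (feature, element) pairs, as in Table 2.
        linked_features = {feature for feature, _ in trace_links}
        linked_elements = {element for _, element in trace_links}
        features_without_requirements = all_features - linked_features
        elements_without_features = all_elements - linked_elements
        return features_without_requirements, elements_without_features

    # Hypothetical usage with some of the Heating Management links:
    links = {("Heating Management", "Heating"),
             ("Heating Management", "Heater"),
             ("Heating Management", "Adjust Heater Value")}
    print(check_consistency(links,
                            {"Heating Management", "Cameras"},
                            {"Heating", "Heater", "Adjust Heater Value", "Siren"}))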
6 Evaluation and Discussion

This section discusses the benefits and limitations of VML4RE based on our experience from applying the language. We have evaluated the usefulness of VML4RE in three case studies [21], two of them proposed by partners of the European AMPLE project [17]: the Smart Home, proposed by Siemens AG [34], and a slice of a customer relationship management system (the Sales Scenario), developed by SAP AG [35]. The third case study is a product line for handling mobile media [36]. These three product lines are from different domains and exhibit different kinds of variability (e.g., options and variants). All of them encompassed textual requirements. Feature models and UML use cases were available for Mobile Media and the Sales Scenario, and an activity model was also available for the latter. The activity models of Mobile
Media and Smart Home were translated from informal textual use case scenarios. The output models were validated by the original developers of the case studies. The goal models for the Sales Scenario system were produced by two teams of postgraduate students at Universidade Nova de Lisboa, based on the use case scenarios and market requirements provided by SAP AG. We evaluate the usefulness of VML4RE based on the criteria for requirements model composition defined in Section 2, and then discuss additional benefits and limitations of VML4RE.

C1: Support Multi-view Variability Composition: Each feature block in VML4RE concentrates the actions related to that feature, and these actions can transform models in multiple requirements views. VML4RE was initially designed to support the composition of two of the most commonly used requirements modeling techniques, use cases and activity models, which address coarse- and fine-grained operational views of the requirements. We have also started using it with very different kinds of requirements modeling, like the goal-oriented modeling technique i* [10], which addresses a quality and intentionality view of the requirements, as happened in the case of the Sales Scenario.

C2: Provide Requirements-Specific Composition Operators: As presented in Table 1, VML4RE provides specialized operators for composing requirements model elements of specific types, such as use cases, packages, activities or goals. The composition operators are simple and did not require the modeler to have deep knowledge of the relationships between the metamodel's metaclasses. For instance, the UML 2.0 metamodel for use cases has metaclasses like Property, Association and Classifier. These metaclasses are important in the design of the transformations, but they are not needed when writing compositions with VML4RE. The composition description was simple in the three case studies because it was based on a vocabulary used in the domain of each modeling technique (e.g., use case, associatedWith, package, dependency).

C3: Support Fine- and Coarse-Grained Composition: In the three case studies the coarse-grained composition was performed in terms of broadly-scoped elements, such as packages and use cases; the operators "remove package" and "insert ... use case" are examples of such cases. VML4RE also addresses fine-grained composition when the actions are performed within coarse-grained elements; the operators "insert activity" and "insert activity links" are examples of such cases.

C5: Facilitate Trace Links Generation: As explained in Section 5.2, our approach supports the derivation of trace links. These links record relations between features specified in feature models and other requirements model elements pertaining to the SPL or to a specific product. This is accomplished with the reference sentences that are processed by the tool suite. We are currently exploring different traceability scenarios that process these relationships to expose useful information. This information could be exploited in many activities, such as discovering candidates for bad feature interactions and visualizing variations in different requirements models. Many of these traceability functionalities also facilitate the job of SPL architects, as they are valuable for analyzing design change impact when evolving SPL features and requirements.
C4: Support Positive and Negative Variability: VML4RE offers operators to support positive variability (e.g., insert) and negative variability (through the remove and replace operators). Positive variability presents some advantages for modeling and composing requirements models. For example, requirements modeling is characterized by
the incremental acquisition of knowledge about the system. In this sense, starting with a relatively small and easy-to-understand set of models is a good starting point. Then, as the developer learns more about each feature of the SPL, s/he can incrementally specify how each new variable feature modifies the existing models. Positive variability also helps to manage variability in time. If the core model is created using generic requirements, then the requirements models are more flexible and can accommodate future specific requirements that instantiate the generic ones. Take, for example, Figure 7 (left): it specifies that, at some point, it is necessary to verify the installed surveillance devices. This requirement can be instantiated not only by the current surveillance devices, like Glass Break Sensors and Cameras, but also by other surveillance devices that were not initially considered in the SPL.

While modeling with VML4RE we saw additional benefits of the composition of requirements models. For example:

Testing and Understanding the Behavior of Specific Products: The automatic derivation of requirements models for a specific product is useful both to understand which requirements and features are involved in the development of an SPL product, and to support testing and documentation activities. In particular, activity models are an example of requirements artifacts that are well suited to business process modeling and to modeling the logic captured by a single use case or scenario, as happened during the modeling of the Sales Scenario. Activity models can provide a basis for understanding and validating the behavior of parts of an SPL product in the presence or absence of specific features. Also, using goal-based modeling in the Sales Scenario allowed us to understand the dependencies between the actors, thus encouraging a deeper understanding of the business process.

Consistency Checking between Feature Models and other Requirements Models: Producing different models for large systems like SPLs can be a difficult, time-consuming, and highly error-prone task if appropriate supporting tools are not available. During the realization of our three case studies, we noticed that the trace links generated from VML4RE specifications can be processed by our traceability framework [33] to detect inconsistencies between features and requirements in different models. Examples of such inconsistencies are: (i) the absence of features related to specific requirements; (ii) the absence of requirements related to specific features; and (iii) conflicts between features acting over the same requirements (which may or may not be valid). The consistency management of the relationships between features and requirements models is also fundamental to support the traceability functionalities mentioned above, especially in SPL evolution scenarios.

Finally, we came across some issues during the application of the compositions. When creating the composition actions for each variation point against the core model, the modeler may make assumptions regarding the existence, position and names of model elements. However, the models change after each insertion, replacement or deletion of model elements, which can prevent the application of some subsequent actions.
It is therefore necessary to determine the best precedence order for the application of the actions in each variation point, and also for the application of each variation point after model modifications. Existing formal methods and model-checking techniques and tools, like simulation or critical pair analysis as introduced by Jayaraman et al. [16], may be the first solution candidates.
7 Related Work

Most of the work on feature composition is focused on the implementation level, such as AHEAD [37] and pure::variants [19]. A few languages focus on the architecture level, like VML4Architecture [38] and Koala [39]. Recently, some approaches were proposed to support the definition of relationships between SPL features and requirements and the composition of requirements models.

Pohl [2] separates variability from functional information in an orthogonal model. He proposes a variability metamodel that includes two relationships: the Artifact Dependency relationship (which relates variants to development artifacts) and the VP Artifact Dependency relationship (which relates variation points to development artifacts). These elements enable the definition of links between the variability model and other development artifacts. Nevertheless, this work is focused on documenting variability rather than on expressing how to specify the composition of requirements models.

Czarnecki and Antkiewicz [6], and Bragança and Machado [40], create explicit relationships between features and requirements. Czarnecki and Antkiewicz propose a general template-based approach which enables the creation of relationships between the elements (abstractions and relationships) of an existing model and the corresponding features through a set of annotations. The annotations are used mainly to indicate presence conditions of specific model elements or model templates according to feature occurrences. In contrast to VML4RE, which allows positive, negative or combined positive-negative variability, Czarnecki and Antkiewicz [6] employ only negative variability. Bragança and Machado use a simplified feature model based on the one proposed by Czarnecki and Antkiewicz and employ UML notes in use case diagrams to indicate variability. These notes are linked to includes and extends relationships, providing variability data. The main disadvantage of these two approaches [6, 40] is that they fail to fully separate functional and variability information, as they use intrusive graphical elements, such as presence conditions or notes, in their models to indicate variability. Hence, variability information may be scattered across and pollute the models, making them difficult to understand and maintain.

Gomaa [5] extends UML-based modeling methods for single systems to address software product lines. He uses stereotypes to indicate variability, models use case packages as features in a feature model, and manually relates features with other model elements using matrices. Variability stereotypes and other kinds of stereotypes are mixed in the same models, reducing the understandability of the models. Although the previous approaches provide techniques to establish relationships between feature and requirements models, they lack a language to specify the actual composition of the different requirements models. Our work proposes a requirements-specific language and tool support to deal with the composition of requirements models for software product lines.

There are other approaches that provide languages to create reference expressions and composition rules. XWeave [18], for example, supports the composition of different architectural viewpoints. It composes crosscutting concerns encapsulated as aspect models into non-aspect-oriented base models, thus following an asymmetric composition approach (though this could be extended to a symmetric approach with relatively little effort).
XWeave is similar to our approach in that composition is based on matching the names of elements in the aspect and the base model. It
employs an OCL-like expression language [12] that plays the role of VML4RE's references. However, it does not provide requirements-specific composition operators.

MATA [28] is an aspect-oriented approach to model composition based on graph rewriting formalisms that can be used to compose models of different SPL features to create product-specific models [16]. It employs graphical patterns that resemble the concrete syntax of specific kinds of UML models (e.g., state machines). In aspect-oriented terminology, the graphical patterns can be thought of as pointcuts and the composition operators as advices. Similarly, in VML4RE, references can be thought of as pointcuts and actions as advices. In comparison to MATA, VML4RE provides simpler operators that are especially tailored to facilitate writing compositions of requirements models. However, VML4RE can complement MATA by providing concrete language support to express, in the same code block of each feature, the references and composition rules for all the different requirements views. VML4RE, together with similar variability composition languages focused on architecture, like VML4Architecture [38], could be used as an alternative front-end for MATA.

Apel et al. [11] employ superimposition of feature-related model fragments as a general model composition technique. We believe that this technique can be especially useful in requirements engineering to compose coarse-grained models that keep a common structure in a positive-variability setting. However, to be useful for a broader range of requirements models, it requires language support to also express positive-negative variability, and to reference potentially multiple composition points for model fragments during fine-grained composition.
8 Conclusions and Future Work

VML4RE addresses the question of how to compose elements defined in separate and heterogeneous requirements models using a simple set of operators. It was designed taking into account the five fundamental criteria discussed in Section 2, and Section 6 reviewed how these criteria are addressed. VML4RE presents a contribution to the field of language support for composing SPL requirements due to its unique characteristics: (1) each feature block (e.g., common, variant) concentrates a cohesive set of actions that can transform models in multiple requirements views; (2) its composition operators are especially tailored for canonical requirements models and rely on a vocabulary familiar to requirements engineers; (3) there is an explicit separation between the modeling of variability and of requirements, without forcing the intrusive inclusion of variability-related elements in requirements models; (4) its operators can add, remove or replace parts of the models, thus supporting both positive and negative variability; and (5) references facilitate the creation of compositions and the generation of trace links.

Currently, we are investigating the application of model-driven techniques to keep the relationships between SPL variability and requirements models consistent during model evolution. We are also studying an effective way to determine the best precedence order for the application of the actions in each variation point, and for the application of each variation point after model modifications. Finally, we are interested in demonstrating the use of our language with other requirements views and in improving the usability of VML4RE by adding a graphical concrete syntax.
Acknowledgments. This work is supported by the European FP7 STREP project AMPLE [17].
References

1. Clements, P., Northrop, L.M.: Software Product Lines: Practices and Patterns. Addison-Wesley, Boston (2002)
2. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Foundations, Principles and Techniques. Springer, Berlin (2005)
3. Czarnecki, K., Eisenecker, U.W.: Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co. (2000)
4. Kang, K., Cohen, S., Hess, J., Novak, W., Peterson, A.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-021, Software Engineering Institute, Carnegie Mellon University (1990)
5. Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to Pattern-Based Software Architectures. Addison-Wesley, Reading (2004)
6. Czarnecki, K., Antkiewicz, M.: Mapping Features to Models: A Template Approach Based on Superimposed Variants. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 422–437. Springer, Heidelberg (2005)
7. Alexander, I., Maiden, N.: Scenarios, Stories, Use Cases. Wiley, Chichester (2004)
8. Unified Modeling Language (UML) Superstructure, Version 2.1.2: 2007-11-02
9. Chung, L., Nixon, B., Yu, E., Mylopoulos, J.: Non-Functional Requirements in Software Engineering. Kluwer Academic Publishers, Dordrecht (1999)
10. i*: an Agent-oriented Modelling Framework, http://www.cs.toronto.edu/km/istar/
11. Apel, S., Janda, F., Trujillo, S., Kästner, C.: Model Superimposition in Software Product Lines. In: Paige, R.F. (ed.) ICMT 2009. LNCS, vol. 5563, pp. 4–19. Springer, Heidelberg (2009)
12. openArchitectureWare, http://www.openarchitectureware.org/
13. Jouault, F., Kurtev, I.: Transforming Models with ATL. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, pp. 128–138. Springer, Heidelberg (2006)
14. Taentzer, G.: AGG: A graph transformation environment for modeling and validation of software. In: Pfaltz, J.L., Nagl, M., Böhlen, B. (eds.) AGTIVE 2003. LNCS, vol. 3062, pp. 446–453. Springer, Heidelberg (2004)
15. Sánchez, P., Loughran, N., Fuentes, L., Garcia, A.: Engineering Languages for Specifying Product-derivation Processes in Software Product Lines. In: Gašević, D., Lämmel, R., Van Wyk, E. (eds.) SLE 2008. LNCS, vol. 5452, pp. 188–207. Springer, Heidelberg (2009)
16. Jayaraman, P., Whittle, J., Elkhodary, A., Gomaa, H.: Model Composition in Product Lines and Feature Interaction Detection Using Critical Pair Analysis. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 151–165. Springer, Heidelberg (2007)
17. AMPLE Project, http://www.ample-project.net/
18. Groher, I., Völter, M.: XWeave: Models and Aspects in Concert. In: 10th International Workshop on Aspect-Oriented Modeling. ACM, Vancouver (2007)
19. pure::variants, http://www.pure-systems.com/Variant_Management.49.0.html
20. Gears, http://www.biglever.com/
21. Morganho, H., Gomes, C., Pimentão, J.P., Ribeiro, R., Grammel, B., Pohl, C., Rummler, A., Schwanninger, C., Fiege, L., Jaeger, M.: Requirement Specifications for Industrial Case Studies. Technical Report D5.2, AMPLE Project (2008)
22. Antkiewicz, M., Czarnecki, K.: FeaturePlugin: Feature Modeling Plug-in for Eclipse. In: 2004 OOPSLA Workshop on Eclipse Technology eXchange, pp. 67–72. ACM Press, Vancouver (2004)
23. Kruchten, P.: The Rational Unified Process: An Introduction. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (2003)
24. González-Baixauli, B., Laguna, M.A., Leite, J.C.S.d.P.: Using Goal-Models to Analyze Variability. In: Variability Modelling of Software-intensive Systems, Limerick, Ireland (2007)
25. Variability Modeling Language for Requirements, http://ample.di.fct.unl.pt/VML_4_RE/
26. Rozenberg, G. (ed.): Handbook of Graph Grammars and Computing by Graph Transformation, vol. I: Foundations. World Scientific Publishing Co., Inc., River Edge (1997)
27. Markovic, S., Baar, T.: Refactoring OCL Annotated UML Class Diagrams. In: Briand, L.C., Williams, C. (eds.) MoDELS 2005. LNCS, vol. 3713, pp. 280–294. Springer, Heidelberg (2005)
28. Whittle, J., Moreira, A., Araújo, J., Jayaraman, P., Elkhodary, A., Rabbi, R.: An Expressive Aspect Composition Language for UML State Diagrams. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 514–528. Springer, Heidelberg (2007)
29. MDT-UML2Tools, http://www.eclipse.org/uml2/
30. Xtext Reference Documentation, http://www.eclipse.org/gmt/oaw/doc/4.1/r80_xtextReference.pdf
31. Eclipse Modeling Framework, http://www.eclipse.org/modeling/emf/?project=emf
32. Goal-Driven Requirements Engineering: the KAOS Approach, http://www.info.ucl.ac.be/~avl/ReqEng.html
33. Sousa, A., Kulesza, U., Rummler, A., Anquetil, N., Mitschke, R., Moreira, A., Amaral, V., Araújo, J.: A Model-Driven Traceability Framework to Software Product Line Development. In: 4th Traceability Workshop, held in conjunction with ECMDA, Berlin, Germany (2008)
34. Siemens AG - Research & Development, http://www.w1.siemens.com/innovation/en/index.php
35. SAP AG, http://www.sap.com/about/company/research/centers/dresden.epx
36. Figueiredo, E., Cacho, N., Sant'Anna, C., Monteiro, M., Kulesza, U., Garcia, A., Soares, S., Ferrari, F.C., Khan, S., Filho, F.C., Dantas, F.: Evolving Software Product Lines with Aspects: An Empirical Study on Design Stability. In: ICSE 2008. ACM, Leipzig (2008)
37. AHEAD Tool Suite, http://www.cs.utexas.edu/users/schwartz/ATS.html
38. Loughran, N., Sánchez, P., Garcia, A., Fuentes, L.: Language Support for Managing Variability in Architectural Models. In: Pautasso, C., Tanter, É. (eds.) SC 2008. LNCS, vol. 4954, pp. 36–51. Springer, Heidelberg (2008)
39. van Ommering, R., van der Linden, F., Kramer, J., Magee, J.: The Koala Component Model for Consumer Electronics Software. Computer 33(3), 78–85 (2000)
40. Bragança, A., Machado, R.J.: Automating Mappings between Use Case Diagrams and Feature Models for Software Product Lines. In: SPLC 2007, pp. 3–12. IEEE Computer Society, Kyoto (2007)
Yet Another Language Extension Scheme

Anya Helene Bagge

Bergen Language Design Laboratory, Dept. of Informatics, University of Bergen, Norway
[email protected]
Abstract. Magnolia is an experimental programming language designed to try out novel language features. For a language to be a flexible basis for new constructs and language extensions, it will need a flexible compiler, one where new features can be prototyped with a minimum of effort. This paper proposes a scheme for compilation by transformation, in which the compilation process can be extended by the program being compiled. We achieve this by making a domain-specific transformation language for processing Magnolia programs, and embedding it into Magnolia itself.
1 Introduction
Implementing a compiler for a new programming language is a challenging but exciting task. As the language design evolves, the compiler must be updated to support the new design or to prototype the design of new features. Magnolia is both an experimental programming language and a language for language experiments. We therefore need a compiler flexible enough to keep up with changes in the language design, and with features that make the implementation of experimental features easy.

Use cases for a language extension facility include experimental features such as data-dependency-based loop statements, embedding of domain-specific languages, restriction to sub-languages with stricter semantics, and language implementation using a simple core language with the rest built as extensions.

In Magnolia, the programmer can express extra knowledge about abstractions as axioms. In the compiler, we would therefore like to preserve abstractions for as long as possible, in order to take advantage of axioms. Language extensions also provide abstractions, with knowledge we may also want to take advantage of. Desugaring extensions to lower-level language constructs at an early stage, as is done with syntax macros, discards any special meaning associated with the constructs, which could have been used for optimisation and extension-specific error checking.

The Magnolia compiler is implemented in Stratego/XT [1], using compilation by transformation, where a sequence of transformation steps transforms code in the source language to a target language (object code, or another programming
language). It is therefore natural to make use of transformation techniques for describing language extension. This paper presents an extension of the Magnolia language with transformation-based meta-programming features, so that extensions to the Magnolia language can be made in Magnolia itself, rather than by extending the Stratego code of the compiler. This gives more independence from the underlying compiler implementation. The rest of this paper is organised as follows. First, we give a brief introduction to the Magnolia language, before we look at how to add language extension to it (Section 3). We have two extension facilities, macro-like operation patterns (Section 3.1) and low-level transforms (Section 3.2). We provide an example of two extensions, before discussing related work and concluding (Section 4).
2 The Magnolia Language
We will start by briefly introducing the parts of Magnolia that are necessary to understand the rest of the paper. Magnolia is designed as a general-purpose language, with an emphasis on abstraction and specification. Abstractions are described by concepts, which consist of abstract types, operations on the types, and axioms specifying the behaviour of the operations algebraically. Multiple implementations may be provided for each concept, and signature morphisms may be used to map between differences in concept and implementation.

Operations can be either procedures or functions. Procedures are allowed to update their parameters, and have no return values. Pure procedures only interact with the world through their parameters (e.g., no I/O or global data). Functions may not change their parameters, and are always pure – the only effect a function has is its return value, and it will always produce the same return value for the same arguments. Function applications form expressions, while procedure calls are statements. In addition, Magnolia has regular control-flow statements like if and while.

A novel feature (detailed in a previous paper [2]) is the special relationship between pure procedures and functions. Procedures may be called as if they were functions – the process of mutification turns expressions with calls to functionalised procedures into procedure call statements. An expression-oriented coding style is encouraged. Procedures are often preferred for performance reasons, while expressions with pure functions are easier to reason about and are also the preferred way of writing axioms.
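To illustrate the idea of mutification described above, here is a small Python sketch of the kind of rewrite involved: an assignment whose right-hand side calls a functionalised procedure is turned into a procedure-call statement with an explicit output parameter. The toy AST encoding and the mutify function are our own assumptions, not the Magnolia compiler's representation.

    # Toy illustration of mutification: rewrite  x = f(a, b)
    # into a procedure call  f(a, b, out x)  for functionalised procedures.
    # Statements are encoded as plain tuples here.

    FUNCTIONALISED = {"default"}   # assumed set of procedures usable as functions

    def mutify(stmt):
        # stmt: ("assign", target, ("call", name, args))
        if stmt[0] == "assign" and stmt[2][0] == "call" and stmt[2][1] in FUNCTIONALISED:
            _, target, (_, name, args) = stmt
            return ("proc_call", name, args + [("out", target)])
        return stmt

    print(mutify(("assign", "name",
                  ("call", "default", ["lookup(db,key)", '""', '"Lucy"']))))
    # -> ('proc_call', 'default', ['lookup(db,key)', '""', '"Lucy"', ('out', 'name')])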
3 Extending Magnolia
At least four types of useful extensions spring to mind:

1. Adding new operation-like constructs that look like normal functions or procedures, but for some reason cannot or should not be implemented that way – for example, because we need to bypass normal argument evaluation, or because some of the computation should be done at compile time. This
type of change has a local effect on the particular expressions or statements where the new constructs are used, and is similar to syntax macros in other systems.

2. Adding new syntax to the language, in order to make it more convenient to work with. We may also consider removing some of the default syntax. In Magnolia, this can be handled by extending the SDF2 grammar of the language.

3. Disabling features or adding extra semantic checks to existing language constructs. This can be used to enforce a particular coding style, to disable general-purpose features when making a DSL embedding, or to ensure that certain assumptions for aggressive optimisation hold.

4. Making non-local changes to the language – features requiring global analysis, or touching a wide selection of code. Cross-cutting concerns in aspect orientation are an example of this. We can implement this by extending the compiler with new transformations and storing context information across transformations.

In a syntax macro system, new constructs are introduced by giving a syntax pattern and a replacement (or expansion). In languages like Lisp or Scheme, the full power of the language itself is available to construct the expansion. For Magnolia, things are a bit more complicated, since the extension may pass through several stages of the compiler before it is replaced by lower-level constructs. We must therefore provide the various compiler stages with a description of how to deal with the language extension.

To provide syntax extensibility of the kind found in languages like Dylan, one could provide Magnolia syntax for syntax definition, then extract and compile the syntax definitions to SDF2, as used in the compiler. We will not consider this here, however. A full treatment of compiler extension in Magnolia is also beyond the scope of this paper; we will therefore focus on the macro-like operation patterns and briefly sketch the transform interface to compiler extension.

3.1 Operation Patterns
An operation pattern is a simple interface to language extension, similar to macros in Lisp or Scheme. Patterns are used in the same way as a normal procedure or function, but are implemented by instantiation with arbitrary code transformation. They are useful for things that need to process arguments differently from the normal semantics. The implementation of an operation pattern looks like a procedure or function definition, except that one or more of its parameters are meta-variables that take expression or statement terms, rather than values or variables. The argument terms and pattern body may be rewritten as desired by applying transforms to them (see the examples below). When the operation pattern is instantiated, meta-variables in the body are substituted, and any transformations are applied. The resulting code is inlined at the call site.

Meta-variables are typed and are distinguished from normal variables through the type system; thus it is not necessary to use anti-quotation to indicate where
meta-variables should be substituted. Operation patterns introduce a local scope, so local variables will not interfere with the call context.

The semantic properties (typing rules, data-flow rules, etc.) of an operation pattern are handled automatically by the compiler, and calls to operation patterns are treated the same as normal operation calls during type checking and overload resolution. This means that they can be overloaded alongside normal operations, and follow normal module scoping and visibility rules. Processing code with operation pattern calls requires some extra care, so that arguments that should be treated as code terms won't get rewritten or lifted out of the call.

Operation patterns can also conveniently serve as implementations of syntax extensions, by desugaring the syntax extension into a call to the pattern. For example, the following operation pattern implements a simple way to substitute a default value when an expression yields some error value:

  forall type T
  procedure default(T e, T f, expr T d, out T ret) {
    ret = e;
    if(ret == f) ret = d;
  }

Here, f is the failure value (null, for example), d is the default replacement, and e is the expression to be tested. Magnolia will automatically provide a function version of it:

  forall type T
  function T default(T e, T f, expr T d);

which we can use like:

  name = default(lookup(db,key), "", "Lucy");

We can describe the behaviour of default by axioms, for example:

  forall type T
  axiom default1(T e, T f, T d) {
    if(e == f) assert default(e, f, d) <-> d;
    if(e != f) assert default(e, f, d) <-> e;
    if(f == d) assert default(e, f, d) <-> e;
    if(f != d) assert default(e, f, d) != f;
  }

3.2 Transforms
For further processing of language extensions, we add a new meta-programming operation to Magnolia – the transform – corresponding to a rule or strategy in Stratego. Transforms work on the term representation of a program, taking at least one term plus possibly other values as arguments, and returning a replacement term. Provided semantic analysis has been done, term pattern matching in transforms is sensitive to typing, overloading and name scoping rules. A transform may call other transforms and operations, and may also manipulate symbol tables and other compiler state. Several transforms can share the same name; when applied, they are tried in arbitrary order until one succeeds. In addition to explicit calling, transforms can also be controlled through
Table 1. Transform classes: Topdown and bottomup traversals can be modified by repeat, once or frontier. The phase classes can be used to apply a transform before, during or after a particular compiler phase, or to trigger application of a compiler phase. Transforms can also be classified by use – for example, simplification transforms may be marked as such and used in many places in the compiler. The ac class can be used to reorder expressions for associative-commutative matching.

Traversals/modifiers:
  repeat     – can be used repeatedly
  once       – in traversal: apply only once
  frontier   – in traversal: stop on success
  topdown    – traversal type
  bottomup   – traversal type
  innermost  – innermost reduction
  outermost  – outermost reduction

Compiler phases:
  during(p)   – apply during p
  before(p)   – apply before p
  after(p)    – apply after p
  requires(p) – run p first
  triggers(p) – run p after

Uses:
  typecheck, simplify, mutify, ac
In addition to explicit calling, transforms can also be controlled through transform classes, which describe how and (possibly) when transforms should be applied. For example, a transform may have the classes innermost and during(desugar), signifying that it should be applied using an innermost strategy during the desugaring phase of the compiler. A sample transform is:

  forall int i1, int i2, int i3
  transform example(expr i1 * i2 + i3 * i2) [simplify,repeat]
    = (i1 + i3) * i2;

This example has a pattern with three meta-variables, i1, i2, i3, all of which will match only integer expressions. The expression pattern in the argument list will be matched against the code the transform is applied to, and will only match the integer versions of + and *. If the match is successful, the code is transformed to (i1 + i3) * i2. The transform classes simplify and repeat tell the compiler that this rule can be applied during program simplification, and that it will terminate if applied repeatedly. Table 1 shows a few different transform classes. Axioms, when used as rewrite rules, can also have classes assigned to them, making them usable as transforms [3]. Transforms can be applied directly in program code (most useful inside operation patterns). For example,

  var x = example(a * b + c * b);

will apply the above transform (the expression to the left is implicitly passed as the first parameter) and rewrite the code to:

  var x = (a + c) * b;

The double-bracket operator [[...]] can be used to apply inline rewrite rules, and to specify traversals – we will see examples of this later.
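As an illustration of what such a rule does at the term level, the rewrite can be sketched in ordinary code over a small tuple-based expression representation (a sketch of our own, not Magnolia's implementation; it ignores the typing and overload awareness that Magnolia's matching provides):

  # terms are ("+", l, r), ("*", l, r) or leaf names
  def example(term):
      """i1 * i2 + i3 * i2  ->  (i1 + i3) * i2, when the pattern matches."""
      if (isinstance(term, tuple) and term[0] == "+"
              and isinstance(term[1], tuple) and term[1][0] == "*"
              and isinstance(term[2], tuple) and term[2][0] == "*"
              and term[1][2] == term[2][2]):               # shared factor i2
          i1, i2, i3 = term[1][1], term[1][2], term[2][1]
          return ("*", ("+", i1, i3), i2)
      return term                                          # no match: leave the term unchanged

  print(example(("+", ("*", "a", "b"), ("*", "c", "b"))))  # ('*', ('+', 'a', 'c'), 'b')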
3.3 Semantic Rules
Semantic analysis rules are described by the typecheck transform, which takes a statement, expression or declaration as argument, and returns a resolved version
of its argument – and its type, in the case of an expression. Resolving means annotating each use of an abstraction with a unique identifier that leads back to its declaration – this is typically taken care of internally in the compiler. Type checking of a declaration will typically involve adding declarations to the symbol table; type checking other constructs is typically a simple case of recursively type checking sub-constructs. A (simplified) typecheck rule for assignment statements is:

  forall name x, expr e
  transform typecheck(stat{x = e;}) = stat{x = e';}
  where {
    var (e', t) = typecheck(e);
    if(!compatible(typeof(x), t))
      call fail("Incompatible types in assignment");
  }

Note that type checking might be better described by more formal semantic rules, which could be used as a basis for reasoning about type checking and programs. This is an option we are exploring. Axioms [3] can describe the abstract semantics of a construct. This is only applicable to expression-like constructs at the moment; we should also have a way of describing other constructs. Implementation rules are used to compile constructs to lower-level code. Instantiation rules are triggered during semantic analysis; they receive the unique id of the abstraction and the use case, and produce an instantiated version. Other implementation rules are free-form and should be tied to a program traversal strategy and compiler phase. No effort is made on the part of the compiler to ensure that implementation rules do not leave behind uncompiled constructs, though we are looking at techniques that can handle this [4]. Other compiler phases may also need rules – for example, data-flow analysis and program slicing require information about which variables are read and written in a statement – and the readset and writeset transforms are used for this purpose. Transforms may also be provided for mapping between statement and expression forms. By keeping track of semantic information, we can make more powerful extensions. For example, with the following extended version of default, a failure value is no longer needed – it is obtained automatically from a function declaration attribute:

  forall type T
  function T default(expr T e, expr T d)
    = default(e, getAttr("fail_value", e), d);
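Returning to the typecheck rule for assignments above, its shape is easy to mirror in ordinary code. The following Python sketch (ours, with a deliberately toy notion of types and environments) resolves the right-hand side, checks compatibility and fails otherwise, just as the rule does:

  def typecheck_expr(e, env):
      # hypothetical resolver: integer literals are ints, names get their declared type
      if isinstance(e, int):
          return e, "int"
      return e, env[e]

  def compatible(t1, t2):
      return t1 == t2

  def typecheck_assign(x, e, env):
      e_resolved, t = typecheck_expr(e, env)      # var (e', t) = typecheck(e)
      if not compatible(env[x], t):               # if(!compatible(typeof(x), t))
          raise TypeError("Incompatible types in assignment")
      return ("assign", x, e_resolved)            # stat{x = e';}

  env = {"x": "int", "s": "string"}
  typecheck_assign("x", 42, env)                  # accepted
  # typecheck_assign("s", 42, env)                # would raise TypeError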
3.4 Module-Level and Global Extensions
Language extension should normally be done at the module level, so that some modules in your program may use the extension, and others won’t. For example, if your extension defines a restricted subset of Magnolia with some DSL features, you probably still want the compiler to process Magnolia libraries as if they were written in normal Magnolia. Therefore, Magnolia extensions have scope:
– The names of transforms and operation patterns are accessible in the module in which they are defined and in modules that import them, just as with other operations.
– Transforms are normally applied to the whole program. Semantically aware term pattern matching ensures that only relevant parts of the code are touched, not code that merely looks similar to what is described by the pattern.
– For syntax extensions and language-changing transforms that should only be applied to certain modules, there is a language declaration in the module header that can be used to import extension modules. Transforms imported via language are only applied to the local module.
3.5 Example Extensions
We will give two example extensions: one which uses transforms to enforce a restriction on the language, and one which uses operation patterns to add a map construct.

Impure procedures are ones that violate the assumption that two calls with equivalent inputs give equivalent results. I/O is typically impure; a random generator that keeps track of its seed would also be impure. Since pure code is easier to reason about, we might want to have a sub-language of Magnolia where calls to impure code are forbidden. We implement this in a module pure, which is used by putting language pure in the module header of pure modules. Our language module contains the following transform:

  transform purity(stat{call p(_*)}) [after(typecheck)]
  where
    if(getAttr("impure", p))
      call error("In call to ", p, " -- impure calls forbidden");

The transform purity will be applied to the code in all language pure modules after type checking is done (since the type checker might be used to infer impurity), and will match procedure calls. If the called procedure has the impure attribute, a compiler error is triggered.

The map operation applies an operation element-wise to the elements of one or more indexable data structures (arrays, for example). Our map works on multiple indexables at the same time (like Lisp's mapcar), without the overhead of dealing with a list of indexables at runtime. For example,

  A = map(@A * @B + @C);  // map *,+ over elements of A, B, C
  A = map(@A * 5);        // multiply all elements of A by 5
  A = map(@A * V + @C);   // V is indexable, but used as-is

While map in Lisp and functional languages traditionally takes a function (or lambda expression) and one or more lists as arguments, we will instead integrate everything as one argument, making it look more like a list comprehension. Indexables marked with an @-sign are those that should be processed element-wise. The @ is just a dummy operator, defined as:

  forall type A, type I, type E where Indexable(A, I, E)
  function E @_(A a);
This function is generic in E (element type), A (indexable/array type) and I (index type) – together, these must satisfy the Indexable concept. Applying the @-operator outside a map operation will lead to a compilation error – this should ideally be checked for and reported in a user-friendly manner. A generic implementation of map is:

  forall type A, type I, type E where Indexable(A, I, E)
  procedure map(expr E e, out A a) {
    // define index space as minimum of input index spaces
    var idxSpace = min(e[[collect,frontier: @x:A -> indexes(x)]]);
    call create(a, idxSpace);   // create output array
    for i in indexes(a) {       // do computation
      a[i] = e[[topdown,frontier: @x:A -> x[i]]];
    }
  }

The implementation accepts an expression e (of the element type) and an output array a. The body of map is the pattern for doing maps, and this will be instantiated for each expression it is called with by substituting meta-variables and optionally performing transformations. Note that the statements in the pattern are not meta-level code, but templates to be instantiated. The [[...]] parts are transformations which are applied to e – the result is integrated into the code, as if it had been written by hand. The first transformation uses a collect traversal, which collects a list of the indexables, rewriting them to expressions which compute their index spaces on the way. This is used in creating the output array. The computation itself is done by iterating over the index space, and computing the expression while indexing the @-marked indexables of type A. The frontier traversal modifier prevents the traversal from recursing into an expression marked with @ – in case we have nested maps. As an example of map, consider the following:

  Z = map(@X * 5 + @Y);

where X and Y are of type array(int). Here map is used as a function – the compiler will mutify the expression, obtaining:

  call map(@X * 5 + @Y, Z);

At this point we can instantiate it and replace the call, giving

  var idxSpace = min([indexes(X), indexes(Y)]);
  call create(Z, idxSpace);
  for i in indexes(Z) {
    Z[i] = X[i] * 5 + Y[i];
  }

which will be inlined directly at the call site. Now that we have gone to the trouble of creating an abstraction for element-wise operations, we would expect there to be some benefit to it, over just writing for-loop code. Apart from the code simplification at the call site, and the fact that we can use map in expressions, we can also give the compiler more information about it. For example, the following axiom neatly sums up the behaviour of map:
  forall type A, type I, type E where Indexable(A, I, E)
  axiom mapidx(expr E e, I i) {
    map(e)[i] <-> e[[topdown,frontier: @x:A -> x[i]]];
  }

That is, applying map and then indexing the result is the same as just indexing the indexables directly and computing the map expression. Furthermore, we can also easily do optimisations like map/map fusion and map/fold fusion, without the analysis needed to perform loop fusion.
4 Conclusion
There is a wealth of existing research in language extension [5,6,7] and extensible compilers [8,9], and little space for a comprehensive discussion here. Lisp dialects like Common Lisp [10] and Scheme [11] come with powerful macro facilities that are used effectively by programmers. The simple syntax gives macros a feel of being part of the language, and avoids issues with syntactic extensions. C++ templates are often used for meta-programming, where techniques such as expression templates [12] allow for features such as the map operation described in Section 3.5 (though the implementation is a lot more complicated). Template Haskell [13] provides meta-programming for Haskell. Code can be turned into an abstract syntax tree using quasi-quotation and processed by Haskell code before being spliced back into the program and compiled normally. Template Haskell also supports querying the compiler's symbol tables. MetaBorg [14] provides syntax extensions based on Stratego/XT. Syntax extension is done with the modular SDF2 system, and the extensions are desugared ("assimilated") into the base language using concrete syntax rules in Stratego. Andersen and Brabrand [4] describe a safe and efficient way of implementing some types of language extensions using catamorphisms that map to simpler language constructs, and an algebra for composing languages. We have started implementing this as a way of desugaring syntax extensions. We aim to deal with semantic extension rather than just the syntactic extension provided by macros. We do this by ensuring that transformations obey overloading and name resolution, by allowing extension of arbitrary compiler phases, and by allowing the abstract semantics of new abstractions to be described by axioms. The language XL [15] provides a typed, macro-like facility with access to static semantic information – somewhat similar to operation patterns in Magnolia. In this paper we have discussed how to describe language extensions and presented extension facilities for the Magnolia language, with support for static semantic checking and scoping. The facilities include macro-like operation patterns and transforms, which can perform arbitrary transformations of code. Transforms can be linked into the compiler at different stages in order to implement extensions by transforming extended code to lower-level code. The static semantics of extensions can be given by hooking transforms into the semantic analysis phase of the compiler.
A natural next step is to try and implement as much of Magnolia as possible as extensions to a simple core language. This will give a good feel for what abstractions are needed to implement full-featured extensions, and also entails building a mature implementation of the extension facility – currently we are more in the prototype stage. There are also many details to be worked out, such as a clearer separation between code patterns, variables and transformation code, name capture / hygiene issues, and so on. The Magnolia compiler is available at http://magnolia-lang.org/. Acknowledgements. Thanks to Magne Haveraaen and Valentin David for input on the Magnolia compiler, and to Karl Trygve Kalleberg and Eelco Visser for inspiration and many discussions in the early phases of this research.
References

1. Bravenboer, M., Kalleberg, K.T., Vermaas, R., Visser, E.: Stratego/XT 0.17. A language and toolset for program transformation. Science of Computer Programming 72(1-2), 52–70 (2008)
2. Bagge, A.H., Haveraaen, M.: Interfacing concepts: Why declaration style shouldn't matter. In: LDTA 2009. ENTCS, York, UK (March 2009)
3. Bagge, A.H., Haveraaen, M.: Axiom-based transformations: Optimisation and testing. In: LDTA 2008, Budapest. ENTCS, vol. 238, pp. 17–33. Elsevier, Amsterdam (2009)
4. Andersen, J., Brabrand, C.: Syntactic language extension via an algebra of languages and transformations. In: LDTA 2009. ENTCS, York, UK (March 2009)
5. Brabrand, C., Schwartzbach, M.I.: Growing languages with metamorphic syntax macros. In: PEPM 2002, pp. 31–40. ACM, New York (2002)
6. Standish, T.A.: Extensibility in programming language design. SIGPLAN Not. 10(7), 18–21 (1975)
7. Wilson, G.V.: Extensible programming for the 21st century. Queue 2(9), 48–57 (2005)
8. Nystrom, N., Clarkson, M.R., Myers, A.C.: Polyglot: An extensible compiler framework for Java. In: Hedin, G. (ed.) CC 2003. LNCS, vol. 2622, pp. 138–152. Springer, Heidelberg (2003)
9. Ekman, T., Hedin, G.: The JastAdd extensible Java compiler. In: OOPSLA 2007, pp. 1–18. ACM, New York (2007)
10. Graham, P.: Common LISP macros. AI Expert 3(3), 42–53 (1987)
11. Dybvig, R.K., Hieb, R., Bruggeman, C.: Syntactic abstraction in Scheme. Lisp Symb. Comput. 5(4), 295–326 (1992)
12. Veldhuizen, T.L.: Expression templates. C++ Report 7(5), 26–31 (1995); Reprinted in C++ Gems, ed. Stanley Lippman
13. Sheard, T., Jones, S.P.: Template meta-programming for Haskell. In: Haskell 2002, pp. 1–16. ACM, New York (2002)
14. Bravenboer, M., Visser, E.: Concrete syntax for objects: domain-specific language embedding and assimilation without restrictions. In: OOPSLA 2004, pp. 365–383. ACM Press, New York (2004)
15. Maddox, W.: Semantically-sensitive macroprocessing. Technical Report UCB/CSD 89/545, Computer Science Division (EECS), University of California, Berkeley, CA (1989)
Model Transformation Languages Relying on Models as ADTs

Jerónimo Irazábal and Claudia Pons
LIFIA, Facultad de Informática, Universidad Nacional de La Plata, Buenos Aires, Argentina
{jirazabal,cpons}@lifia.info.unlp.edu.ar
Abstract. In this paper we describe a simple formal approach that can be used to support the definition and implementation of model-to-model transformations. The approach is based on the idea that models as well as metamodels should be regarded as abstract data types (ADTs), that is to say, as abstract structures equipped with a set of operations. On top of these ADTs we define a minimal, imperative model transformation language with strong formal semantics. This proposal can be used in two different ways: on the one hand, it enables simple transformations to be implemented simply by writing them in any ordinary programming language enriched with the ADTs; on the other hand, it provides a practical way to formally define the semantics of more complex model transformation languages.

Keywords: Model driven software engineering, Model transformation language, Denotational semantics, Abstract data types, ATL.
learning time that cannot be afforded in most projects; on the other hand, considerable investment in new tools and development environments is necessary. And finally, the semantics of these specific languages is not formally defined and thus the user is forced to learn such semantics by running transformation example suites within a given tool. Unfortunately, in many cases the interpretation of a single syntactic construct varies from tool to tool. Additionally, other model engineering instruments, such as mechanisms for transformation analysis and optimization, can only be built on the basis of a formal semantics for the transformation language; therefore, a formal semantics should be provided. To overcome these problems, in this paper we describe a minimal, imperative approach with strong formal semantics that can be used to support the definition and implementation of practical transformations. This approach is based on the idea of using "models as abstract data types" as the basis to support the development of model transformations. Specifically, we formalize models and metamodels as abstract mathematical structures equipped with a set of operations. The use of this approach enables transformations to be implemented in a simpler way by applying any ordinary imperative programming language enriched with the ADTs, thus avoiding the need for a full model transformation platform and/or for learning a new programming paradigm. Additionally, the meaning of the transformation language expressions is formally defined, enabling the validation of transformation programs. Complementarily, this approach offers an intermediate abstraction level which provides a practical way to formally define the semantics of higher-level model transformation languages. The paper is organized as follows. Section 2 provides the formal characterization of models and metamodels as abstract mathematical structures equipped with a set of operations. These mathematical objects are used in Section 3 for defining the semantics of a basic transformation language. Section 4 illustrates the use of the approach to solve a simple transformation problem, while Section 5 shows the application of the approach to support complex transformation languages (in particular ATL). Section 6 compares this approach with related research and Section 7 ends with the conclusions.
2 Model Transformation Languages with ADTs

A model transformation is a program that takes as input a model element and provides as output another model element. Thinking about the development of this kind of program, there are a number of alternative ways to accomplish the task. A very basic approach would be to write an ordinary program containing a mix of loops and if statements that explore the input model and create elements for the output model where appropriate. Such an approach would be widely regarded as a bad solution and it would be very difficult to maintain. An approach situated at the other extreme of the transformation language spectrum would be to rely on a very high-level declarative language specially designed to write model transformations (e.g. QVT Relations [2]). With this kind of language we would write the 'what' of the transformation without writing the 'how'. Thus, neither the concrete mechanism to explore the input model nor the one to create the output model is exposed in the program. Such an approach is very elegant and concise, but the
meaning of the expressions composing these high-level languages becomes less intuitive and consequently hard to understand. In addition, the implementation of a heavyweight supporting framework is required (e.g. the MediniQVT supporting tools [6]). A better solution, from a programming perspective, would be to build an intermediate abstraction level. We can achieve this goal by making use of abstract data types to structure the source and target models. This solution provides a controlled way to traverse a source model, and a reasonable means to structure the code for generating an output model. With this solution we would raise the abstraction level of transformation programs written in an ordinary programming language, while still keeping control of the model manipulation mechanisms. Additionally, we do not need to use a new language for writing model transformations, since any ordinary programming language would be sufficient. Towards the adoption of the latter alternative, in this section we formally define the concepts of model and metamodel as Abstract Data Types (ADTs), that is to say, as abstract structures equipped with a set of operations.

Definition 1: A metamodel is a structure mm = (C, A, R, s, a, r) where C, A and R are the sets of classes, attributes and references respectively; s is an anti-symmetric relation over C interpreted as the superclass relation; a maps each attribute to the class it belongs to; and r maps each reference to its source and target classes.

For example, a simplified version of the Relational Data Base metamodel is defined as MMRDB = (C, A, R, s, a, r), where:

  C = {Table, Column, ForeignKey}
  A = {nameTable, nameColumn}
  R = {columnsTable2Column, primaryKeyTable2Column, foreignKeysTable2ForeignKey, tableForeignKey2Table}
  s = {}
  a = {(nameTable, Table), (nameColumn, Column)}
  r = {(columnsTable2Column, (Table, Column)), (primaryKeyTable2Column, (Table, Column)),
       (foreignKeysTable2ForeignKey, (Table, ForeignKey)), (tableForeignKey2Table, (ForeignKey, Table))}

The usual way to depict a metamodel is by drawing a set of labeled boxes connected by labeled lines; however, the concrete appearance of the metamodel is not relevant for our purposes. Figure 1 shows the simplified metamodel of the Relational Data Base language.
Fig. 1. The metamodel of the Relational Data Base language
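To connect Definition 1 with the claim that an ordinary programming language enriched with the ADTs suffices, here is a minimal Python rendering of the metamodel structure (our own sketch, not part of the proposal), instantiated with the MMRDB example:

  from typing import NamedTuple, Dict, Set, Tuple

  class Metamodel(NamedTuple):
      C: Set[str]                    # classes
      A: Set[str]                    # attributes
      R: Set[str]                    # references
      s: Set[Tuple[str, str]]        # superclass relation over C
      a: Dict[str, str]              # attribute -> owning class
      r: Dict[str, Tuple[str, str]]  # reference -> (source class, target class)

  MMRDB = Metamodel(
      C={"Table", "Column", "ForeignKey"},
      A={"nameTable", "nameColumn"},
      R={"columnsTable2Column", "primaryKeyTable2Column",
         "foreignKeysTable2ForeignKey", "tableForeignKey2Table"},
      s=set(),
      a={"nameTable": "Table", "nameColumn": "Column"},
      r={"columnsTable2Column": ("Table", "Column"),
         "primaryKeyTable2Column": ("Table", "Column"),
         "foreignKeysTable2ForeignKey": ("Table", "ForeignKey"),
         "tableForeignKey2Table": ("ForeignKey", "Table")},
  )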
For the sake of simplicity, we assume single-valued, single-typed attributes and references without cardinality specification. The previous metamodel definition could easily be extended to support multi-valued, multi-typed attributes and to allow the specification of reference cardinality; however, in this paper those features would only complicate our definitions, hindering the understanding of the core concepts.

Definition 2: A model is a structure m = (C, A, R, s, a, r, E, c, va, vr) where mm = (C, A, R, s, a, r) is a metamodel, E is the set of model elements, c maps each element to the class it belongs to, va applied to an attribute and an element returns the value of that attribute (or bottom, if the attribute is undefined), and vr applied to a reference and an element returns the set of elements connected at the opposite end of the reference. In that case, we say that model m is an instance of metamodel mm. When the metamodel is obvious from the context we can omit it in the model structure.

For example, let m = (C, A, R, s, a, r, E, c, va, vr) be an instance of the MMRDB metamodel, where mm = (C, A, R, s, a, r) is the metamodel defined above and:

  E = {Book, Author, nameBook, editorialBook, authorsBook2Author, nameAuthor}
  c = {(Book, Table), (Author, Table), (nameBook, Column), (editorialBook, Column),
       (authorsBook2Author, ForeignKey), (nameAuthor, Column)}
  va = {((nameTable, Book), Book), ((nameTable, Author), Author)}
  vr = {((columnsTable2Column, Book), {nameBook, editorialBook}),
        ((columnsTable2Column, Author), {nameAuthor}),
        ((primaryKeyTable2Column, Book), {nameBook}),
        ((primaryKeyTable2Column, Author), {nameAuthor}),
        ((foreignKeysTable2ForeignKey, Book), {authorsBook2Author}),
        ((tableForeignKey2Table, authorsBook2Author), {Book})}

Figure 2 illustrates this instance of the MMRDB metamodel in a generic graphical way. The concrete syntax of models is not relevant here.
Fig. 2. An instance of the MMRDB metamodel
After defining the abstract structure of models and metamodels we are ready to define a set of operations on such structure. These operations complete the definition of the Abstract Data Type. Let M be the set of models and let MM be the set of metamodels, as defined above. The following functions are defined:

(1) The function metamodel() returns the metamodel of the input model.
  metamodel: M → MM
  metamodel (C, A, R, s, a, r, E, c, va, vr) = (C, A, R, s, a, r)

(2) The function classOf() returns the metaclass of the input model element in the context of a given model.
  classOf: E → M → C
  classOf e (C, A, R, s, a, r, E, c, va, vr) = c(e)
(3) The function elementsOf() returns all the instances of the input class in the context of a given model. Instances are obtained by applying the inverse of the function c.
  elementsOf: C → M → P(E)
  elementsOf c' (C, A, R, s, a, r, E, c, va, vr) = c⁻¹(c')

(4) The function new() creates a new instance of the input class and inserts it into the input model.
  new: C → M → E × M
  new c' (C, A, R, s, a, r, E, c, va, vr) = (e, (C, A, R, s, a, r, E ∪ {e}, c[e ← c'], va, vr)), with e ∉ E

(5) The function delete() eliminates the input element from the input model.
  delete: E → M → M
  delete e (C, A, R, s, a, r, E, c, va, vr) = (C, A, R, s, a, r, E', c', va', vr'), with
    E' = E - {e},
    c' = c - {(e, c(e))},
    va' = va - {(a,(e',n)) | e = e' ∧ (a,(e',n)) ∈ va},
    vr' = vr - {(r,(e',es)) | e = e' ∧ (r,(e',es)) ∈ vr}

(6) The function getAttribute() returns the value of the input attribute in the input element belonging to the input model.
  getAttribute: A → E → M → Z⊥
  getAttribute a e (C, A, R, s, a, r, E, c, va, vr) = va(a)(e)

(7) The function setAttribute() returns an output model resulting from modifying the value of the input attribute in the input element of the input model.
  setAttribute: A → E → Z⊥ → M → M
  setAttribute a e n (C, A, R, s, a, r, E, c, va, vr) =
    (C, A, R, s, a, r, E, c, va[a ← va(a)[e ← n]], vr), if (a, c(e)) ∈ a
    (C, A, R, s, a, r, E, c, va, vr),                    if (a, c(e)) ∉ a

(8) The function getReferences() returns the set of elements connected to the input element by the input reference in the input model.
  getReferences: R → E → M → P(E)
  getReferences r e (C, A, R, s, a, r, E, c, va, vr) = vr(r)(e)

(9) The function addReference() returns an output model resulting from adding a new reference (between the two input elements) to the input model.
  addReference: R → E → E → M → M
  addReference r e e' (C, A, R, s, a, r, E, c, va, vr) =
    (C, A, R, s, a, r, E, c, va, vr ∪ {(r,(e,e'))}), if (r, (c(e), c(e'))) ∈ r
    (C, A, R, s, a, r, E, c, va, vr),                if (r, (c(e), c(e'))) ∉ r

(10) The function removeReference() returns an output model resulting from deleting the input reference between the two input elements from the input model.
  removeReference: R → E → E → M → M
  removeReference r e e' (C, A, R, s, a, r, E, c, va, vr) = (C, A, R, s, a, r, E, c, va, vr - {(r,(e,e'))})

The remaining functions (e.g. similar functions, but at the metamodel level) are omitted in this paper due to space limitations.
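Continuing the Metamodel sketch given after Figure 1, a possible Python rendering of the model ADT and a subset of its ten operations could look as follows (again our own illustration, not the paper's notation; None plays the role of bottom):

  import itertools

  class Model:
      _ids = itertools.count()

      def __init__(self, metamodel):
          self.metamodel = metamodel   # mm = (C, A, R, s, a, r)
          self.c = {}                  # element -> class                  (c)
          self.va = {}                 # (attribute, element) -> value     (va)
          self.vr = {}                 # (reference, element) -> elements  (vr)

      def class_of(self, e):           # (2) classOf
          return self.c[e]

      def elements_of(self, cls):      # (3) elementsOf: inverse image of c
          return [e for e, k in self.c.items() if k == cls]

      def new(self, cls):              # (4) new: fresh element of class cls
          e = "e%d" % next(Model._ids)
          self.c[e] = cls
          return e

      def delete(self, e):             # (5) delete: drop e and its slots
          self.c.pop(e, None)
          self.va = {k: v for k, v in self.va.items() if k[1] != e}
          self.vr = {k: v for k, v in self.vr.items() if k[1] != e}

      def get_attribute(self, a, e):   # (6) getAttribute
          return self.va.get((a, e))

      def set_attribute(self, a, e, n):    # (7) setAttribute, guarded by the metamodel
          if self.metamodel.a.get(a) == self.c[e]:
              self.va[(a, e)] = n

      def get_references(self, r, e):  # (8) getReferences
          return self.vr.get((r, e), set())

      def add_reference(self, r, e, e2):   # (9) addReference, guarded by the metamodel
          if self.metamodel.r.get(r) == (self.c[e], self.c[e2]):
              self.vr.setdefault((r, e), set()).add(e2)

Note that, unlike the mathematical definitions, this sketch updates the model in place; a purely functional version returning fresh structures would match the equations more closely.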
3 A Simple Yet Powerful Imperative Transformation Language

In this section we define SITL, a simple imperative transformation language that supports model manipulation. This language is built on top of a very simple imperative language with assignment commands, sequential composition, conditionals, and finite iterative commands. As a direct consequence, this language has a very intuitive semantics determined by its imperative constructions and by the underlying model ADT. This language is not intended to be used to write model transformation programs; rather, it is proposed as a representation of the minimal set of syntactic constructs that any imperative programming language must provide in order to support model transformations. In practice we will provide several concrete implementations of SITL. Each concrete implementation consists of two elements: an implementation of the ADTs and a mapping from the syntactic constructs of SITL to syntactic constructs of the concrete language.

3.1 Syntax

The abstract syntax of SITL is described by the following abstract grammar:

  <intexp> ::= null | 0 | 1 | 2 | … | <intvar> | - <intexp> | <intexp> + <intexp> | <intexp> - <intexp>
             | <intexp> * <intexp> | <intexp> ÷ <intexp> | <elemexp> . <attrexp> | size <elemlistexp>
  <boolexp> ::= true | false | <intexp> = <intexp> | <intexp> < <intexp> | <intexp> > <intexp>
             | ¬ <boolexp> | <boolexp> ∧ <boolexp> | <boolexp> ∨ <boolexp>
             | <elemexp> = <elemexp> | contains <elemlistexp> <elemexp>
  <modelexp> ::= m1 | m2 | ...
  <classexp> ::= c1 | c2 | … | classof <elemexp>
  <attrexp> ::= a1 | a2 | …
  <refexp> ::= r1 | r2 | …
  <elemexp> ::= <elemvar> | <elemlistexp> (<intexp>)
  <elemlistexp> ::= elementsOfClass <classexp> inModel <modelexp> | <elemlistvar> | <elemexp> . <refexp>
  <comm> ::= <intvar> := <intexp> in <comm> | <elemvar> := <elemexp> in <comm>
           | <elemlistvar> := <elemlistexp> in <comm> | <comm> ; <comm> | skip
           | if <boolexp> then <comm> else <comm> | for <intvar> from <intexp> to <intexp> do <comm>
           | add <elemexp> to <elemlistvar> | remove <elemexp> from <elemlistvar>
           | <elemexp> . <attrexp> := <intexp> | addRef <refexp> <elemexp> <elemexp>
           | removeRef <refexp> <elemexp> <elemexp>
           | forEachElem <elemvar> in <elemlistexp> where <boolexp> do <comm>
           | newElem <elemvar> ofclass <classexp> inModel <modelexp> | deleteElem <elemexp>
  <procD> ::= proc <procname> (<procparams>) beginproc <comm> endproc | <procD> ; <procD>
  <procC> ::= <comm> | call <procname> (actualparams) | <procC> ; <procC>
  <program> ::= <procD> <procC>

Currently, we consider three types of variables: integer variables, element variables, and element-list variables. It is worth remarking that SITL is limited to finite programs; we argue that model-to-model transformations should be finite, so this feature is not restrictive at all.

Denotational Semantics

The semantics of SITL is defined in the standard way [7]; we define semantic functions that map expressions of the language into the meaning that these expressions denote. The usual denotation of a program is a state transformer. In the case of SITL each state holds the current value of each variable and a list of the models that can be manipulated. More formally, a program state is a structure σ = (σM, σEM, σE, σEs, σZ) where σM is a list of models, σEM maps each element to the model it belongs to, σE maps element variables to elements, σEs maps element-list variables to lists of elements, and σZ maps integer variables to their integer value or to bottom. Let Σ denote the set of program states; the semantic functions have the following signatures:

  [[-]]intexp      : <intexp> → Σ → Z⊥
  [[-]]boolexp     : <boolexp> → Σ → B
  [[-]]modelexp    : <modelexp> → Σ → M⊥
  [[-]]classexp    : <classexp> → Σ → C
  [[-]]elemexp     : <elemexp> → Σ → E
  [[-]]elemlistexp : <elemlistexp> → Σ → [E]
  [[-]]attrexp     : <attrexp> → Σ → A
  [[-]]refexp      : <refexp> → Σ → R
  [[-]]comm        : <comm> → Σ → Σ

Then, we define these functions by semantic equations. The semantic equations for integer expressions and Boolean expressions are largely omitted, as are some equations related to well-understood constructs such as conditionals, sequences of commands and procedure calls. For the following equations let σ = (σM, σEM, σE, σEs, σZ) ∈ Σ:

− Equations for integer expressions
  [[null]]intexp σ = ⊥
  [[e . a]]intexp σ = getAttribute ([[a]]attrexp σ) ([[e]]elemexp σ) (σM (σEM ([[e]]elemexp σ)))

− Equations for class expressions
  [[classof e]]classexp σ = classOf ([[e]]elemexp σ) (σM (σEM ([[e]]elemexp σ)))

− Equations for element expressions
  [[ex]]elemexp σ = σE (ex)

− Equations for element list expressions
  [[elementsOfClass c inModel m]]elemlistexp σ = elementsOf ([[c]]classexp σ) (σM ([[m]]modelexp σ))
  [[esx]]elemlistexp σ = σEs (esx)
  [[e . r]]elemlistexp σ = getReferences ([[r]]refexp σ) ([[e]]elemexp σ) (σM (σEM ([[e]]elemexp σ)))

− Equations for commands
  [[x := ie]]comm σ = (σM, σEM, σE, σEs, σZ[x ← ([[ie]]intexp σ)])
  [[ex := ee]]comm σ = (σM, σEM, σE[ex ← ([[ee]]elemexp σ)], σEs, σZ)
  [[e . a := ie]]comm σ = setAttribute ([[a]]attrexp σ) ([[e]]elemexp σ) ([[ie]]intexp σ) (σM (σEM ([[e]]elemexp σ)))
  [[newElem ex ofclass c inModel m]]comm σ = (σM', σEM', σE', σEs, σZ)
    with im = [[m]]modelexp σ, (e, m') = new ([[c]]classexp σ) (σM (im)),
         σM' = σM[im ← m'], σE' = σE[ex ← e], σEM' = σEM[e ← im]
  [[deleteElem e]]comm σ = (σM', σEM', σE, σEs, σZ)
    with e' = [[e]]elemexp σ, im = σEM e', m' = delete e' (σM im),
         σM' = σM[im ← m'], σEM' = σEM[e' ← im]
  [[for x from ie1 to ie2 do c]]comm σ = iSec ([[ie1]]intexp σ) ([[ie2]]intexp σ) x c σ
    iSec n m x c σ = σ, if n > m
    iSec n m x c σ = iSec (n+1) m x c ([[c]]comm (σM, σEM, σE, σEs, σZ[x ← n])), if n ≤ m
  [[forEachElem ex in es where b do c]]comm σ = eSec ([[es]]elemlistexp σ) ex b c σ
    eSec es ex b c σ = σ, if es = ∅
    eSec es ex b c σ = eSec es' ex b c σ'', if es ≠ ∅
      with es = e:es', σ' = (σM, σEM, σE[ex ← e], σEs, σZ),
           σ'' = [[c]]comm σ', if [[b]]boolexp σ', and σ'' = σ', if not [[b]]boolexp σ'

By applying these definitions we are able to prove whether two programs (i.e. transformations) are equivalent.

Definition 3: Two programs t and t' are equivalent if and only if ([[t]]comm σ) σM = ([[t']]comm σ) σM, for all σ ∈ Σ.

Note that this definition does not take the values of variables into consideration, so two programs using different sets of internal variables can still be equivalent. Equivalence is defined considering only the input and output models (observable equivalence).
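The denotational reading – a command denotes a state transformer – maps directly onto ordinary code. The following Python sketch (ours; the state is collapsed into a single dictionary for brevity) mirrors the iSec and eSec helpers above:

  from typing import Any, Callable, Dict, List

  State = Dict[str, Any]              # stands in for σ = (σM, σEM, σE, σEs, σZ)
  Comm = Callable[[State], State]     # [[c]]comm
  BoolExp = Callable[[State], bool]   # [[b]]boolexp

  def for_loop(x: str, n: int, m: int, c: Comm) -> Comm:
      # [[for x from n to m do c]] via iSec
      def run(sigma: State) -> State:
          for i in range(n, m + 1):           # iterate while n <= m
              sigma = c({**sigma, x: i})      # bind the loop variable, run the body
          return sigma
      return run

  def for_each_elem(ex: str, es: List[Any], b: BoolExp, c: Comm) -> Comm:
      # [[forEachElem ex in es where b do c]] via eSec
      def run(sigma: State) -> State:
          for e in es:                        # walk the element list
              sigma_prime = {**sigma, ex: e}  # σ' binds the element variable
              sigma = c(sigma_prime) if b(sigma_prime) else sigma_prime
          return sigma
      return run

A concrete implementation of SITL, as mentioned in Section 3, essentially amounts to providing such state transformers for each construct on top of the model ADT.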
4 A Simple Example

Let mm be the metamodel defined in Section 2; let m1 be an instance of mm and m2 be the empty instance of mm. The following SITL program, when applied to a state containing both the model m1 and the model m2, will populate m2 with the tables in m1; none of the columns, primary keys or foreign keys will be copied to m2.

  forEachElem t in (elementsOfClass Table inModel m1) where true do
    newElem t' ofClass Table inModel m2;
    t'.name = t.name;
The resulting model is m2 = (E, c, va, vr) where E = {Book, Author}, c = {(Book, Table), (Author, Table)}, va = {((nameTable, Book), Book), ((nameTable, Author), Author)}, and vr = ∅. A formal proof of the correctness of this transformation can be written in a straightforward way by using the semantics definition of SITL.
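The same transformation written directly against the model ADT in an ordinary language is equally short. A Python sketch of ours, assuming the Metamodel/Model sketches given in Section 2:

  def copy_tables(m1, m2):
      # forEachElem t in (elementsOfClass Table inModel m1) where true do ...
      for t in m1.elements_of("Table"):
          t2 = m2.new("Table")                # newElem t' ofClass Table inModel m2
          m2.set_attribute("nameTable", t2,   # t'.name = t.name
                           m1.get_attribute("nameTable", t))

This is exactly the mode of use the approach is intended to enable: the ADT supplies the traversal and creation operations, and the host language supplies the control flow.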
5 Encoding ATL in SITL

Because SITL is situated midway between ordinary programming languages and transformation-specific languages, this intermediate abstraction level makes it suitable for defining the semantics of more complex transformation languages. To show an example, in this section we sketch how to encode ATL in SITL. Each ATL rule is encoded into a SITL procedure. Let us consider the following simple rule template in ATL:

  module m
  from m1: MM1
  to m2: MM2
  rule rule_name {
    from in_var1 : in_class1!MM1 (condition1),
         …
         in_varn : in_classn!MM1 (conditionn)
    to   out_var1 : out_class1!MM2 (bindings1),
         …
         out_varm : out_classm!MM2 (bindingsm)
    do { statements }
  }

The equivalent code fragment in SITL would be:

  proc rule_name ()
  beginproc
    forEachElem in_var1 in (elementsOfClass in_class1 inModel m1) where condition1 do
    …
    forEachElem in_varn in (elementsOfClass in_classn inModel m1) where conditionn do
      newElem out_var1 ofclass out_class1 inModel m2;
      …
      newElem out_varm ofclass out_classm inModel m2;
      bindings1;
      …
      bindingsm;
      statements;
  endproc

A more complete encoding of ATL in SITL, taking into account called rules and lazy unique rules, can be found in [8].
6 Related Work

Sitra [9] is a minimal, Java-based library that can be used to support the implementation of simple transformations. With a similar objective, RubyTL [10] is an extensible transformation language embedded in the Ruby programming language. These proposals are related to ours in the sense that they aim at providing a minimal and familiar transformation framework to avoid the cost of learning new concepts and tools. The main difference between these works and the proposal in this paper is that we are not interested in a solution that remains confined to a particular programming language, but rather in a language-independent solution founded on a mathematical description. Barzdins and colleagues [11] define L0, a low-level, procedural, strongly typed, textual model transformation language. This language contains minimal but sufficient constructs for model and metamodel processing and control-flow facilities resembling those found in assembler-like languages, and it is intended to be used for the implementation of higher-level model transformation languages by the bootstrapping method. Unlike our proposal, this language neither has a formal semantics nor is based on the idea of models as ADTs. Rensink proposes in [12] a minimal formal framework for clarifying the concepts of model, metamodel and model transformation. Unlike that work, our formal definitions are more understandable while still ensuring the characterization of all relevant features involved in the model transformation domain. Additionally, the proposal in [12] does not define a particular language for expressing transformations. On the other hand, because SITL is situated midway between ordinary programming languages and transformation-specific languages, this intermediate abstraction level makes it suitable for defining the semantics of complex transformation languages. In contrast to similar approaches – e.g. the translation of QVT to OCL+Alloy presented in [13] or the translation of QVT to Colored Petri Nets described in [14] – our solution offers a significant reduction of the gap between source and target transformation languages.
7 Conclusions

In this paper we have proposed the use of "models as abstract data types" as the basis to support the development of model transformations. Specifically, we have formalized models and metamodels as abstract mathematical structures equipped with a set of operations. This abstract characterization allowed us to define a simple transformation approach that can be used to support the definition and implementation of model-to-model transformations. The core of this approach is a very small and understandable set of programming constructs. The use of this approach enables transformations to be implemented in a simpler way by applying any ordinary imperative programming language enriched with the ADTs; thus we avoid the overhead of having a full model transformation platform and/or learning a new programming paradigm.
Additionally, the meanings of expressions in the transformation language are formally defined, enabling the validation of transformation specifications. Such meaning is abstract and independent of any existing programming language. Finally, we have shown that other well-known model transformation languages, such as ATL, can be encoded into this framework. Thus, this approach provides a practical way to formally define the semantics of complex model transformation languages.
References

[1] Stahl, T., Völter, M.: Model-Driven Software Development. John Wiley & Sons, Ltd., Chichester (2006)
[2] QVT Adopted Specification 2.0 (2005), http://www.omg.org/docs/ptc/05-11-01.pdf
[3] Jouault, F., Kurtev, I.: Transforming Models with ATL. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, pp. 128–138. Springer, Heidelberg (2006)
[4] Lawley, M., Steel, J.: Practical Declarative Model Transformation With Tefkat. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, pp. 139–150. Springer, Heidelberg (2006)
[5] Varro, D., Varro, G., Pataricza, A.: Designing the Automatic Transformation of Visual Languages. Science of Computer Programming 44(2), 205–227 (2002)
[6] Medini QVT. ikv++ technologies ag, http://www.ikv.de (accessed in December 2008)
[7] Hennessy, M.: The Semantics of Programming Languages. Wiley, Chichester (1990)
[8] Irazabal, J.: Encoding ATL into SITL. Technical report (2009), http://sol.info.unlp.edu.ar/eclipse/atl2sitl.pdf
[9] Akehurst, D.H., Bordbar, B., Evans, M.J., Howells, W.G.J., McDonald-Maier, K.D.: SiTra: Simple Transformations in Java. In: Nierstrasz, O., Whittle, J., Harel, D., Reggio, G. (eds.) MoDELS 2006. LNCS, vol. 4199, pp. 351–364. Springer, Heidelberg (2006)
[10] Sánchez Cuadrado, J., García Molina, J., Menarguez Tortosa, M.: RubyTL: A Practical, Extensible Transformation Language. In: Rensink, A., Warmer, J. (eds.) ECMDA-FA 2006. LNCS, vol. 4066, pp. 158–172. Springer, Heidelberg (2006)
[11] Barzdins, J., Kalnins, A., Rencis, E., Rikacovs, S.: Model Transformation Languages and Their Implementation by Bootstrapping Method. In: Avron, A., Dershowitz, N., Rabinovich, A. (eds.) Pillars of Computer Science. LNCS, vol. 4800, pp. 130–145. Springer, Heidelberg (2008)
[12] Rensink, A.: Subjects, Models, Languages, Transformations. In: Dagstuhl Seminar Proceedings 04101 (2005), http://drops.dagstuhl.de/opus/volltexte/2005/24
[13] Garcia, M.: Formalization of QVT-Relations: OCL-based static semantics and Alloy-based validation. In: MDSD today, pp. 21–30. Shaker Verlag (2008)
[14] de Lara, J., Guerra, E.: Formal Support for QVT-Relations with Coloured Petri Nets. In: Schürr, A., Selic, B. (eds.) MODELS 2009. LNCS, vol. 5795, pp. 256–270. Springer, Heidelberg (2009)
Towards Dynamic Evolution of Domain Specific Languages

Paul Laird and Stephen Barrett
Department of Computer Science, Trinity College, Dublin 2, Ireland
{lairdp,stephen.barrett}@cs.tcd.ie
Abstract. We propose the development of a framework for the variable interpretation of Domain Specific Languages (DSLs). Domains often contain abstractions whose interpretation changes in conjunction with global changes in the domain or specific changes in the context in which the program executes. In a scenario where the domain assumptions encoded in the DSL implementation change, programmers must still work with the existing DSL, and must therefore take more effort to describe their programs, or may sometimes fail to specify their intent. In such circumstances DSLs risk becoming less fit for purpose. We seek to develop an approach which makes a DSL less restrictive, maintaining flexibility and adaptability to cope with changing or novel contexts without reducing the expressiveness of the abstractions used.
1 Introduction
In this position paper we propose a model for the dynamic interpretation of Domain Specific Languages (DSLs). We believe that this is an important but as yet largely unexplored way to support changes in a program's execution, which varying context may require. The benefit such an approach would deliver is a capacity to evolve a program's behaviour to adapt to changing context, but without recourse to program redevelopment. A key benefit of this approach would be the ability to simultaneously adapt several applications through a localised change in DSL interpretation. Our research seeks to explore the potential of this form of adaptation as a mechanism for both systemic-scale and context-driven adaptation. Domain specific language constructs are a powerful method of programming primary functionality in a domain. A recent study by Kosar et al. [6] found that the end-user effort required to specify a correct program was reduced by comparison to standard programming practice. However, the development of DSL systems is time-consuming and expensive [11]. Requirements that emerge during development may end up left out, leaving the language release suboptimal, or, if included, may delay the release as the compiler or generator must be updated. Modelling lag [15] results. Domain evolution may also render inappropriate the formulae which roll up complex semantics into simple, accessible and expressive DSL statements. Where variability is high, the resulting DSL constructs can become unwieldy or low-level in response.
Updates of general-purpose languages are overwhelmingly polymorphic in nature in order to ensure backward compatibility. It would generally be inappropriate to change the interpretation of low-level constructs such as byte streams and classes. However, because the underlying semantics of high-level DSL terms may vary over the life cycle of the DSL, we argue that these semantic changes are best implemented in a manner capable of equivalent adaptation. If the intent or purpose of the program is not being changed, neither should the program be. The decoupling of program intent and implementation would allow for a new form of dynamic, post-deployment adaptation, with possibilities for program evolution by means other than those offered by current adaptation techniques. The cost of developing a new domain specific language would be reduced by the use of such a framework for DSL interpretation. If any common features were shared with an existing DSL, their implementations and specifications could be reused in the new language.
2 Proposed Solution
We propose to investigate the feasibility of varying the interpretation of a domain specific program as an adaptation strategy. Our solution is component based, and would involve the dynamic reconfiguration of the interactions of running components that constitute a DSL interpreter. Figure 1 shows the architecture of the proposed solution. The language specification functions as a co-ordination model, specifying the structure and behaviour of the interpreter. The language specification is interpreted by a generic interpreter, which co-ordinates the interactions of executing components, shown as diamonds, to yield the required behaviour. Context is used to switch between variations of the interpretation, for example to deal with network degradation in mobile applications. The interpreter, on reading a statement, would instantiate the elements required to execute the statement, combine them in a configuration which matches the specification statement's terms in the language description, and provide the components with access to the appropriate data as inputs, and locations in which to store the outputs. The effect achieved by using a generic interpreter, relying on language input to determine how it interprets a program, is to support a kind of adaptation based on changing the way in which a program is interpreted, by selective reconstruction of the interpreter. In order for the dynamic adaptation outlined earlier to function, the correct interpreter must be running after the adaptation; there must be a mechanism for replacing the version of the language in play with a newer version in the event of an update by the system. In our model, this amounts to dynamic architectural variation through component recomposition [3]. The architecture we propose for testing the execution of DSLs is service oriented [1], with the interpreter maintaining state and co-ordinating the instantiation and replacement of components as necessary. Some of these components could be stubs communicating with external services.
Fig. 1. System Architecture

Our approach proposes to use the late binding of service oriented computing to allow flexibility in the execution of a program. CoBRA [5] demonstrates the ability to reconfigure the interactions of executing components, including atomic replacements, and compares the method to other means of replacing service implementations. We envisage using an infrastructure of that nature to co-ordinate interactions below the interpreter, but to make the configuration dependent on the DSL. CoBRA uses state store/restore to maintain state between replacement services, but we envisage separating the state from the behavioural implementation of components, with components able to access their relevant state information. Chains of execution are specified by giving a component the outputs of previous components as inputs, while the interpreter need only deal with the outputs directly when intervention or a decision is required. The net effect of input and output variables used in this manner is not unlike connectors in MANIFOLD [14], but with greater flexibility for change.

Interpretation as a Service. Enterprise computing systems have moved from mainframe-based architecture to client-server architecture and are now in some cases moving to a web-based architecture [10]. This is being facilitated by technologies such as virtualisation [9] and Platform as a Service [19]. We posit a DSL platform operating across an organisation, capable of executing an open-ended set of DSL variations. This will allow us to support consistent change in interpretation across large-scale enterprises. Changes at the level of the domain specific language could be used to effect change across an entire organisation, across software systems. Application-specific changes to the language used for interpretation could be used to pilot trial changes to the domain specific language, in order to evaluate their utility for future use across the domain. Applications may also have terms with application-specific meaning, which could be adjusted in the same manner.
Usage of cloud computing in banks is below that in other domains [20]. Some concerns expressed in the financial services industry about using cloud-computing-based services include security, service-provider tie-in and price rises, lack of control, and potential for down-time if using a third-party cloud. The resources required to manage an internal cloud discourage this option, while both options, but particularly third-party clouds, could suffer from failure to respond in a timely manner to time-critical operations. The resources issue is likely to diminish in importance as technology advances and prices fall. An interpreter for a domain specific language, provisioned on a Platform as a Service basis, will require resources to set up and maintain; however, the ease with which programs could thereafter be written to run on the platform may outweigh this outlay. The initial cost is likely to be the inhibiting factor, as this would be an expense which would otherwise not be incurred, while the savings in application maintenance and updating should more than offset the cost of maintaining the platform. Large enterprises may run several different software systems and may want to implement the same change across all of them, following a change in the domain. If they use the traditional model driven development approach and change the transformation of the term in the domain specific language, they must still regenerate the appropriate source code and restart the affected components. This is not an atomic action, and inconsistencies may arise between different software systems in the organisation in terms of how they treat this term. In an environment where an interpreter is provisioned as a Platform as a Service, a single change to the interpretation of that term will affect all software systems running on that platform.
2.1 An Example Domain
We introduce our model by way of an example banking application. Financial services is a domain with well-defined domain constructs. A DSL for financial products can be seen in [2]. DSL and DSML programs are concise, and easier for those who work in the domain to understand and program than low-level code. However, each concise statement encodes complex behaviour. Over time, the precise interpretation of the high-level abstractions may change, but the overall meaning would not. Changes to the language used by banking system developers would normally be required after policy decisions, statutory changes or the introduction of new banking products whose specifications do not match previously available options. An example of a change to the language, which does not require programming specialists, is the introduction of free banking. This means that if a current account matches certain criteria, then standard fees such as maintenance, standing order fees, transaction fees etc. do not apply. If the implementation of a standing order previously charged a fee to carry out the request, then this could be preceded by a conditional checking that the account was not free or did not fulfil the necessary conditions for free banking, which could easily be expressed in the language.
Statutory changes introducing a new concept, such as deposit interest retention tax, would initially require new abstractions, but some of these could be implemented in the high-level language. In the case of the introduction of a new tax, all that is needed is an abstract definition of where the tax will apply, what rate of tax is applicable, and a mechanism to deal with the tax collected. The abstract definition of where the tax will apply will almost certainly be expressible in the domain specific language, the tax rate is a primitive fraction, and while the mechanism to deal with the tax collected may be potentially complex, it will reflect the actual banking process involved, and will therefore also be expressible in the language. The introduction of a deduct tax function would encapsulate the treatment of tax so that a single statement need only be added to a credit interest function to include that functionality. As the entire meaning is contained in one location, only one change needs to be made if the bank decides to change where it holds the funds it is due to pay in tax, or if the government changes the tax rate. The DSL would provide functionality to reliably transfer money from the client account to the tax account, keeping the change succinct. Developers within the context of the banking system are constrained in what they express by what abstractions have been defined in the domain specific language. They in turn constrain what the end users of the system can do. These relationships retain relevance, although the domain developers would have greater freedom to refine the language and develop compositional constructs in order to facilitate their easier programming in future. The following example is a specification of a loan in the RISLA [2] domain specific language for financial products. The language and syntax are much more accessible to financial engineers than an equivalent general-purpose implementation, and certainly by comparison to COBOL. The implementation is achieved by compiling the DSL code to produce COBOL code. New products or changes to products can be defined easily if the changes are at the DSL level, such as specifying a minimum transaction size or a maximum number of transactions in a given time, but changes to the scheme by which interest is calculated, or the addition of fees or taxes, for example in a cross-border context, would require changes to how the terms are interpreted. Changes such as these could happen without any change to the product specification, and therefore it would be inappropriate to change the definition of products at the DSL level to achieve them. If the interpretation of the DSL terms could be changed as we have proposed, this would allow the change to be effected at the appropriate level of abstraction. Figure 2 shows an example of Domain Specific code used to define a loan product in the RISLA language. To change the program so that only transactions of over 1000 euro could proceed is trivially easy, by changing the relevant number. Other changes, such as adding a transaction fee or tax, require changes to the implementation of one or more keywords; in the case of RISLA, this would be in COBOL, to which it is compiled. In the solution which we propose, the relevant interpretation is changed at runtime by reconfiguring the interpreter.
  product LOAN
    declaration
      contract data
        PAMOUNT   : amount
        STARTDATE : date
        MATURDATE : date
        INTRATE   : int-rate
        RDMLIST   := [] : cashflow-list
        ...
      registration
        define RDM as
          error checks
            "Date not in interval" in case of
              (DATUM < STARTDATE) or (DATUM >= MATURDATE)
            "Negative amount" in case of AMOUNT <= 0.0
            "Amount too big" in case of FPA(RDMLIST >> []) > 0.0
          RDMLIST := RDMLIST >> []

Fig. 2. Part of a Domain Specific Program
More significantly, some changes which could be catered for at the Domain Specific Language level are more appropriately handled at the interpretation level. If, for example, there was a taxation primitive in the DSL, and a tax was levied on all financial products, it would not be necessary to redesign the language in order to implement the change, but it would be desirable. Implementing the levy as an inbuilt part of initialisation, or of some other operation on any financial product, would localise the change in an Aspect Oriented way, saving effort on the part of the programmers and guaranteeing the reliability and uniformity of the change. Consider the implementation of a levy at the Domain Specific Program level. Code to handle the deduction of the levy would have to be added to the definition of every product where the levy would apply. If tax relief were subsequently granted on certain classes of products, these would have to be modified once more. In a case where an adaptation affects more than one Domain Specific Program, the atomicity of effecting the change through varying the interpretation may be of great benefit. A change to a transformation in a Model Driven Development scenario would have the same effect on one program, whose execution could be stopped, as a change in interpretation. This would represent another useful application of the concept of Evolving DSLs, as the performance of a transformed model would be faster than an interpreted version, but it would not provide the benefits of atomic runtime adaptation of multiple applications.
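To make the idea of changing an interpretation at runtime concrete – for instance for the deposit interest retention tax discussed above – the following Python sketch of our own illustrates the mechanism in miniature (the term credit_interest, the tax rate and the implementations are all hypothetical): the interpreter resolves each DSL term through a registry, so swapping one entry changes the behaviour of every program using that term, without touching any product specification:

  # registry mapping DSL terms to their current component implementations
  registry = {}

  def interpret(term, *args):
      return registry[term](*args)          # late-bound lookup on every use

  def credit_interest_v1(balance, rate):
      return balance * rate                 # original meaning of 'credit interest'

  registry["credit_interest"] = credit_interest_v1
  interpret("credit_interest", 1000.0, 0.03)        # interest before the change

  # statutory change: deposit interest retention tax is introduced.
  # Only the interpretation is replaced; DSL programs are left unchanged.
  TAX_RATE = 0.25                           # hypothetical rate

  def credit_interest_v2(balance, rate):
      gross = balance * rate
      tax = gross * TAX_RATE                # the 'deduct tax' step folded into the term
      return gross - tax                    # the tax itself would go to a tax account

  registry["credit_interest"] = credit_interest_v2
  interpret("credit_interest", 1000.0, 0.03)        # interest after the change

In a full implementation the registry entries would be executing components or service stubs co-ordinated by the generic interpreter of Figure 1, and the swap would be performed atomically across all applications running on the platform.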
2.2 A Multi-system Programming Paradigm
A solution of this kind produces a programming paradigm where languages can evolve organically to adapt to changing contexts. A potential application for this
is in the management of organisation-wide adaptation in large enterprises. These enterprises generally have many software systems operating on their computers, and many of these may access a common resource or service. This service could be mapped to a term in a domain specific language if the enterprise used a DSL to specify its software. The service may also be used by other clients. It may be desirable to change the meaning of the term such that it is executed by a different service; however, replacing the service at its current location may not be appropriate, as it has other clients. In a typical service-oriented computing setup, the change would have to be specified in each program using the service which was to be changed. This could introduce inconsistency into the way some domain concept is handled by different applications. By requiring the interpretation of the term by a common interpreter, the change need only be implemented once.
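A minimal sketch of such an arrangement (plain Scala, invented names): every program resolves the term through one shared binding, so rebinding it once changes all clients at the same time:

// Hypothetical sketch of a shared term-to-service binding.
object SharedInterpreter {
  private var bindings: Map[String, String => String] =
    Map("lookupCustomer" -> (id => s"legacy-service/$id"))

  // One rebinding here is seen by every program using the interpreter.
  def rebind(term: String, impl: String => String): Unit =
    bindings += (term -> impl)

  def execute(term: String, arg: String): String =
    bindings(term)(arg)
}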
2.3 Programming for Evolving DSLs
When a domain developer defines something in terms of the abstractions provided to him, he is in effect extending the language, as the interpreter refers to the program and to the domain specific language to find a definition for any construct it encounters. This language extension may be specific to the program or context in which it is used, but can be co-opted by future developers as part of a more specific language for related software. Underlying changes could be implemented by replacing a component with a polymorphic variant, or by aspect oriented or reflective interception and wrapping, but this should not concern the domain programmer. The interpreter could deal with more than one level of abstraction above the executing components, in order to represent each abstraction in terms of its component parts, rather than in terms of its atomic low-level components. Thus a transaction is defined in terms of reliable connections and simple instructions, below which issues such as the transaction commit protocol etc. are hidden.
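One way to picture this lookup (an illustrative Scala sketch, not the proposed implementation): the interpreter consults program-specific definitions first and falls back to the definitions of the domain specific language itself:

// Hypothetical sketch of layered construct resolution.
case class Definition(body: String)

class LayeredInterpreter(
    programDefs:  Map[String, Definition],    // extensions made by the domain developer
    languageDefs: Map[String, Definition]) {  // the DSL's own abstractions

  def resolve(construct: String): Option[Definition] =
    programDefs.get(construct).orElse(languageDefs.get(construct))
}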
3 Related Work
As well as languages to support multiple systems in large enterprises, we propose to examine the benefits of this programming paradigm in domains such as Computational Trust [7]. The implementation of terms in a Trust DSL may change rapidly based on changing conditions or in response to other factors. This makes Trust a suitable candidate for dynamic interpretation. Many proposed DSLs are implemented as library packages instead, due to the difficulty of finding language developers with appropriate domain knowledge [11]. Formal domain modelling can only capture a snapshot of the requirements of a domain, causing modelling lag [15]. A dynamic domain specific language is a DSL based not upon such a snapshot, but one which can be updated as necessary. Keyword Based Programming [4], Intentional Programming [17] and Intentional Software [18] allow incremental development of domain specific languages, and support their specialisation through application-specific language extensions. However
these require generation, compilation and/or reduction steps, after which the running application cannot be adapted in this manner. Papadopoulos and Arbab [14] show how MANIFOLD can handle autonomic reconfiguration with regard to the addition or removal of components. Our aim is to automate the changes which would be needed to implement a change in the execution of the system. The Generic Modelling Environment [8] allows the construction of a modelling environment given a specification of a Domain Specific Modelling Language. This could be used to represent the domain program, but the generation of general-purpose language code would not allow later dynamic adaptation. Nakatani et al. [12,13] describe how lightweight Domain Specific Languages, or jargons, can be interpreted using a generic interpreter and language descriptions for the relevant jargons. While there is composition of different jargons to allow them to be used as part of an extended DSL, there is no attempt to modify a program through dynamically varying the interpretation of a term. Platform as a Service is an extension of Software as a Service which sees web-based application development environments hosted by a system provider. The resulting applications are often hosted in a Software as a Service manner by the same provider [19]. Software as a Service [16] is the provision of software through the internet or other network, in a service-oriented way. The end user does not have to worry about hosting, updating or maintaining the software.
4 Conclusions
We have presented an outline framework for the design and maintenance of systems. Systems written in domain specific languages would be implemented through the runtime interpretation of their programs, so as to allow the reinterpretation of terms in the language. The design of domain specific languages from scratch would remain a significant task, as abstractions from the domain need to be captured in a form that can be used to develop programs; maintenance, however, becomes much easier, as parts of the language can be redefined as required. There are several levels at which a program's execution can be changed. Software is written as an application in a programming language, the source code of which can be changed. If the program is interpreted, the virtual machine on which it runs can be altered, or, if it is compiled, changes can be made at compile time. The operating system itself can be changed, affecting program execution. The lower the level at which an adaptation is implemented, the wider the effects of that change will be, but the less expressive the specification of that change and the less program-specific the change will be. We propose to introduce adaptation at a level below the source, but above the executing components. This is an appropriate level at which to implement certain forms of adaptation. Adding another layer naturally introduces an overhead, but we wish to establish whether the benefits to be gained from increased flexibility justify the overhead incurred. The adaptations for which this approach is best suited are functional, non-polymorphic, runtime adaptations. The framework could naturally support
polymorphic or non-functional runtime adaptation as well; however, these alone would not justify the creation of an adaptation framework, as aspect-oriented programming can perform most of these adaptations adequately. Overall code localisation would improve, as any change which is implemented through a change in interpretation removes the need for identical edits throughout the code. Dynamic AOP also requires consideration, during further evolution, of all code previously woven at runtime. The ability to redefine parts of the language in order to provide similar programs in a different context could lead to the budding off of new languages from a developed domain specific language. This would significantly lower the barrier to entry for any domain lacking the economies of scale required to justify DSL development but sharing some high level abstractions with a related domain. Opening the interpretation of a DSL to runtime adaptation would allow the simultaneous adaptation of multiple applications running on a DSL platform. Delivering such a platform would take considerable resources in set-up and maintenance, but would ease the process of organisation-wide adaptation and increase its reliability and consistency.
References

1. Allen, P.: Service orientation: winning strategies and best practices. Cambridge University Press, Cambridge (2006)
2. Arnold, B., van Deursen, A., Res, M.: An algebraic specification of a language describing financial products. In: IEEE Workshop on Formal Methods Application in Software Engineering, pp. 6–13 (1995)
3. Barrett, S.: A software development process. U.S. Patent (2006)
4. Cleenewerck, T.: Component-based DSL development. In: Pfenning, F., Smaragdakis, Y. (eds.) GPCE 2003. LNCS, vol. 2830, pp. 245–264. Springer, Heidelberg (2003)
5. Irmert, F., Fisher, T., Meyer-Wegener, K.: Runtime adaptation in a service-oriented component model. In: Proceedings of the 2008 International Workshop on Software Engineering for Adaptive and Self-Managing Systems (2008)
6. Kosar, T., López, P.E.M., Barrientos, P.A., Mernik, M.: A preliminary study on various implementation approaches of domain-specific language. Information and Software Technology 50(5), 390–405 (2008)
7. Laird, P., Dondio, P., Barrett, S.: Dynamic domain specific languages for trust models. In: Proceedings of the 1st IARIA Workshop on Computational Trust for Self-Adaptive Systems (to appear, 2009)
8. Ledeczi, A., Maroti, M., Bakay, A., Karsai, G., Garrett, J., Thomason, C., Nordstrom, G., Sprinkle, J., Volgyesi, P. (eds.): The Generic Modeling Environment. Workshop on Intelligent Signal Processing, Budapest, Hungary (2001)
9. Marinescu, D., Kroger, R.: State of the art in autonomic computing and virtualization. Technical report, Distributed Systems Lab, Wiesbaden University of Applied Sciences (2007)
10. Markus, M.L., Tanis, C.: The enterprise systems experience – from adoption to success. Framing the domains of IT research: Glimpsing the future through the past, 173–207 (2000)
11. Mernik, M., Sloane, T., Heering, J.: When and how to develop domain-specific languages. ACM Computing Surveys 37(4), 316–344 (2005)
12. Nakatani, L.H., Ardis, M.A., Olsen, R.G., Pontrelli, P.M.: Jargons for domain engineering. SIGPLAN Not. 35(1), 15–24 (2000)
13. Nakatani, L.H., Jones, M.A.: Jargons and infocentrism. In: First ACM SIGPLAN Workshop on Domain-Specific Languages, pp. 59–74. ACM Press, New York (1997)
14. Papadopoulos, G.A., Arbab, F.: Configuration and dynamic reconfiguration of components using the coordination paradigm. Future Generation Computer Systems 17(8), 1023–1038 (2001)
15. Safa, L.: The practice of deploying DSM: report from a Japanese appliance maker trenches. In: Gray, J., Tolvanen, J.-P., Sprinkle, J. (eds.) 6th OOPSLA Workshop on Domain-Specific Modeling (2006)
16. SIIA: Software as a service: Strategic backgrounder. Technical report, Software and Information Industry Association (2001)
17. Simonyi, C.: The death of computer languages. Technical report, Microsoft (1995)
18. Simonyi, C., Christerson, M., Clifford, S.: Intentional software. In: Proceedings of the 21st OOPSLA Conference. ACM, New York (2006)
19. Vaquero, L.M., Rodero-Merino, L., Caceres, J., Lindner, M.: A break in the clouds: towards a cloud definition. SIGCOMM Comput. Commun. Rev. 39(1), 50–55 (2009)
20. Voona, S., Venkataratna, R., Hoshing, D.N.: Cloud computing for banks. In: Finacle Connect (2009)
ScalaQL: Language-Integrated Database Queries for Scala Daniel Spiewak and Tian Zhao University of Wisconsin – Milwaukee {dspiewak,tzhao}@uwm.edu
Abstract. One of the most ubiquitous elements of modern computing is the relational database. Very few modern applications are created without some sort of database backend. Unfortunately, relational database concepts are fundamentally very different from those used in general-purpose programming languages. This creates an impedance mismatch between the application and the database layers. One solution to this problem which has been gaining traction in the .NET family of languages is Language-Integrated Queries (LINQ). That is, the embedding of database queries within application code in a way that is statically checked and type safe. Unfortunately, certain language changes or core design elements were necessary to make this embedding possible. We present a framework which implements this concept of type safe embedded queries in Scala without any modifications to the language itself. The entire framework is implemented by leveraging existing language features (particularly for-comprehensions).
terms. All of the conflict between the dissonant concepts is relegated to a discrete segment of the application. This is by far the simplest approach to application-level database access, but it is also the most error-prone. Generally speaking, this technique is implemented by embedding relational queries within application code in the form of raw character strings. These queries are unparsed and completely unchecked until runtime, at which point they are passed to the database and their results converted using more repetitive and unchecked routines. It is incredibly easy even for experienced developers to make mistakes in the creation of these queries. Even excluding simple typos, it is always possible to confuse identifier names, function arities or even data types. Worse yet, the process of constructing a query in string form can also lead to serious security vulnerabilities — most commonly SQL injection. None of these problems can be found ahead of time without special analysis. The Holy Grail of embedded queries is to find some way to make the host language compiler aware of the query and capable of statically eliminating these runtime issues. As it turns out, this is possible within many of the .NET language family through a framework known as LINQ [8]. Queries are expressed using language-level constructs which can be verified at compile-time. Furthermore, queries specified using LINQ also gain a high degree of composability, meaning that elements common to several queries can often be factored into a single location, improving maintainability and reducing the risk of mistakes. It is very easy to use LINQ to create a trivial database query requesting the names of all people over the age of 18: var Names = from p in Person where p.Age > 18 select p.Name; This will evaluate (at runtime) an SQL query of the following form: SELECT name FROM people WHERE age > 18 Unfortunately, this sort of embedding requires certain language features which are absent from most non-homoiconic [10] languages. Specifically, the LINQ framework needs the ability to directly analyze the structure of the query at runtime. In the query above, we are filtering the query results according to the expression p.Age > 18. C# evaluation uses call-by-value semantics, meaning that this expression should evaluate to a bool. However, we don’t actually want this expression to evaluate. LINQ needs to somehow inspect this expression to determine the equivalent SQL in the query generation step. This is where the added language features come into play. While it is possible for Microsoft to simply extend their language with this particular feature, lowly application developers are not so fortunate. For example, there is no way for anyone (outside of Sun Microsystems) to implement any form of LINQ within Java because of the language modifications which would be required. We faced a similar problem attempting to implement LINQ in Scala.
Fortunately, Scala is actually powerful enough in and of itself to implement a form of LINQ even without adding support for expression trees. Through a combination of operator overloading, implicit conversions, and controlled call-by-name semantics, we have been able to achieve the same effect without making any changes to the language itself. In this paper, we present not only the resulting Scala framework, but also a general technique for implementing other such internal DSLs requiring advanced analysis and inspection prior to evaluation.
Note that throughout this paper, we use the term "internal DSL" [4] to refer to a domain-specific language encoded as an API within a host language (such as Haskell or Scala). We prefer this term over the often-used "embedded DSL" as it forms an obvious counterpoint to "external DSL", a widely-accepted term for a domain-specific language (possibly not even Turing Complete) which is parsed and evaluated just like a general-purpose language, independent of any host language.
In the rest of the paper, Section 2 introduces ScalaQL and shows some examples of its use. Section 3 gives a general overview of the implementation and the way in which arbitrary expression trees may be generated in pure Scala. Finally, Section 4 draws some basic comparisons with LINQ, HaskellDB and similar efforts in Scala and other languages.
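For instance, a by-name parameter lets library code receive an expression without evaluating it immediately; a small, generic illustration (plain Scala, not ScalaQL code):

// cond and body are passed by name, so the library decides if and when
// they are evaluated.
def unless(cond: => Boolean)(body: => Unit): Unit =
  if (!cond) body

unless(System.getenv("SKIP") != null) {
  println("running the body")   // only evaluated when SKIP is unset
}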
2 ScalaQL
The entire ScalaQL DSL is oriented around a single Scala construct: the for-comprehension. This language feature is something of an amalgamation of Haskell's do-notation and its list-comprehensions, rendered within a syntax which looks decidedly like Java's enhanced for-loops. One trivial application of this construct might be to construct a sequence of 2-tuples of all integers between 0 and 5 such that their sum is even:

val tuples = for {
  x <- 0 to 5
  y <- 0 to 5
  if (x + y) % 2 == 0
} yield (x, y)

There are really three separate components to this syntax. The first is the generator (e.g. x <- ...), which sets up the local variable x containing the current element in the comprehension. The second component is the filter (if ...), which defines the conditions under which this comprehension holds. Finally, we have the yield clause, which defines the result in terms of the variables set up by the generator(s). There may be any number of generators and filters, but only one yield. Every for-comprehension is parsed into a corresponding series of calls to the methods flatMap, map and filter (unless the for-comprehension lacks a yield, in which case foreach replaces flatMap and map). The map and filter methods are standard
higher-order utility functions. The flatMap method is effectively Scala's version of Haskell's >>= operator (monadic bind). It is defined for collections as a composition of the map and flatten functions. By rewriting for-comprehensions in terms of other language elements at parse time, Scala empowers third-party frameworks (such as ScalaQL) to exploit the syntax simply by implementing the relevant methods.
Altogether, this syntax provides a way of working with Scala collections in an almost declarative fashion reminiscent of a query language. In fact, it is possible to make use of for-comprehensions to perform SQL-like queries against Scala collections. For example:

// regular Scala collections, not ScalaQL
val people: List[Person] = ...
val companies: List[Company] = ...

val underAge = for {
  p <- people
  c <- companies
  if p.company == c
  if p.age < 14
} yield p

This expression yields a List of all people under the age of 14 who are employed by some company. If we were to formulate this same query in SQL, the result would be something like this:

SELECT p.*
FROM people p JOIN companies c ON p.company_id = c.id
WHERE p.age < 14

Intuitively, for-comprehensions are a natural syntactic device for representing declarative queries against generic collections. ScalaQL makes it possible to use that same syntax to represent database queries. Using ScalaQL, we can take our query example from earlier and slightly adapt it into something that will actually run against a database:

val underAge = for {
  p <- Person
  c <- Company
  if p.company is c
  if p.age < 14
} yield p

Recall that for-comprehensions are translated into a corresponding series of calls to flatMap, map and filter. In this case, the first (outermost) call to flatMap will be targeted on the Person object. This is what allows ScalaQL to "hijack"
the for-comprehension syntax. Person must implement — or inherit from a type which implements — the flatMap and map methods such that an abstract representation of the query is produced (see Section 3).
The primary syntactic difference between this and the same query run against Scala List(s) is the use of the is operator (rather than ==) to test equality. This is necessary because of the way that Scala handles the == method: unlike other symbolic methods, Scala defines == as an alias for equals, and our experiments revealed some bugs in Scala's type checker when either equals or == is defined to return anything other than Boolean (unrelated and well-formed sections of code would arbitrarily fail to type-check). Amazingly enough, it is the only syntactic concession made by the framework. All other String, Int and Boolean operators work exactly as expected. For example, the < operator is used above to compare p.age to the integer literal, 14.
The above expression will produce an instance of Query[Person], one which will produce a sequence of Person entities when evaluated. ScalaQL does not evaluate queries at declaration point. Instead, evaluation is deferred until the query is actually used as a sequence. For example:

underAge foreach { p =>
  println(p.firstName + ' ' + p.lastName)
}

The foreach method is not declared for type Query. When the Scala compiler sees this invocation, it determines that an implicit conversion from Query[Person] to Seq[Person] is required in order to make everything work. This implicit conversion is transparently injected into the bytecode by the Scala compiler and invoked at runtime just prior to the invocation of foreach. It is this implicit conversion, defined by ScalaQL, which handles the query evaluation.
The primary advantage to this deferred evaluation is that it allows queries to be treated compositionally. For example, we might want to construct a query which finds all of the under-age employees working at MegaCorp. Rather than redundantly defining the query constraints for under-age workers, we can simply build our new query by composing with the old:

val megaCorpEmps = for {
  p <- underAge
  if p.company.name is "MegaCorp"
} yield p

If we were to evaluate the megaCorpEmps query, it would execute SQL against the database very similar to the following:

SELECT p.*
FROM people p JOIN companies c ON p.company_id = c.id
WHERE p.age < 14 AND c.name = 'MegaCorp'
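A minimal sketch of how such a Query type and conversion could be arranged (the signatures and bodies here are guesses, not the actual ScalaQL definitions):

// Hypothetical sketch: the relevant methods build query nodes instead of
// traversing a collection, and an implicit conversion performs the deferred
// evaluation when a Seq is needed.
trait Query[A] {
  def map[B](f: A => B): Query[B]
  def flatMap[B](f: A => Query[B]): Query[B]
  def filter(p: A => Any): Query[A]   // the predicate yields an expression node
}

object Query {
  // Invented helper: generate SQL from the query tree and run it.
  implicit def queryToSeq[A](q: Query[A]): Seq[A] = evaluate(q)
  def evaluate[A](q: Query[A]): Seq[A] = sys.error("runs the generated SQL")
}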
2.1 Projection
So far, all of the queries we have expressed using ScalaQL have had a very simple yield statement, producing an instance of Query parameterized against
an entity type. ScalaQL is also capable of projecting on single fields as well as arbitrary record types defined as anonymous classes. This makes it possible to define type safe projections with arbitrary fields. Single-field and single-expression projection works exactly as expected. We define our yield clause in terms of the row locals defined in the generators (e.g. p or c), using fields, operators and values in the same fashion as in the filters. For example: val names = for { p <- Person if p.age > 18 } yield p.lastName This defines an instance of type Query[Varchar] which produces the last names of all of the people in the database over the age of 18. With a few slight modifications, we can actually produce the concatenation of the first and last names in the standard “Last, First” format: val names = for { p <- Person if p.age > 18 } yield p.lastName + ", " + p.firstName When evaluated, this query will execute SQL similar to the following: SELECT CONCAT(CONCAT(p.last_name, ', '), p.first_name) FROM people p WHERE p.age > 18 One particularly thorny aspect of projection which has been a difficult area for similar query DSLs in the past is that of multi-field projection. In SQL, it is possible to construct a query which produces a subset of the resulting fields; not just one field, but several. This is difficult because it requires the ad-hoc definition of new record types corresponding to the fields in question. While classes are technically a form of record type, very few languages sufficiently facilitate the definition of classes on a case-by-case basis. When each query requires a different record type (class) for its projection, query definition becomes a very tedious affair. Fortunately, Scala provides a lightweight syntax for defining Java-style anonymous inner-classes which extend AnyRef. This syntax (which actually comes from C#) makes it easy to define new classes at query-site without becoming syntactically burdensome: val people = for { p <- Person if p.age > 18 } yield new { val firstName = p.firstName val lastName = p.lastName }
This query selects only the first name and last name fields from the people table. The new { ... } syntax defines a new anonymous inner-class containing two fields: firstName and lastName. This type will be used to populate the query results. Thus, the type of the people value is Query[$t], where $t is the type of the anonymous inner-class (this type is hidden by Scala’s type inference, hence the use of the “$t” notation). We can demonstrate this fact by iterating over the query results and accessing fields: people foreach { p => println(p.firstName + ' ' + p.lastName) }
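The $t above is a structural type; a plain-Scala sketch (names invented, not ScalaQL code) of the same effect:

object Demo extends App {
  // The anonymous class has no name, but its fields are part of the inferred
  // (structural) type, so access remains statically checked.
  val row = new { val firstName = "Grace"; val lastName = "Hopper" }

  def fullName(p: { val firstName: String; val lastName: String }): String =
    p.firstName + " " + p.lastName

  println(fullName(row))
}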
3 Implementation
The most important guiding concept of ScalaQL’s implementation is that of the abstract query tree, which is similar in principle to an abstract syntax tree used in the implementation of most programming languages. Unlike most internal DSLs, ScalaQL does not immediately evaluate the invocation syntax into a final result. Instead, it creates an abstract representation of the desired query in an AST-like structure. This structure is what is actually contained by a value of type Query. When the Query is converted to a Seq, the abstract query tree is converted into the corresponding SQL, which is evaluated against the database to produce the final result. The query tree is composed of three elements: views, projections and expressions. Views directly correspond to relations in relational algebra and may be either tables or queries (another abstract query tree). Projections have three different forms, each corresponding to one of the three different projection types supported by ScalaQL: single field, single table and field subset. Projections may also contain expressions in cases where the yield clause is not a simple field or entity: for { p <- Person } yield p.firstName + " " + p.lastName Expressions are where most of the interest lies. The addition of abstract expression trees as first-class values was one of the primary changes in C# 3.0 as required by LINQ. Since Scala does not have this feature, we must find a way to construct expression trees using a different approach. The solution is a combination of implicit conversions and operator overloading. In the above example, we have given the sub-expression p.firstName + " ". While p.firstName may appear to be a field of type String, it actually has type Varchar, which extends the StringExpression class. This class defines a number of methods, including +, an operator which takes another StringExpression as a parameter. We have defined an implicit conversion from type String to StringExpression, allowing literal strings to be concatenated onto abstract StringExpression(s). The result of this + method is an abstract expression node, AddStr, which also extends StringExpression.
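A stripped-down sketch of this arrangement, reusing the names mentioned in the description (StringExpression, Varchar, AddStr) but with invented bodies and an invented StringLiteral node:

// Expression nodes record the operation instead of performing it.
trait StringExpression {
  def +(that: StringExpression): StringExpression = AddStr(this, that)
}

case class Varchar(column: String)                                  extends StringExpression
case class StringLiteral(value: String)                             extends StringExpression
case class AddStr(left: StringExpression, right: StringExpression)  extends StringExpression

object StringExpression {
  // Lets a raw string literal appear on the right-hand side of +.
  implicit def stringToExpr(s: String): StringExpression = StringLiteral(s)
}

// With the implicit in scope, Varchar("first_name") + " " builds
// AddStr(Varchar("first_name"), StringLiteral(" ")) rather than a String.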
Of course, strings are not the only data type manipulated by SQL expressions. For this reason, we have also created implementations for NumericExpression, BooleanExpression and TimeExpression. Each of these classes defines operator methods according to how their respective type is expected to behave. Thus, NumericExpression defines +, *, % and more, while BooleanExpression defines &&, || and so on. Every expression class extends Expression, which defines operator methods common to all expressions: is and !=. All of these operations return abstract expression nodes representing the specific operation in question. These nodes each resolve to a different SQL operation or function, making it possible to effectively compile Scala expressions into SQL at runtime. In a sense, the expression DSL parses code which appears to be conventional String, Int and Boolean expressions into a structure very reminiscent of a compiler’s abstract syntax tree. This tree can then undergo a code generation phase, which produces the corresponding SQL. Type safety is ensured by the fact that the operator methods in each expression class will only accept certain parameter types. Thus, it is impossible to concatenate a StringExpression and a NumericExpression; the + operator method in StringExpression only accepts another StringExpression. Inherited operator methods like is and != are guaranteed type safety through the use of an abstract type declared in the Expression superclass. This type effectively allows the parameters for any operator methods in Expression to vary covariantly with subtyping, ensuring that it is impossible to test a NumericExpression and a BooleanExpression for equality. The other advantage to this approach in general (besides type safety) is that it allows optimizations and other in-depth analysis to be performed against the abstract expression tree prior to resolution (code generation). Normally, a DSL evaluates directly to its final result, making it very difficult to perform any sort of non-trivial processing on the instructions. This is because direct evaluation effectively restricts any processing to a single pass over the instructions. By evaluating to an intermediate form (the expression tree), we make it possible to perform multi-pass analysis (including optimization) against a complete representation of the DSL instructions.
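One possible shape for the abstract-type technique described above (an illustrative sketch; the actual ScalaQL definitions may differ, and the Comparison node is invented):

trait Expression {
  type Compatible <: Expression          // refined in each subclass
  def is(that: Compatible): BooleanExpression = Comparison(this, that)
}

trait NumericExpression extends Expression { type Compatible = NumericExpression }
trait BooleanExpression extends Expression { type Compatible = BooleanExpression }

// Invented node type standing in for ScalaQL's equality node.
case class Comparison(left: Expression, right: Expression) extends BooleanExpression

// Because the parameter type of `is` varies with the subclass, comparing a
// NumericExpression with a BooleanExpression does not type-check.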
4 Related Work
SQLJ [9] embeds SQL into Java and is statically typed. However, dynamic queries are not supported, as every SQLJ query is converted in a pre-compilation step. While not technically a language extension, SQLJ is certainly not "plain-old Java". SchemeQL [12] is similar to SQLJ in that it processes embedded query statements using an external preprocessor, but without providing any static typing. Safe Query Object [1] achieves many of the same goals as SQLJ, all while working within regular Java syntax. Users specify queries using special Java classes which are compiled into JDO queries. Safe Query Object also supports a wide variety of query operations including existential quantification, parameters and dynamic queries. However, like SQLJ, a special compilation step is
required to perform the conversion. As mentioned previously, systems such as LINQ [8] support SQL-like queries through language extensions. Java Language Extender [11] is another framework which operates in this fashion. Of all of the projects in this field, HaskellDB [6] is likely the most similar to our approach in that it functions as an internal domain-specific language. Operations such as filter, join and conditionals are all supported in a statically checked, type safe environment provided by Haskell’s type system. However, Haskell imposes heavier restrictions on function overloading than does Scala. Thus, HaskellDB is forced to use operators like .+. instead of the more familiar + when summing query values. Also, Scala’s implicit conversions are in some ways more powerful than Haskell’s type classes. ScalaQL allows the use of integer literals directly in query expressions, while HaskellDB requires the explicit use of the constant function. Related to HaskellDB is the Pan language [3]. While Pan has very little to do with database queries, it does demonstrate the power of internal DSL construction with an intermediate form. Like ScalaQL, Pan relies on carefully-constructed ADTs to statically ensure well-formedness of DSL expressions. The authors of Pan also discuss ways in which the intermediate form of the DSL may be leveraged in the implementation of advanced optimizations and analyses. The AraRat [5] framework provides similar query functionality in C++ through the use of preprocessor directives, operator overloading and templates. Its focus is primarily on directly representing relational algebra within the syntax of C++, rather than a more “familiar” dialect like SQL. Thus, a join is represented using the * operator, rather than through a more mainstream nomenclature. AraRat does share what is perhaps ScalaQL’s most important feature in that it represents views in their abstract form, allowing queries to be highly compositional and easily optimized. AraRat provides a large amount of type safety in the construction and composition of queries, but it does not extend that safety to the evaluation of those queries and subsequent parsing of the results. Queries are simply converted to char* using the asSQL() function. This differs from ScalaQL, which converts abstract views into properly type safe sequences during evaluation. This limitation is not entirely surprising given the fact that C++ lacks a generic database access framework like JDBC. Various non-academic efforts have also been made to solve this problem of language embedded queries based on real-world requirements. Ambition [2] is a widely-used internal DSL for Ruby which provides a very natural syntax for constructing queries. Notably, its core framework is not restricted to merely database access; it has also been applied to other query domains such as LDAP and XPath. However, as can be expected from a framework designed for a dynamically-typed language like Ruby, Ambition provides no static guarantees regarding query correctness. A project very similar to ScalaQL has been developed independently by Stefan Zeiger [13]. Like ScalaQL, this project aims to provide a framework for type safe queries within Scala using for-comprehensions. However, despite this similarity, there are some important differences. ScalaQL makes use of the pseudo-monadic
filter operation for declaring query conditionals, allowing the use of the if syntax in for-comprehensions. Zeiger’s framework defines a separate series of methods for this (though it can use filter for some conditionals). Projection differs greatly between the frameworks, with ScalaQL relying on anonymous inner-classes while Zeiger’s framework uses field combinators to generate arbitrary views (e.g. firstName - lastName - age).
5 Summary
In this paper, we have given a brief overview of the ScalaQL framework, focusing specifically on static type safety and syntactic intuitiveness. By exploiting the existing for-comprehension construct, ScalaQL blends seamlessly with conventional query-like operations performed on Scala collections. We predict that ScalaQL — or something like it — will become an important part of general-purpose Scala ORM frameworks in the future.
References

1. Cook, W.R., Rai, S.: Safe Query Objects: Statically typed objects as remotely executable queries. In: Proceedings of the International Conference on Software Engineering (ICSE), pp. 97–106 (2005)
2. Defunkt, C.: Ruby's Ambition (2008), http://ambition.rubyforge.org/
3. Elliott, C., Finne, S., De Moor, O.: Compiling Embedded Languages. Journal of Functional Programming 13(03), 455–481 (2003)
4. Fowler, M.: Domain Specific Language (2007), http://www.martinfowler.com/bliki/DomainSpecificLanguage.html
5. Gil, J.Y., Lenz, K.: Simple and safe SQL queries with C++ templates. In: GPCE 2007: Proceedings of the 6th International Conference on Generative Programming and Component Engineering, pp. 13–24 (2007)
6. Leijen, D., Meijer, E.: Domain specific embedded compilers. In: Proceedings of the 2nd Conference on Domain-Specific Languages, pp. 109–122 (1999)
7. Maier, D.: Representing database programs as objects. In: Advances in Database Programming Languages, pp. 377–386. ACM, New York (1990)
8. Meijer, E., Beckman, B., Bierman, G.M.: LINQ: Reconciling object, relations and XML in the .NET framework. In: Proceedings of the ACM Symposium on Principles of Database Systems (2006)
9. Melton, J., Eisenberg, A.: Understanding SQL and Java together: a guide to SQLJ, JDBC, and related technologies. Morgan Kaufmann, San Francisco (2000)
10. Steele Jr., G.L.: Common LISP: the language. Digital Press (1984)
11. Van Wyk, E., Krishnan, L., Bodin, D., Johnson, E.: Adding domain-specific and general purpose language features to Java with the Java Language Extender. In: Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications, pp. 728–729 (2006)
12. Welsh, N., Solsona, F., Glover, I.: SchemeUnit and SchemeQL: Two little languages. In: Third Workshop on Scheme and Functional Programming (2002)
13. Zeiger, S.: A Type-Safe Database Query DSL for Scala (2008), http://szeiger.de/blog/2008/12/21/a-type-safe-database-query-dsl-for-scala/
Integration of Data Validation and User Interface Concerns in a DSL for Web Applications Danny M. Groenewegen and Eelco Visser Software Engineering Research Group, Delft University of Technology, The Netherlands [email protected], [email protected]
Abstract. Data validation rules constitute the constraints that data input and processing must adhere to in addition to the structural constraints imposed by a data model. Web modeling tools do not address data validation concerns explicitly, hampering full code generation and model expressivity. Web application frameworks do not offer a consistent interface for data validation. In this paper, we present a solution for the integration of declarative data validation rules with user interface models in the domain of web applications, unifying syntax, mechanisms for error handling, and semantics of validation checks, and covering value well-formedness, data invariants, input assertions, and action assertions. We have implemented the approach in WebDSL, a domain-specific language for the definition of web applications.
1 Introduction

The engineering of web applications requires catering for a number of different concerns including data models, user interfaces, actions, data validation, and access control. In the mainstream technology for web application development these concerns are supported by loosely coupled languages that require abundant boilerplate code and lack static verification. The domain-specific language engineering challenge for the web application domain [21] is to realize a concise, high-level, declarative language for the definition of web applications in which the various concerns are supported by specialized sub-languages, yet linguistically integrated, and from which implementations can be derived automatically. This requires investigation and understanding of, and the design of appropriate domain-specific languages for each of the sub-domains of the web application domain. Moreover, it requires the seamless linguistic integration of these separate languages that ensures the consistency of models in the different domains and that leverages their combination. This research program is relevant for the discovery of good abstractions for the web engineering domain. It is also relevant as a case study in the systematic development of families of domain-specific languages. In previous work we have studied the domains of data models and user interface definitions [21], access control [6], and workflow [7], the results of which have been implemented as sub-languages of the WebDSL language [22]. In this paper, we address the domain of data validation and its interaction with the user interface.
The core of a data-intensive web application is its data model. The web application must be organized to preserve the consistency of data with respect to the data model during updates, deletes, and insertions. The core consistency properties of a data model are formed by structural constraints, that is, the data members of and relations between
entities. Some consistency properties cannot be expressed as structural constraints. Furthermore, some data integrity constraints do not pertain directly to persistent data. Data validation rules constitute the constraints that data input and processing must adhere to in addition to the structural constraints imposed by the data model. A high-level web engineering solution should provide a uniform and declarative validation model that integrates with the other relevant technical models. In addition to ensuring data consistency by enforcing a validation model, the integration of data validation in a web application requires a mechanism for reporting constraint violations to the user, indicating the origin of the violation in the user interface with a sensible error message and consistent styling. Model-driven methodologies such as OOHDM [18], WebML [4], UWE [10], OOWS [15], and Hera [20] do not make data validation concerns explicit in their models. When generating code from models, as demonstrated for UWE [11], WebML [2], and Hera [5], validating data requires an escape from model to code, hampering full code generation and model expressivity. In this paper, we present a language design that integrates declarative data validation rules with user interface models in the domain of web applications, unifying syntax, mechanisms for error handling, and semantics of validation checks, and that covers value well-formedness, data invariants, input assertions, and action assertions. We have implemented the approach in WebDSL [21], a domain-specific language for the definition of web applications. The main contributions of this paper are (1) the design of abstractions for data validation in web applications for concise and uniform specification of value well-formedness, data invariants, input assertions, and action assertions, (2) the seamless integration of data validation rules and user interface definitions, and (3) an example of the integration of models for multiple technical domains. In the next section we give a brief introduction to WebDSL and the running example used in the rest of the paper. Section 3 discusses validation features necessary for web applications, namely value well-formedness, data invariants, input assertions, and action assertions. Section 4 discusses related and future work, and Section 5 concludes.
2 WebDSL

WebDSL [21] is a domain-specific language for the development of web applications that integrates data models, user interface models, user interface actions, styling, access control [6], and workflow [7]. While these different concerns are supported by separate domain-specific sub-languages, the static semantics of the language enforces the integrity of the different concerns of an application model. What distinguishes WebDSL from web application frameworks in general purpose languages [9,13,16] is static verification and abstraction from accidental complexity (boilerplate code). Compared to web modeling tools [19,11,14,2], WebDSL combines high expressivity with good coverage (customization options). The WebDSL compiler generates a complete implementation in Java or Python. In this section we give an overview of the features of WebDSL needed in this paper and introduce the running example used to discuss data validation. We illustrate the various categories of data validation with a small user management application. The example application consists of two data model entities, namely User
entity User {
  username :: String
  email    :: Email
}

entity UserGroup {
  name    :: String (id)
  members -> Set<User>
}
and UserGroup (Fig. 1). Data model definitions describe the persistent data model in a WebDSL application. Data model entities consist of properties with a name and a type. Types of properties are either value types (indicated by ::) or associations to other entities defined in the data model. Value types are basic data types such as String and Int, but also domain-specific types such as Email that carry additional functionality. Associations are composite (the referer owns the object, indicated by <>) or referential (the object may be shared, indicated by ->). Associations can be to collections such as Set or List, demonstrated by the members property of the UserGroup entity. Page definitions in WebDSL describe the web pages that allow users to view and modify data model entities. Page definitions consist of the name of the page, the names and types of the objects passed as parameters, and a presentation of the data contained in the parameter objects. For example, the editUser(u:User) definition in Fig. 1 creates a page for editing the properties of User entity u. WebDSL provides basic markup operators such as group and label for defining the structure of a page. Navigation is realized using the navigate element, which takes a link text and a page with parameters as arguments. Furthermore, page definitions can be reused by declaring them as template. Templates can be included in page definitions by supplying the associated parameters. In addition to presenting data objects, pages can also modify objects. For example, the content of a User entity can be modified with the editUser page. The page element input(u.username) declares an appropriate form input element based on the type of its argument; in this case a text field. A data modification is finalized by means of an action, which can apply further modifications to the objects involved. For example, in the save action the changes to the User are saved. The return statement of an action is used to realize page flow by specifying the page and its arguments where the browser should be directed after finishing the action.
3 Validation Abstractions

Data validation is required in multiple contexts in web applications. In this section we distinguish four variants, show how these are expressed in WebDSL using declarative data validation rules, and how error messages are integrated in the user interface.
entity User {
  username :: String (id)
  password :: Secret
  email    :: Email
}

extend entity User {
  username(validate(isUnique(), "Username is taken"))
  validate(password.length >= 8, "Password needs to be at least 8 characters")
  validate(/[a-z]/.find(password), "Password must contain a lower-case character")
  validate(/[A-Z]/.find(password), "Password must contain an upper-case character")
  validate(/[0-9]/.find(password), "Password must contain a digit")
}
Fig. 2. Data invariants for User entity validation
3.1 Value Well-Formedness

Value well-formedness checks verify that a provided input value conforms to the value type. In other words, the conversion of the input value from request parameter to an instance of the actual type must succeed. This type of validation is usually provided by libraries or frameworks. However, it has to be declared explicitly, and possibly at each input of a value of the type. In WebDSL, value well-formedness rules are checked automatically. WebDSL supports types specific for the web domain, including Email, URL, WikiText, and Image. Automatic value well-formedness constraints for all value types provide decent input validation by default. Moreover, these built-in type validation checks and messages can be customized in an application.
The editUser page in Fig. 1 consists of a form with labeled inputs for the User entity properties. The save action persists the changes to the database, provided that all validation checks succeed. (Changes to existing entities are automatically stored in WebDSL; new entities need to be saved explicitly using the save() method.) Since well-formedness validation checks are automatically applied to properties, the email property is validated against its well-formedness criteria. The result of entering an invalid email address is shown in the screenshot: a message is presented to the user and the action is not executed.

3.2 Data Invariants

Data invariants are constraints on the data model, i.e. restrictions on the properties of data model entities. These validation rules can check any type of property, such as a reference, a collection, or a value type. By declaring validation in the data model, the validation is reused for any input or operation on that data. In Ruby on Rails [16] data invariants can be defined in a 'validate' method of the active record class, which
entity UserGroup {
  name        :: String (id)
  owner       -> User
  moderators  -> Set<User>
  members     -> Set<User>
  memberLimit :: Int
}

extend entity UserGroup {
  validate(owner in moderators, "Owner must always be a moderator")
  validate(owner in members, "Owner must always be a member")
  validate(members.length <= memberLimit, "Exceeds member limit")
}

define page editUserGroup(ug:UserGroup) {
  form {
    group("User Group") {
      label("Name") { input(ug.name) }
      label("Member Limit") { input(ug.memberLimit) }
      label("Moderators") { input(ug.moderators) }
      label("Members") { input(ug.members) }
      action("Save", save())
    }
  }
  action save() { return userGroup(ug); }
}
Fig. 3. Data invariants for UserGroup entity validation
then gets called by the framework when validation is required. Multiple checks in a validation method tangle validation for different properties. The Seam [9] framework supports the specification of data invariants declaratively through annotations. However, these annotations consist of a limited number of built-in checks and an escape to specify a custom class that handles validation for a property. In the worst case each validation rule needs a separate class, incurring the syntactic overhead of Java class declarations several times. Validation rules in WebDSL are of the form validate(e,s) and consist of a Boolean expression e to be validated, and a String expression s to be displayed as error message. Any globally visible functions or data can be accessed as well as any of the properties and functions in scope of the validation rule context. Validation checks on the data model are performed when a property on which data validation is specified is changed and when the entity is saved or updated. Validation is connected to properties either by adding the validation in the property annotation or by referring to a property in the validation check. More specific validation checks are supported which are only checked when the entity is in a certain state, these are validatesave, which is checked when an entity is saved for the first time, validateupdate, checked on any update, and validatedelete, checked before deleting the entity. The validation mechanism takes care of correctly presenting validation errors originating from the data model. For form inputs causing data invariant violations the message is placed at the input being processed. When data model validation fails during the execution of an action, the error is shown at the corresponding button. Fig. 2 presents an extended User entity with several invariants and a password property. The username property has the id annotation, which indicates the property is
unique and can be used to identify this entity type. The isUnique member function (a generated function that takes into account the existence of an 'id' property) is called to verify this constraint. The password property is annotated with validation rules that express requirements for a stronger password. By declaring validation rules in the entity, explicit checks in the user interface can be avoided. Both the WebDSL page definition and the resulting web application page are shown below the entity definition.
Fig. 3 shows more advanced validation rules, which express dependencies between the properties of an entity. The UserGroup entity is extended with an owner reference, a moderators set, and a memberLimit value. The editUserGroup page allows the owner to edit some of the UserGroup properties. The validation rule on the moderators set expresses that the owner should always be in this set of moderators (similarly, the owner should always be a member). The member set is constrained in size based on the memberLimit value. Validation rules that cover multiple properties, such as the 'owner in moderators' check, are performed for all input components of the properties the validation is specified on. However, the checks can be added to a single property as well, in order to specialize the error message.

3.3 Input Assertions

Input assertions are necessary when the validation rule targets an input that is not directly connected to the persisted data model. These types of constraints are easy to address in the form environment itself. For example, a validation check in XForms [1] verifies properties of the entered form data. The model in XForms, on which validation is specified, is a model of the input data produced by the form. Unfortunately, such form validation solutions are not integrated with validation on the application data model. For example, an input for an entity produces the identifier as form data; in the XForms model it is just a String, but in the application data model it is an entity reference.
Validation checks in WebDSL pages have access to all variables in scope, including page variables and page arguments. The placement and order of validation rules do not influence the results of the checks. Errors resulting from validation in forms are visualized at the location of the validation declaration. Usually such a validation rule is connected to an input, which can be expressed by placing the validation rule as a child element of input. The example in Fig. 4 demonstrates the final addition to the user edit form, an extra password input field in which the user must repeat the entered password. This validation cannot be expressed as a data invariant, since the extra password field is not part of the User entity. Therefore, the rule is expressed in the form directly, where it has access to the page variable p. This variable contains the repeated password, whereas the first password entry is saved in the password field of User entity u. When entering a different value in the second field the validation error is presented, as can be seen in the screenshot.

3.4 Action Assertions

Action assertions are predicate checks at any point in the execution of actions and functions, for verification during the processing of inputs. The action processing needs to be
D.M. Groenewegen and E. Visser define page editUser(u:User) { var p: Secret; form { group("User") { label("Username") { input(u.username) } label("Email") { input(u.email) } label("New Password") { input(u.password) } label("Re-enter Password") { input(p) { validate(u.password == p, "Password does not match") } } action("Save", action{ } ) } } }
Fig. 4. Form validation with input assertions define page createGroup() { var ug := UserGroup {} form { group("User Group") { label("Name") { input(ug.name) } label("Owner") { input(ug.owner) } action("Save", save()) } } action save() { validate(email(newGroupNotify(ug)) ,"Owner could not be notified by email"); return userGroup(ug); } }
aborted, reverting any changes made, and the validation message has to be presented in the user interface. This type of validation is not directly supported in existing solutions, requiring an investment in finding appropriate hooks in the implementation. For example, Ruby on Rails [16] assumes validation is specified in data model classes; errors are passed through those model classes, and the form mechanism is built around that. There is no mechanism for a validation check as part of a controller action; this requires a low-level encoding that passes the check result and error message, or wrapping validation in a data model class. WebDSL supports this type of validation transparently using the same validation rules. The errors resulting from action assertion failures are displayed at the place the execution originated, e.g. above the submit button which triggered the erroneous action.
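As a loose analogy in plain Scala (not WebDSL, and not the generated code), an action assertion behaves like a check that either aborts with a message or lets the action complete:

// Invented helper names; Left carries the message shown at the submit button.
def assertAction(check: Boolean, message: String): Either[String, Unit] =
  if (check) Right(()) else Left(message)

def save(ownerNotified: Boolean): Either[String, String] =
  for {
    _ <- assertAction(ownerNotified, "Owner could not be notified by email")
  } yield "userGroup"   // page to redirect to on success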
Fig. 5 provides an example of an action assertion. On the right is a page definition for a createGroup page which allows creating new UserGroup entities. The constraint expressed in the save action is that creating a new group requires email notification to the specified owner (which might not be the user executing this operation). The newGroupNotify email definition retrieves an email address from its UserGroup argument (through ug.owner.email) and tries to send a notification email to the owner of the new group. When this fails, for instance because there is no mail server responding to the email address, the call returns false and the validation check produces the error. This result is shown on the left in the screenshot. Generic error handling, such as problems with a database commit, can also be expressed using action assertions. The web application can then display an error message in the form instead of redirecting to a default error page.

3.5 Messages

This section has described assertions that report erroneous behavior in actions. Related to such action assertions is a generic messaging mechanism for giving feedback about the correct execution of an action. This requires a place to show messages, for instance by adding a default message template at the top of each page. Furthermore, the message should be declared in the action code. An example of such messaging is shown in Fig. 6. The save action of the editUser page gives a message to the page redirected to, namely user. The result of the executed action is shown on the left.

3.6 Validation Mechanics

A page request in WebDSL is processed in the following five phases:
1. Convert request parameters: check value well-formedness validation rules for page arguments and input parameters, then convert these to the correct types.
2. Update model values: check data invariants for input data, and then insert it in data model entities.
3. Validate forms: check input assertions in page definitions.
4. Handle actions: perform the action, aborting if an action assertion fails (in that case no changes are made to the data model).
5. Render or redirect: show the page, including any produced validation errors. Redirect if an action executed successfully.
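Read as ordinary code, this lifecycle resembles the following Scala sketch (all names invented; WebDSL generates this machinery rather than exposing it to the developer):

case class Request(params: Map[String, String])
sealed trait Response
case class Render(errors: List[String]) extends Response
case class Redirect(page: String)       extends Response

def handle(req: Request,
           convert:      Request => List[String],           // 1. value well-formedness
           update:       Request => List[String],           // 2. data invariants
           validateForm: Request => List[String],           // 3. input assertions
           action:       Request => Either[String, String]  // 4. action assertions
          ): Response = {
  val errors = convert(req) ++ update(req) ++ validateForm(req)
  if (errors.nonEmpty) Render(errors)                       // 5. render with errors
  else action(req) match {
    case Left(message) => Render(List(message))             //    abort, keep data unchanged
    case Right(page)   => Redirect(page)                    //    redirect on success
  }
}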
4 Discussion

Web Modeling Tools. Several model-driven methodologies for creating web applications have been proposed in recent years, including OOHDM [18], SHDM [12], WebML [4], UWE [10], OOWS [15], and Hera [20]. WebDSL goes beyond being a methodology for designing web applications: it provides a path to an actual implementation by leveraging full code generation, so the transformation from problem space to solution space is completely automated. In this paragraph we discuss how these methodologies and their tools relate to WebDSL in general, and to data validation integration in particular. The Hera Presentation Generator [5] allows modeling forms to support editing data in the session; the persisted domain data of the application cannot be changed. Hera-S [19] also incorporates persisting form input data through update queries. The only example in the paper of such an update shows incrementing a view counter, a simple
operation that does not process form input data. Kraus et al. [11] present the generation of partial web applications from UWE models. An application skeleton is generated, including JSP pages and navigation between them. Forms and input data are not discussed, which probably means they are part of the custom code. HyperDe [14] is a tool that allows online creation of web applications designed with the SHDM method. The paper shows an example of an input field for a person's email address. This involves manual construction of data binding (showing the email and reading it from the submitted data) and does not indicate how validation of that input can be performed. WebRatio [2] is a tool for generating web applications based on the WebML method. The conceptual WebML models do not model data validation concerns, while WebRatio does have form validation features. These can be directly mapped to validation features in the underlying Struts [3] framework. Validation that goes beyond the form, such as querying the database, has to be implemented in a Struts validator class. This implementation requires intricate knowledge of the translation process and implementation platform. From our study of the literature we conclude that declarative modeling of data validation is ignored in model-driven web engineering. As a result, validation concerns require an escape from model to code, hampering full code generation and model expressivity.

Future Work. The current validation model focuses on verifying that the data satisfies a set of constraints. Actions that break these constraints are forbidden and result in an error message. An alternative approach would be to solve constraints automatically [8] and repair data so that it complies with the constraints, or to suggest such repairs to the user. Since most inputs in web application forms are strings, the expressivity of validation rules could be increased by incorporating a domain-specific language for string constraints. Scaffidi et al. [17] demonstrate that parsing technology can provide rich string input validation and feedback.
5 Conclusion

The domain-specific language engineering challenge for the web application domain [21] is to realize a concise, high-level, declarative language for the definition of web applications, in which the various concerns are supported by specialized yet linguistically integrated sub-languages, and from which implementations can be derived automatically. This paper presents a solution for the integration of data validation, a vital component of web applications, into a web application DSL that includes data models, user interfaces, and actions. This solution unifies syntax, mechanisms for error handling, and semantics for data validation checks covering value well-formedness, data invariants, input assertions, and action assertions. Our approach improves over current web modeling tools by providing declarative data validation rules from which a complete implementation is generated. Unlike web application frameworks, our solution supports different kinds of data validation uniformly. The integration of data validation rules into WebDSL, a web application DSL that supports data models, user interfaces, and actions, allows web application developers to take a truly model-driven approach to the design of web applications, concentrating on the logical design of an application rather than the accidental complexity of low-level implementation techniques.
References

1. Boyer, J.M. (ed.): XForms 1.0, 3rd edn. W3C Recommendation (2007)
2. Brambilla, M., Comai, S., Fraternali, P., Matera, M.: Designing web applications with WebML and WebRatio. In: Web Engineering: Modelling and Implementing Web Applications, pp. 221–260 (2007)
3. Brown, D., Davis, C., Stanlick, S. (eds.): Struts 2 in Action. Manning Publ. Co. (2008)
4. Ceri, S., Fraternali, P., Bongio, A.: Web Modeling Language (WebML): a modeling language for designing Web sites. Computer Networks 33(1-6), 137–157 (2000)
5. Frasincar, F., Houben, G., Barna, P.: HPG: the Hera Presentation Generator. Journal of Web Engineering 5(2), 175 (2006)
6. Groenewegen, D.M., Visser, E.: Declarative access control for WebDSL: Combining language integration and separation of concerns. In: Schwabe, D., Curbera, F. (eds.) International Conference on Web Engineering (ICWE 2008), July 2008, pp. 175–188 (2008)
7. Hemel, Z., Verhaaf, R., Visser, E.: WebWorkFlow: An object-oriented workflow modeling language for web applications. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 113–127. Springer, Heidelberg (2008)
8. Järvi, J., Marcus, M., Parent, S., Freeman, J., Smith, J.N.: Property models: from incidental algorithms to reusable components. In: GPCE, pp. 89–98 (2008)
9. Kittoli, S. (ed.): Seam - Contextual Components. A Framework for Enterprise Java. Red Hat Middleware, LLC (2008)
10. Koch, N., Kraus, A., Hennicker, R.: The authoring process of the UML-based web engineering approach. In: Web-Oriented Software Technology (2001)
11. Kraus, A., Knapp, A., Koch, N.: Model-driven generation of web applications in UWE. In: Model-Driven Web Engineering (MDWE 2007), Como, Italy (July 2007)
12. Lima, F., Schwabe, D.: Application modeling for the semantic web. In: Latin American Web Congress (LA-WEB 2003), Washington, DC, USA, p. 93. IEEE Computer Society, Los Alamitos (2003)
13. MacDonald, M., Szpuszta, M.: Pro ASP.NET 3.5 in C# 2008. Apress (2007)
14. Nunes, D., Schwabe, D.: Rapid prototyping of web applications combining domain specific languages and model driven design. In: International Conference on Web Engineering (ICWE 2006), pp. 153–160 (2006)
15. Pastor, O., Fons, J., Pelechano, V.: OOWS: A method to develop web applications from web-oriented conceptual models. In: Web Oriented Software Technology (IWWOST 2003), pp. 65–70 (2003)
16. Ruby, S., Thomas, D., Heinemeier Hansson, D.: Agile Web Development with Rails, 3rd edn. Pragmatic Programmers (2009)
17. Scaffidi, C., Myers, B.A., Shaw, M.: Topes: reusable abstractions for validating data. In: ICSE 2008, pp. 1–10 (2008)
18. Schwabe, D., Rossi, G., Barbosa, S.: Systematic hypermedia application design with OOHDM. In: Proceedings of the Seventh ACM Conference on Hypertext, pp. 116–128. ACM, New York (1996)
19. van der Sluijs, K., Houben, G., Broekstra, J., Casteleyn, S.: Hera-S: web design using Sesame. In: International Conference on Web Engineering (ICWE 2006), pp. 337–344 (2006)
20. Vdovjak, R., Frasincar, F., Houben, G., Barna, P.: Engineering semantic web information systems in Hera. Journal of Web Engineering 2, 3–26 (2003)
21. Visser, E.: WebDSL: A case study in domain-specific language engineering. In: Lämmel, R., Visser, J., Saraiva, J. (eds.) Generative and Transformational Techniques in Software Engineering II. LNCS, vol. 5235, pp. 291–373. Springer, Heidelberg (2008)
22. Visser, E., et al.: WebDSL, 2007–2009, http://webdsl.org
Ontological Metamodeling with Explicit Instantiation Alfons Laarman and Ivan Kurtev Department of Computer Science, University of Twente, the Netherlands {a.w.laarman,kurtev}@ewi.utwente.nl
Abstract. Model Driven Engineering (MDE) is a promising paradigm for software development. It raises the level of abstraction in software development by treating models as primary artifacts. The definition of a metamodel is a recurring task in MDE and requires sound and formal support. The lack of such support causes deficiencies such as conceptual anomalies in the modeling languages. From a philosophical point of view, metamodels can be seen as metaconceptualizations. Metalanguages have to provide constructs for building ontological theories as a base for modeling languages. This paper describes a new metalanguage derived from the study of Formal Ontology. This metalanguage raises the level of abstraction of metamodels from pure abstract syntax to semantic descriptions based on ontologies. Thus, language developers can make conscious choices for their modeling concepts and can explicitly define important relations such as instantiation and generalization. With this metalanguage, we aim at a precise conceptual and formal foundation for metamodeling. Keywords: Metamodeling, ontologies, instantiation semantics.
Guizzardi shows that the ontological meaning of models based on formal ontology cannot be retained when these models are expressed in UML. The UML language has several anomalies that decrease the quality of models. Examples of such anomalies are construct overloading, construct redundancy, and construct incompleteness. The current metamodeling practice demonstrated by the metalanguages from the MOF family does not consider the ontological foundations of (meta-)modeling. Since MOF corresponds to the UML Infrastructure, any modeling language (domain-specific or general-purpose) can potentially suffer from the same anomalies found in UML. The described problems emerge for two reasons: lack of a clear understanding of the metamodeling activity regarding its ontological foundation, and lack of constructs in the current metalanguages to express the required information explicitly. We address these problems by proposing a view on the content of metamodels and a new metalanguage. In our approach, metamodels are lifted from pure abstract syntax definitions to expressions of metaconceptualizations based on a foundational ontology. We retain the structural definition of a language and enhance it with ontological meaning. The philosophical justification of our approach comes from Quine's statement that in every language an ontology can be found [19]. Thus, the metamodeling activity is a task that identifies and specifies the world structures that are of interest for solving a given problem. The metalanguage has to be capable of expressing such structures. We use a simple foundational ontology, the Four-category Ontology, to build a new metalanguage. We propose the Ontology Grounded Metalanguage (OGML) as an experimental language for studying the definition of metamodels based on ontological principles. In OGML, linguistic and ontological instantiations are treated uniformly from a technical perspective. Both are defined on the basis of the explicit instanceOf definition construct in OGML. The paper is organized as follows. Section 2 clarifies the meaning of the concepts used in the paper. Section 3 presents the Four-category Ontology and compares it with existing foundational ontologies. Section 4 describes OGML by examples. Section 5 discusses the main open issues and positions our approach within the existing work. Section 6 concludes the paper.
2 Conceptual Background

The title of this paper refers to terms that are interpreted in different ways in the literature: ontology (ontological), metamodel(-ing), and instantiation. We give a short background on these terms and state our understanding of them. A commonly accepted notion of a metamodel is that it is a model of the models expressed in a given language. Thus, a metamodel defines the constraints for all the admissible models expressed in the language. Often, the metamodel is regarded as a definition of the abstract syntax of the language. The term ontological metamodeling (and ontological instantiation) was introduced by Atkinson and Kühne in [2]. They distinguish between linguistic and ontological metamodeling. Fig. 1 illustrates the distinction between them in the context of the three-level MOF architecture.
Fig. 1. Linguistic and ontological instantiation
Linguistic metamodeling is used to define metamodels of languages. The instances of metamodels are models at M1 obtained by linguistic instantiation. Linguistic metamodeling defines the form that a statement (model) in a language may take. Linguistic instanceOf delimits metalevels (e.g. M1 and M2). Ontological metamodeling allows the type/instance relation to exist within a single metalevel. In Fig. 1 the object Lassie is an instance of the class Collie. The instanceOf relation is called ontological and it is concerned with the content that a statement (model) has by representing a particular domain. The ontological instanceOf partitions models into ontological levels (e.g. O1 and O2) within a single linguistic level. The linguistic instanceOf is defined by the metalanguage used to define metamodels (for example, MOF) and the ontological instanceOf is defined by a particular modeling language (for example, UML). Guizzardi [10, 8] studies the relation between metamodels and ontologies. He recognizes two distinct purposes of metamodels: as a definition of the abstract syntax and as a definition of the world view underlying the language. Assume that we would like to define a language that describes states of affairs in a given domain (Fig. 2).

(Fig. 2 contains the elements Domain Conceptualization, Domain Abstraction, A particular state of affairs, Domain Ontology, Language Metamodel, and Model, connected by "represented by" and instanceOf relations.)
Fig. 2. Domain conceptualization and metamodel
The middle column in Fig. 2 represents domain abstractions and a domain conceptualization. They are conceptual entities in the modeler's mind. In order to communicate them, we define a language to be used to specify models. In Fig. 2, we show the language metamodel. Guizzardi understands the term metamodel as a specification of the world view of the language, that is, a description of what a language can describe in terms of real-world phenomena. The capability of the language to express a certain domain is measured by comparing the elements of the metamodel to the elements of the representation of the domain conceptualization, called the domain ontology. Here the domain ontology is supposed to be the best possible representation of the
domain conceptualization. The smaller the gap between the domain ontology and the metamodel is the more precisely the models can represent the real world phenomenon in the domain. Unfortunately, current practice of metamodeling in MDE mostly treats metamodels as definitions of the abstract syntax. Metamodelers are not aware of the real world meaning of the language constructs. The result is decreased quality of the models due to anomalies in the modeling languages. Metalanguages such as MOF are not expressive enough to articulate the difference between various modeling constructs. Consider for example the model elements Collie and Lassie in Fig. 1. They are instances of MOF classes, that is, they are MOF objects. However, the real world meaning is rather different. Lassie represents an individual, a concrete collie. Collie represents the characteristics of all the dogs of this breed, that is, it captures the universal properties of the collies. In a MOF-like architecture, this difference is not expressible. Furthermore, the metatypes Class and Object classify types and individuals respectively, so they are different. Both are instances of MOF Class and consequently indistinguishable by the MOF-based tools. Finally, the definition of the ontological instanceOf in UML is just a MOF association and is treated as any other association in the UML metamodel. We aim at retaining ontological properties of the metamodels by treating them as representation of the language underlying world view. Therefore, metamodels become more than descriptions of the abstract syntax of a language. They are enriched with explicit knowledge of the ontological nature of their constructs. When we talk about explicit instantiation, we mean that a metamodeling language provides us with a firstclass construct for defining ontological instantiations according to the understanding of Kühne.
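To make the distinction concrete, the following Java sketch (our illustration only, with hypothetical names; it is not MOF, UML, or OGML code) records the linguistic metaclass and the ontological type of a model element separately, which is precisely the information a plain MOF-style representation does not carry.

// Illustrative only: without the ontologicalType link, Collie and Lassie
// both look like generic elements to a MOF-based tool.
import java.util.Optional;

class ModelElement {
    final String name;
    final String linguisticMetaclass;             // e.g. "Class", "Object" (instanceOf in the metalanguage)
    final Optional<ModelElement> ontologicalType; // e.g. Lassie -> Collie (instanceOf in the language)

    ModelElement(String name, String linguisticMetaclass, ModelElement ontologicalType) {
        this.name = name;
        this.linguisticMetaclass = linguisticMetaclass;
        this.ontologicalType = Optional.ofNullable(ontologicalType);
    }

    public static void main(String[] args) {
        ModelElement collie = new ModelElement("Collie", "Class", null);    // a universal
        ModelElement lassie = new ModelElement("Lassie", "Object", collie); // an individual exemplifying it
        System.out.println(lassie.name + " is ontologically an instance of "
                + lassie.ontologicalType.map(t -> t.name).orElse("nothing"));
    }
}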
3 Approach

In MDE, metamodels are expressed in a language called a metalanguage. Current metalanguages are mainly object-oriented for pragmatic reasons such as familiarity to developers and tool support. If we perceive a metamodel as something more than a structural definition, then we need to study the requirements for a suitable metalanguage. Consider the upper layer in Fig. 2. The domain ontology is an artifact expressed in a language. What is the domain conceptualization of this language? What is the "ideal" ontology that captures this conceptualization? According to Guizzardi, we can apply the pattern in Fig. 2 by treating domain conceptualizations as a domain of study. The result of the application is shown in Fig. 3. The set of various domain-specific conceptualizations is conceptualized in a domain-independent metaconceptualization. The representation of this metaconceptualization as an ontology is called a Foundational Ontology. It is derived from the study of Formal Ontology. Several authors provide concrete versions of Fig. 3. Wand [20] uses the Bunge-Wand-Weber (BWW) ontology as a foundational ontology and UML as a language for expressing domain models. Guizzardi performs a similar study on UML by using the Unified Foundational Ontology (UFO) as a foundational ontology. Both approaches study the ontological correctness of the UML metamodel.
Fig. 3. Ontologies and metaconceptualization
We aim at formulating language metamodels by using a vocabulary derived from a foundational ontology. In this way, the constructs of metamodels become instances of the most fundamental and domain-independent ontological categories. For example, UML Class and ER Entity are classified as constructs that are used to represent classifiers (or universals). Although they belong to different languages, they have a similar ontological nature. In this way, metamodels carry additional ontological information that can be used to align and compare metamodels with each other as well as with a given foundational ontology. This approach of treating metamodels as representations of metaconceptualizations leads to the following interpretation of the metalevels:
• M1: models that represent reality. They are expressed in a modeling language;
• M2: metamodels of modeling languages that represent the real-world view embodied in the language;
• M3: a metametamodel of a metalanguage. The metalanguage is used to express various world views. It is derived from a metaconceptualization, which in turn is derived from a foundational ontology.
To proceed with this approach we need to select a Foundational Ontology. We examined several existing foundational ontologies: UFO, DOLCE [5], and BWW. We used the following criteria for selecting one:
• The ontology should be simple;
• Its constructs should be familiar to developers;
• The ontology should allow expressing the metamodels of the major existing programming, data description, and modeling languages, both general-purpose and domain-specific.
Considering these requirements, we opt for a descriptive, minimalistic ontology, as in the approaches of Guizzardi and of Wand et al. Also, because our work can be considered an initial experiment in applying formal ontology theory to metamodeling, we chose a small foundational ontology called the Four-category Ontology (FCO). For the sake of minimality, we did not include the refined concepts of universals such as sortal, role, category, etc. found in UFO. In FCO, the basic distinction is between individuals and universals as the most fundamental entities of being. Figure 4 depicts the concepts in this ontology. Individuals are classified into substantial and moment individuals. A substantial individual, or just substance, is something that can exist by itself without depending on the existence of other individuals. In programming and modeling languages, substantial individuals are usually represented as objects (e.g. Java objects and UML objects).
Fig. 4. The Four-Category Ontology
Moments are individuals that exist in other individuals. Moments cannot exist standalone; they are existentially dependent on at least one individual (called the bearer). The relation between a moment and its bearer(s) is called the inherence relation. Moments may inhere in more than one individual. In programming and modeling languages, moments appear under various names: slot and link in UML, field in Java, etc. Universals are entities that can be instantiated in individuals. The individuals that exemplify a universal have something in common. For example, things that consist of matter have a mass; in this case mass is a universal. Universals are classified into substantial and moment universals. Substantial universals are exemplified by substantial individuals and moment universals are exemplified by moment individuals. The instantiation relation is the relation between an individual and a universal. Universals have their representatives in existing computer languages. UML classes correspond to substantial universals. UML attributes and associations correspond to moment universals.
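For readers more comfortable with code than with ontological vocabulary, the following Java sketch (our illustration; the class names are ours and are not FCO or OGML terminology) renders the four categories and the inherence and instantiation relations as plain data structures.

// A sketch of the four FCO categories with inherence and instantiation as references.
import java.util.ArrayList;
import java.util.List;

abstract class Universal { final String name; Universal(String name) { this.name = name; } }

class SubstantialUniversal extends Universal {        // e.g. the UML class "Crocodile"
    SubstantialUniversal(String name) { super(name); }
}

class MomentUniversal extends Universal {             // e.g. a UML attribute or association
    final List<SubstantialUniversal> characterizes = new ArrayList<>(); // characterization relation
    MomentUniversal(String name) { super(name); }
}

abstract class Individual {                           // every individual instantiates a universal
    final Universal instanceOf;
    Individual(Universal u) { this.instanceOf = u; }
}

class SubstantialIndividual extends Individual {      // e.g. a UML object or a Java object
    SubstantialIndividual(SubstantialUniversal u) { super(u); }
}

class MomentIndividual extends Individual {           // e.g. a UML slot or link, a Java field value
    final List<SubstantialIndividual> inheresIn = new ArrayList<>(); // inherence: at least one bearer
    MomentIndividual(MomentUniversal u) { super(u); }
}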
4 Ontology Grounded Metalanguage

OGML is our experimental metalanguage based on FCO. It helps language developers make conscious choices for their modeling concepts and enforces the definition of important relations such as instantiation and generalization. In the current section, we introduce OGML by defining the metamodel of a tiny subset of UML, called SimpleUML. The metamodel of the language is shown in Fig. 5 (left part) together with an example model (right part) and instanceOf relations. The upper part represents class diagrams and the lower part object diagrams. A metamodel expressed in OGML consists of definitions. Definitions describe how a particular language conceptualizes the world by defining the structure of universals and individuals. In addition to this, a metamodel may explicitly define the instantiation and generalization relations of the language. UML classes, for example the class Crocodile, are substantial universals from an ontological point of view. We instantiate the OGML construct SubstantialDefinition to express that the element Class in the UML metamodel defines the structure of substantial universals (lines 1-2 of the listing below). Classes have attributes, which are in turn moment universals, expressed as instances of the OGML construct MomentDefinition (lines 4-9). The relation between a moment definition and the substantial definition(s) it characterizes is called the characterization relation. The definition of Attribute states the fact that concrete attributes are attached to a single class. Since the characterization relation connects two constructs, it has two roles: the universalDefinitionRole and the momentDefinitionRole. To define UML Association,
Fig. 5. The example language SimpleUML
we could instantiate MomentDefinition with two characterization relations. This expresses the fact that instances of associations (called links in the context of UML) are moments that inhere in two individuals. It should be noted that OGML allows a moment definition to characterize another moment definition. This ultimately allows a moment to inhere in another moment, which is a major difference from the BWW ontology, where properties do not have properties. The definition of how UML represents individuals follows a similar structure. Substantial individuals are defined by instantiating ObjectDefinition (lines 11-12) and moments are defined by a PropertyDefinition (lines 14-17). The fact that UML slots inhere in UML objects is expressed by the dependsOn clause.

1.  SubstantialDefinition Class {
2.  }
3.
4.  MomentDefinition Attribute {
5.    attribution universalDefinition = "Class"
6.      universalDefinitionRole = "owner"
7.      momentDefinitionRole = "attributes"
8.      multiplicity = 1-*;
9.  }
10.
11. ObjectDefinition Object {
12. }
13.
14. PropertyDefinition Slot {
15.   value : String;
16.   dependsOn Object role = "slots" multiplicity = *;
17. }
An important construct in OGML is the definition of instanceOf relations. In the terminology of Fig. 1, an OGML metamodel defines the instantiation relation as a first-class construct. Let us consider the definition of the UML instanceOf. We need to express the facts that (a) classes are instantiated to objects and attributes to slots, and (b) an individual can be queried for the values of its moments, and these values obey certain constraints. The concrete syntax is illustrated in the following listing. Line 2 states that every class is instantiated to an object; in this case, substantial universals are instantiated to substantial individuals.
1.  Relations UMLInstanceOfAssociationsOnLinks {
2.    c : Class -> o : Object {
3.    }
4.    a : Attribute -> s : Slot {
5.      attribution {
6.        naming name <- a.name;
7.        valuing [a.lowerbound .. a.upperbound] s.value;
8.        typing a.type;
9.      }
10.   }
11. }
OGML allows substantial universals to be instantiated to other universals, thus achieving multilevel ontological metamodeling in the sense of Fig. 1. Line 4 states that attribute moment universals are instantiated to slots. If a UML object has a set of slots, then the object may be queried by using the name of a slot, which is obtained as the name of the defining attribute (line 6). The value of the slot is stored in its value property (line 7). Lines 7 and 8 also specify multiplicity and typing constraints. Querying the value of a moment is based on the concept of an attribute function used in the BWW ontology. For each moment, at least one attribute function is defined. In our example, slots are unary moments and only one attribute function is needed. If a moment inheres in more than one individual, then an attribute function is defined per characterization relation. Note that line 5 explicitly names the characterization to which the attribute function is assigned (attribution). OGML explicitly defines its own instanceOf relation following the same idea illustrated in the SimpleUML example. Hence, from the perspective of the tools using models, there is no technical difference between linguistic and ontological instantiation. We built a tool [18] that allows expressing OGML metamodels and conforming models in a concrete syntax. Given two models that are related either by linguistic or by ontological instanceOf, together with the metamodel of their language, the tool is capable of checking the conformance between the models using a single algorithm. The tool provides full OCL support, with an extension for dealing with multiple classifications of a given model element (for example, the crocodile Jena is an instance of Object from the point of view of OGML and an instance of Crocodile from the UML point of view).
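The following Java sketch is our illustration of this idea, not the OGML tool [18] itself; the rule representation and all names are assumptions. It indicates how a single conformance-checking routine can be parameterized by a reified instanceOf definition, so that the same multiplicity and typing checks serve whichever instantiation relation, linguistic or ontological, a metamodel supplies.

// One checking algorithm, driven by whatever instanceOf definition is supplied.
import java.util.List;

record AttributionRule(String momentName, int lower, int upper, String typeName) {}

record InstanceOfDefinition(List<AttributionRule> rules) {}

interface Element {
    String typeNameOfMoment(String momentName);   // type of the stored value(s)
    int countOfMoment(String momentName);         // how many values are present
}

class ConformanceChecker {
    static boolean conforms(Element instance, InstanceOfDefinition def, List<String> errors) {
        boolean ok = true;
        for (AttributionRule r : def.rules()) {
            int n = instance.countOfMoment(r.momentName());
            if (n < r.lower() || n > r.upper()) {                          // multiplicity constraint
                errors.add(r.momentName() + ": expected " + r.lower() + ".." + r.upper()
                        + " values, found " + n);
                ok = false;
            }
            if (n > 0 && !r.typeName().equals(instance.typeNameOfMoment(r.momentName()))) {
                errors.add(r.momentName() + ": wrong type");               // typing constraint
                ok = false;
            }
        }
        return ok;
    }
}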
5 Discussion and Related Work

The design of OGML raises multiple questions. The first question concerns the choice of a foundational ontology. We opted for FCO due to its simplicity and the observation that its constructs are usually represented in some form in many computer languages. However, in the current version of OGML it is not possible to treat primitive data types such as integers, booleans, etc. properly. They are equated with substantial individuals, which is ontologically debatable. OGML needs to be extended with constructs for defining abstract entities, for example mathematical structures. The second question is how to incorporate a full-fledged foundational ontology such as UFO. One possibility is to extend OGML. This would result in a large metametamodel with many constructs needed for conceptual modeling only. Another possibility is to define a foundational ontology as a metamodel. In any case, committing to a certain foundational ontology as a theoretical base for OGML poses an immediate limitation: all the models in the modeling space become more or less
aligned with the world view of one ontology. However, there may be other foundational ontologies that are perfectly viable alternatives. The third question is how OGML relates to existing self-reflective metametamodels. OGML is defined as a self-reflective metametamodel [17]. This definition poses interesting challenges that deserve a separate paper and is intentionally omitted here due to lack of space. We claim that, technically, the linguistic and ontological instantiations are the same, at least because they are all expressed by a single OGML construct. On the other hand, the work by Kühne [14, 15] and Gasevic [6] indicates the opposite. We have to state clearly that we do not claim conceptual equivalence between the two types of instantiation. In [14, 15, 6] they are distinguished mainly on the basis of the nature of the represented systems, by the so-called represents or μ relation. In our work, we do not represent this relation; hence this difference is not apparent. Furthermore, our understanding of instanceOf is a shortcut similar to the conformantTo relation used by Bezivin, Favre, and Gasevic. When we say that an object o is an instance of class C according to a certain definition of the instanceOf relation, we mean the following: o is a member of the extension of C, where membership is checked on the basis of the semantics of the OGML instanceOf definition construct (encoded in the tool) and the intensional representation of C. On the other hand, the intensional representation of C, perceived simply as an expression in a given language, may be a member of the extension of another class. Clearly, Gasevic made this same distinction. It should be noted that the difference between the ontological and linguistic instantiations and the nature of a metamodel are still debatable [12]. A language with at least three levels of ontological instantiation may allow representation of MOF, MOF metamodels, and MOF models in a single level. Then, the linguistic instantiation in the context of MOF becomes ontological. Thus, these two concepts appear to be relative. It is beyond the scope of this paper (and the space does not permit) to discuss this issue further. Atkinson and Kühne [1] propose an approach for multilevel metamodeling in which a modeling construct is assigned a potency that indicates how many times it can be instantiated. Although this seems reasonable from a technical point of view, there is no guidance for the modeler on how to assign the potency value. We believe that considering the ontological nature of a modeling construct is a clearer way to reason about the instantiations.
6 Conclusions

In this paper, we proposed a view on metamodeling that treats metamodels as specifications of the world view embodied in a modeling language. This view is regarded as a metaconceptualization and is expressed in a metalanguage called OGML, built upon a foundational ontology. As such, metamodels are more than just a definition of the abstract syntax of a language. In addition, we provide a construct for the explicit definition of the instantiation relation of a modeling language, and it is applied to OGML itself. This enables support for ontological metamodeling based on formal ontology theory and a uniform treatment of linguistic and ontological instantiation in modeling tools. We envision at least two promising applications of this approach: interoperability in the line of [3] and enhancing the set of transformation scenarios in MDE as
described in [16]. These two applications, together with a proper formalization of OGML, are the main directions for future research.
References

1. Atkinson, C., Kühne, T.: The Essence of Multilevel Metamodeling. In: Gogolla, M., Kobryn, C. (eds.) UML 2001. LNCS, vol. 2185, pp. 19–33. Springer, Heidelberg (2001)
2. Atkinson, C., Kühne, T.: Model-driven development: a metamodeling foundation. IEEE Software 20(5), 36–41 (2003)
3. Atzeni, P., Cappellari, P., Torlone, R., Bernstein, P.A., Gianforme, G.: Model-independent schema translation. VLDB J. 17(6), 1347–1370 (2008)
4. Degen, W., Heller, B., Herre, H., Smith, B.: GOL: toward an axiomatized upper-level ontology. In: FOIS 2001, pp. 34–46 (2001)
5. Gangemi, A., Guarino, N., Masolo, C., Oltramari, A.: Sweetening WORDNET with DOLCE. AI Magazine 24(3), 13–24 (2003)
6. Gasevic, D., Kaviani, N., Hatala, M.: On Metamodeling in Megamodels. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 91–105. Springer, Heidelberg (2007)
7. Guarino, N., Welty, C.A.: A Formal Ontology of Properties. In: Dieng, R., Corby, O. (eds.) EKAW 2000. LNCS (LNAI), vol. 1937, pp. 97–112. Springer, Heidelberg (2000)
8. Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. PhD thesis, University of Twente (2005) ISBN 90-75176-81-3
9. Guizzardi, G., Ferreira Pires, L., van Sinderen, M.: An Ontology-Based Approach for Evaluating the Domain Appropriateness and Comprehensibility Appropriateness of Modeling Languages. In: Briand, L.C., Williams, C. (eds.) MoDELS 2005. LNCS, vol. 3713, pp. 691–705. Springer, Heidelberg (2005)
10. Guizzardi, G.: On Ontology, ontologies, Conceptualizations, Modeling Languages, and (Meta)Models. In: DB&IS 2006, pp. 18–39 (2006)
11. Heller, B., Herre, H.: Ontological Categories in GOL. Axiomathes 14, 71–90 (2004)
12. Hesse, W.: More matters on (meta-)modelling: remarks on Thomas Kühne's "matters". Software and System Modeling 5(4), 387–394 (2006)
13. Jouault, F., Bézivin, J.: KM3: a DSL for Metamodel Specification. In: Gorrieri, R., Wehrheim, H. (eds.) FMOODS 2006. LNCS, vol. 4037, pp. 171–185. Springer, Heidelberg (2006)
14. Kühne, T.: Matters of (Meta-)Modeling. Software and System Modeling 5(4), 369–385 (2006)
15. Kühne, T.: Clarifying matters of (meta-)modeling: an author's reply. Software and System Modeling 5(4), 395–401 (2006)
16. Kurtev, I., van den Berg, K.: MISTRAL: A Language for Model Transformations in the MOF Meta-modeling Architecture. In: MDAFA 2004, pp. 139–158 (2004)
17. Laarman, A.W.: An Ontology Based Metalanguage with Explicit Instantiation. Master's thesis, University of Twente (2009)
18. OGML website, http://wwwhome.cs.utwente.nl/~laarman/ogml/ (retrieved 15-9-2009)
19. Quine, W.V.O.: Ontological Relativity and Other Essays. Columbia University Press, New York (1969)
20. Wand, Y., Storey, V., Weber, R.: An Ontological Analysis of the Relationship Construct in Conceptual Modeling. ACM Trans. DB Syst. 24(4), 494–528 (1999)
Verifiable Parse Table Composition for Deterministic Parsing August Schwerdfeger and Eric Van Wyk Department of Computer Science and Engineering University of Minnesota, Minneapolis, MN {schwerdf,evw}@cs.umn.edu
Abstract. One obstacle to the implementation of modular extensions to programming languages lies in the problem of parsing extended languages. Specifically, the parse tables at the heart of traditional LALR(1) parsers are so monolithic and tightly constructed that, in the general case, it is impossible to extend them without regenerating them from the source grammar. Current extensible frameworks employ a variety of solutions, ranging from a full regeneration to using pluggable binary modules for each different extension. But recompilation is time-consuming, while the pluggable modules in many cases either cannot support the addition of more than one extension or rely on backtracking or non-deterministic parsing techniques. We present here a middle-ground approach that allows an extension, if it meets certain restrictions, to be compiled into a parse table fragment. The host language parse table and fragments from multiple extensions can then always be efficiently composed to produce a conflict-free parse table for the extended language. This allows for the distribution of deterministic parsers for extensible languages in a pre-compiled format, eliminating the need for the "source code" grammar to be distributed. In practice, we have found these restrictions to be reasonable, admitting many useful language extensions.
1 Introduction
In parsing programming languages, the usual practice is to generate a single parser for the language to be parsed. A well-known and often-used approach is LR parsing [1], which relies on a process, sometimes referred to as grammar compilation, to generate a monolithic parse table representing the grammar being parsed. The LR algorithm is a generic parsing algorithm that uses this table to drive the parsing task. However, there are cases in which it is desirable to generate different portions of a parser separately and then put them together without any further monolithic analysis. An example can be found in the case of extensible programming languages, wherein a host language such as C or Java is composed with several extensions, each possibly written by a different party. The
This work was partially funded by the National Science Foundation grants #0347860 and #0429640.
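For readers unfamiliar with table-driven parsing, the following Java sketch (a textbook-style illustration, not the authors' implementation; the table representation and all names are our assumptions) shows the generic LR driver loop that consults the action and goto tables produced by grammar compilation. The point is that all language-specific knowledge lives in the tables, which is why composing tables, rather than regenerating them, is attractive.

// Generic LR driver: the algorithm is fixed; the tables encode the grammar.
import java.util.ArrayDeque;
import java.util.Deque;

class LrDriver {
    enum Kind { SHIFT, REDUCE, ACCEPT, ERROR }
    record Action(Kind kind, int target, int lhs, int rhsLength) {}

    interface Tables {
        Action action(int state, int terminal);  // shift/reduce/accept/error entry
        int goTo(int state, int nonterminal);    // goto entry consulted after a reduction
    }

    static boolean parse(Tables t, int[] input) {    // input: terminal codes, last one is end-of-input
        Deque<Integer> states = new ArrayDeque<>();
        states.push(0);                              // start state
        int pos = 0;
        while (true) {
            Action a = t.action(states.peek(), input[pos]);
            switch (a.kind()) {
                case SHIFT -> { states.push(a.target()); pos++; }
                case REDUCE -> {                     // pop the handle, then goto on the lhs nonterminal
                    for (int i = 0; i < a.rhsLength(); i++) states.pop();
                    states.push(t.goTo(states.peek(), a.lhs()));
                }
                case ACCEPT -> { return true; }
                case ERROR -> { return false; }
            }
        }
    }
}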
connection tripdb with table trip_log ;
class TripLogData {
  boolean examine_trips ( ) {
    rs = using tripdb query {
      SELECT dist, time FROM trips WHERE time > 600 } ;
    boolean res = false ;
    foreach (int dist, int time) in rs {
Unit