System Level Design with .NET Technology

Edited by
El Mostapha Aboulhamid
Frédéric Rousseau

Boca Raton  London  New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-1211-2 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

System level design with .NET technology / El Mostapha Aboulhamid, Frederic Rousseau.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4398-1211-2 (hardcover : alk. paper)
1. Systems on a chip--Design and construction. 2. System design. 3. Microsoft .NET Framework. I. Aboulhamid, El Mostapha, 1951- II. Rousseau, Frederic, 1967-
TK7895.E42S9744 2010
004.2'1--dc22        2009026890

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Dedication
For Karine, my parents, and all my family, for their help and support.
Frédéric Rousseau

To my spouse and my mother, to all those who helped me, influenced me, or endured me throughout all these years, I express my profound gratitude.
El Mostapha Aboulhamid
Contents

Preface
About the Editors
Contributor Biographies

1 Introduction
  Frédéric Rousseau, James Lapalme, and El Mostapha Aboulhamid
  1.1 Needs of a Complete and Efficient Design Environment
    1.1.1 The .NET Framework
    1.1.2 Characteristics Expected from a Design Environment
    1.1.3 ESys.NET: A .NET Framework Based Design Environment
    1.1.4 Our Design, Simulation and Verification Flows
  1.2 Design Flow with ESys.NET
    1.2.1 Modeling and Specification
    1.2.2 Our System Design Flow
    1.2.3 Analysis of the Design Flow
  1.3 Simulation Flow with ESys.NET
    1.3.1 Building the Simulation Model
    1.3.2 Separation of Concerns between Models and Simulation
    1.3.3 Towards a Multi-Level Simulation Model
  1.4 Observer-Based Verification Flow with ESys.NET
    1.4.1 Overview of the Observer-Based Verification Flow
    1.4.2 Building and Binding the Verification Engine to the Simulation Model
    1.4.3 Comparison with the Same Verification Flow in SystemC
    1.4.4 Towards a Powerful Verification Flow
  1.5 Conclusion and Book Organization

I Modeling and Specification

2 High-Level Requirements Engineering for Electronic System-Level Design
  Nicolas Gorse
  2.1 Introduction
  2.2 Background
    2.2.1 Framework
    2.2.2 Software Engineering Approaches
  2.3 Proposed Solution
    2.3.1 Formalism
    2.3.2 Linguistic Pre-Processing
    2.3.3 Consistency Validation
    2.3.4 Elicitation of Missing Functionalities
  2.4 Experimental Results
    2.4.1 Automatic Door Controller
    2.4.2 Industrial Router
    2.4.3 RapidIO
  2.5 Linking to a UML-Based Methodology
    2.5.1 Integrated Methodology
    2.5.2 Case Study
  2.6 Conclusion

3 The Semantic Web Applied to IP-Based Design: A Discussion on IP-XACT
  James Lapalme, El Mostapha Aboulhamid, and Gabriela Nicolescu
  3.1 Introduction
  3.2 Models of Architecture and XML
    3.2.1 GSRC and MoML
    3.2.2 Colif and Middle-ML
    3.2.3 Premadona
  3.3 SPIRIT
    3.3.1 IP-XACT Metadata Format
    3.3.2 Tight Generator Interface (TGI)
    3.3.3 Semantic Consistency Rules (SCR)
  3.4 The Semantic Web
    3.4.1 Resource Description Framework
    3.4.2 RDF Schema
    3.4.3 Web Ontology Language (OWL)
    3.4.4 SPARQL
    3.4.5 Tool for the Semantic Web: Editors and Jena
    3.4.6 SWRL and Jena rules
  3.5 XML and Its Shortcomings
    3.5.1 Multiple Grammars
    3.5.2 Documentation-Centric
    3.5.3 Biased Grammar Model
    3.5.4 Limited Metadata
  3.6 Advantages of the Semantic Web
    3.6.1 Richer Semantic Expressivity
    3.6.2 Separation between Semantics and Encoding
    3.6.3 Federated Data Model
    3.6.4 Simpler Data Manipulation
  3.7 Case Study – SPIRIT
    3.7.1 Advantages Applied to Version Management (SPIRIT 1.2 to SPIRIT 1.4)
    3.7.2 Advantages Applied to Modeling
    3.7.3 Impact on TGI
    3.7.4 Implications for SPIRIT Semantic Constraint Rules (SCRs)
    3.7.5 Dependency XPath
  3.8 Cost of Adoption
  3.9 Future Research
  3.10 Conclusion

4 Translating Design Pattern Concepts to Hardware Concepts
  Luc Charest, Yann-Gaël Guéhéneuc, and Yousra Tagmouti
  4.1 Introduction
  4.2 Object-Oriented Translations
    4.2.1 Translation of Classes and Their Members
    4.2.2 Translation of Object Encapsulation
    4.2.3 Translation of Object Instantiation
    4.2.4 Translation of Object Method Calls
    4.2.5 Translation of Polymorphism
    4.2.6 Translation of Inheritance and Casting Operations
  4.3 Constraint and Assumptions for Design Pattern Synthesis
    4.3.1 Constraint: Dynamism of the Hardware
    4.3.2 Assumption: Compiled Once
    4.3.3 Assumption: Limited Number of Objects
    4.3.4 Assumption: Pattern Automatic Recognition Problem
    4.3.5 Translation Cost versus Performance
  4.4 Design Pattern Mappings
    4.4.1 Creational Patterns
    4.4.2 Structural Patterns
    4.4.3 Behavioral Patterns
  4.5 Operational Description of Design Patterns
    4.5.1 PADL in a Nutshell
    4.5.2 PADL in Details
    4.5.3 PADL by Examples
    4.5.4 MIP
    4.5.5 ESys.NET Code Generation
  4.6 Related Work & Background
    4.6.1 Object Oriented Synthesis & Patterns in Hardware
    4.6.2 Original Patterns
  4.7 Conclusion

II Simulation and Validation

5 Using Transaction-Based Models for System Design and Simulation
  Amine Anane, El Mostapha Aboulhamid, Julie Vachon, and Yvon Savaria
  5.1 Introduction
  5.2 Motivations
  5.3 Transaction Model
    5.3.1 STM Concurrent Execution
    5.3.2 STM Implementation Techniques
    5.3.3 STM Implementation Examples
  5.4 STM Implementation Using .NET
    5.4.1 SXM Transactional Memory
    5.4.2 NSTM Transactional Memory
    5.4.3 PostSharp
    5.4.4 STM Framework
  5.5 Experimental Results
  5.6 Conclusion and Future Work

6 Simulation at Cycle Accurate and Transaction Accurate Levels
  Frédéric Pétrot and Patrice Gerin
  6.1 Introduction
  6.2 Short Presentation of the Cycle Accurate and Transaction Accurate Abstraction Levels
  6.3 Cycle Accurate Simulation
    6.3.1 General Description
    6.3.2 System Properties
    6.3.3 Formal Model
    6.3.4 Simulator Implementation
  6.4 Transaction Accurate Simulation
    6.4.1 General Description
    6.4.2 Basic Concepts
    6.4.3 Native Simulation for MPSoC
  6.5 Summary and Conclusions

7 An Introduction to Cosimulation and Compilation Methods
  Mathieu Dubois, Frédéric Rousseau, and El Mostapha Aboulhamid
  7.1 Introduction
  7.2 Cosimulation
    7.2.1 Preliminaries: Managed and Unmanaged Code
    7.2.2 Same Binary File
    7.2.3 Shared Memory
    7.2.4 TCP/IP
    7.2.5 COM
    7.2.6 Static Function
    7.2.7 Pinvoke
    7.2.8 Managed Wrapper
    7.2.9 Comparison of Cosimulation Implementations
  7.3 Compiler Framework
    7.3.1 Common Intermediate Format
    7.3.2 Internal Data Structures
    7.3.3 Code Generation
    7.3.4 Compiled RTL
    7.3.5 Compiled TLM
  7.4 Conclusion

8 Timing Specification in Transaction Level Models
  Alena Tsikhanovich, El Mostapha Aboulhamid, and Guy Bois
  8.1 Summary
  8.2 Expressing Timing
  8.3 Timing Analysis
    8.3.1 Linear Constraint Systems
    8.3.2 Max Constraint Systems
    8.3.3 Max-Linear Systems
    8.3.4 Min-Max Constraint Systems
    8.3.5 Min-Max-Linear Constraint Systems
    8.3.6 Assume-Commit Constraint Systems
    8.3.7 Discussion
  8.4 Min-Max Constraint Linearization Algorithm
    8.4.1 Min-Max Constraint Linearization
    8.4.2 Algorithm Optimization
    8.4.3 Experimentations
  8.5 Timing in TLM
    8.5.1 Timing Modeling at CP+T Level
    8.5.2 Communication Exploration at PV and PV+T Levels
  8.6 Conclusion

III Practical Use of ESys.NET

9 ESys.NET Environment
  James Lapalme and Michel Metzger
  9.1 Introduction
  9.2 Modeling
    9.2.1 My First Model
    9.2.2 Modeling Concepts
    9.2.3 Process Method
    9.2.4 Signals
  9.3 Simulation
    9.3.1 Simulator Semantics and Construction
    9.3.2 Semantics
  9.4 Verification
    9.4.1 Overview
    9.4.2 Case-Study Model: The AHB-Lite Bus
    9.4.3 How to Specify Properties
    9.4.4 Verifying Temporal Properties during Simulation
    9.4.5 Linking Different Tools
    9.4.6 Observing Results
  9.5 Conclusion

References

Index
Preface
The introduction of VHDL in 1987 and SystemC in 1999 gave a big boost to the electronic design community and played an important role in the development of system level design. We were involved with both processes early on. Rich with the experience gained from these two environments, we wanted to explore new frontiers that could enhance these systems and hopefully constitute a synergy with them. This resulted in the development of ESys.NET in 2003.

This book had its origin in the overall work done at the Université de Montréal on the system level design environment named ESys.NET. It is based on the .NET framework and brings better management of metadata, introspection, and interoperability between tools. Interoperability is one of the most important aspects of frameworks such as .NET. It enabled us to develop, for example, assertion-based observers of ESys.NET models without any interference with the modeler. This can be seen as enabling separation of concerns.

Encouraged by our experience with ESys.NET, we continued our efforts to build a bridge between advances in the software community and the needs of the EDA community for new ideas and algorithms. We pursued the development of our environment by exploring new mechanisms such as transaction modeling to help in distributed simulation, or Web semantics to help with IP (Intellectual Property) reuse.

The collaboration between the SLS group of TIMA in Grenoble (France) and the LASSO group at the Université de Montréal was a determining factor in the completion of this work. While the two groups have the same global objectives, they have complementary strengths. The LASSO group is more focused on modeling and verification, while the SLS group has valuable expertise in architecture, System-on-Chip, and code generation. Both also have a common interest in accurate and efficient simulation. Sabbatical stays and exchanges helped to strengthen this collaboration.

This work summarizes our efforts and covers three main parts: (1) modeling and specification, including requirements specification, IP reuse, and applications of design patterns to hardware/software systems; (2) simulation and validation, covering transaction-based models, accurate simulation at cycle and transaction levels, cosimulation and acceleration techniques, and timing specification and validation; (3) practical use of the ESys.NET environment, which concludes this work.

We would like to thank all the authors for their timely response and the numerous iterations to complete their respective chapters. Readers are encouraged to visit the companion Web site http://www.esys-net.org/ and send us their comments to enrich it.
About the Editors

El Mostapha Aboulhamid
Université de Montréal - Canada

El Mostapha Aboulhamid is active in modeling, synthesis, and verification of hardware/software systems. He obtained an engineering degree from ENSIMAG, France in 1974 and a Ph.D. from the Université de Montréal in 1984. He is currently a professor at the Université de Montréal. He worked in the 1980s and early 1990s on built-in self-test techniques, design for testability, multiple-fault automatic test generation, and complexity of test. He has been involved in the current methodology of design of hardware/software systems since its early beginnings in the 1980s and 1990s with the introduction of VHDL. He helped in the acceptance of this methodology in Canada by collaborating with industrial partners and by delivering intensive courses on modeling and synthesis in both academic and industrial settings. He also collaborated on the standardization of SystemC. He was the director of GRIAO, a multi-university research center which led to the creation of the current ReSMiQ Research Centre. He has supervised more than 80 graduate students. Dr. Aboulhamid has been general or technical program chair of many conferences, such as ISSS/CODES, ICECS, NEWCAS, ICM, and AICCSA. He has also served on steering or program committees of different international conferences. In 2003 his team developed ESys.NET as an environment for modeling and simulation. He is looking into ways of using distributed simulation to overcome the bottleneck caused by the simulation of large digital systems. He is also interested in advanced software approaches in system level design and reuse. He has multiple collaborations nationally and abroad on different aspects of system-on-chip modeling and verification, and has been an invited professor at both the Université de Lille and the Université de Grenoble in France.

Frédéric Rousseau
Laboratoire TIMA UJF/INPG/CNRS - France

Frédéric Rousseau has been a professor since 2007 (and was an associate professor from 1999) at the University Joseph Fourier (UJF) of Grenoble, France, where he teaches computer science, and he has been a researcher in the TIMA lab since 1999. He received an engineering degree in computer science and electrical engineering from UJF in 1991 and a Ph.D. in computer science in 1997 from the University of Evry (near Paris). His research interests have concentrated on system-on-chip design and architecture, and more precisely the design and validation of hardware/software interfaces. He is now focusing on prototyping, software code generation for multiprocessor systems-on-chip, and communication on such systems. He has also served on program committees of different international conferences, workshops, and symposiums. In 2006, he spent a sabbatical year at the Université de Montréal, working on ESys.NET.
Contributor Biographies

Amine Anane
Université de Montréal - Canada

Amine Anane is a Ph.D. student in the Department of Computer Science and Operations Research of the Université de Montréal. He received a computer science engineering degree from the Faculty of Science of Tunis in 1998. He worked as an IT consultant for 7 years before joining the Université de Montréal for an M.S. degree in computer science. Since obtaining accelerated admission to the Ph.D. program in 2006, he has been with the LASSO laboratory, which is interested in formal design and verification methods for microelectronic systems. His research is related to the study of a design methodology suitable to formal verification and correct-by-construction incremental refinement.

Guy Bois
École Polytechnique de Montréal - Canada

Guy Bois is a professor in the Department of Computer Engineering at École Polytechnique de Montréal. His research interests include hardware/software codesign and coverification for embedded systems. Guy Bois has a Ph.D. in computer science from the Université de Montréal.

Luc Charest
Université de Montréal - Canada

Luc Charest is currently an operations research developer working on crew management algorithms for the airline industry at AD OPT, a Kronos division in Montreal. With a strong C++/software engineering background, Luc Charest was one of the first to introduce design patterns to the system design domain. After graduating from the Université de Montréal in 2004, he completed a postdoc at the LIFL of the Université des Sciences et Technologies de Lille (France) in 2005. His other current interests include operations research and functional programming.

Mathieu Dubois
Université de Montréal - Canada

Mathieu Dubois is a Ph.D. candidate in computer science at the Université de Montréal. He holds a B.Ing. and an M.Sc.A. in electrical engineering from, respectively, the École de Technologie Supérieure de Montréal and the École Polytechnique de Montréal. His research interests include heterogeneous compilation and the acceleration of discrete-event simulations.
Patrice Gerin
Laboratoire TIMA INPG/UJF/CNRS - France

Patrice Gerin received an M.S. degree in microelectronics from the University Joseph Fourier and is currently working toward a Ph.D. degree at INPG, Grenoble, France. From 1999 to 2005, he worked as an embedded software engineer in industry. He is currently with the System-Level Synthesis Group in the TIMA Laboratory, Grenoble, France. His research interests include hardware/software simulation and embedded software validation in MPSoC design.

Nicolas Gorse
Université de Montréal - Canada

Nicolas Gorse obtained a Ph.D. in computer science from the Université de Montréal in 2006. He is currently working within the Analog Mixed Signal Group at Synopsys, Inc. His research interests are simulation, verification, and formal methods.

Yann-Gaël Guéhéneuc
École Polytechnique de Montréal - Canada

Yann-Gaël Guéhéneuc is an associate professor in the Department of Computing and Software Engineering of École Polytechnique de Montréal, where he leads the Ptidej team on evaluating and enhancing the quality of object-oriented programs by promoting the use of patterns at the language, design, and architectural levels.

James Lapalme
Université de Montréal - Canada

James Lapalme obtained a Ph.D. from the Université de Montréal. He has spent most of his graduate research on the application of modern software engineering technologies to embedded systems design. He developed ESys.NET in the context of his master's thesis. Over the past couple of years he has become increasingly interested in the use of semantic Web technologies for the development of CAD tools. In addition to his academic career, James is a professional in the private sector specializing in the areas of enterprise architecture and enterprise information management.

Michel Metzger
Université de Montréal - Canada

Michel Metzger is a research-and-development engineer at STMicroelectronics Canada. His research interests include system-level design and verification of embedded platforms. Michel Metzger has an engineering degree in computer science from the École Supérieure en Sciences Informatiques, Sophia-Antipolis, France, and a Master of Science from the Université de Montréal.

Gabriela Nicolescu
École Polytechnique de Montréal - Canada

Gabriela Nicolescu is currently an associate professor at École Polytechnique de Montréal, teaching embedded systems design and real-time systems. She received an engineering degree and an M.Sc. from the Polytechnic University of Romania in 1998. She received her Engineer Doctor degree from the National Polytechnique Institute, Grenoble, France in 2002. Her research work is in the field of specification and validation of heterogeneous systems, and multiprocessor system-on-chip design.

Frédéric Pétrot
Laboratoire TIMA INPG/UJF/CNRS - France

Frédéric Pétrot has been a professor in computer architecture at ENSIMAG, a school of higher education of the Institut Polytechnique de Grenoble, since 2004. He also heads the System Level Synthesis Group of the TIMA laboratory in Grenoble, France. Prior to this position, Frédéric Pétrot was an assistant professor in the SoC design lab of the University Pierre et Marie Curie, Paris, France, where he was a major contributor to the Alliance VLSI CAD system (Ph.D. in 1994 on this topic) and to the Disydent digital design environment.

Yvon Savaria
École Polytechnique de Montréal - Canada

Professor Yvon Savaria holds a Canada Research Chair in architecture and design of advanced microelectronic systems. He has 27 years of experience with IC design and testing and has conducted research on design methods for digital, analog, and mixed-signal integrated circuits and systems. He has published extensively on a wide range of microelectronic circuit and system design methods. He has active collaborative projects with several organizations and was a founder of LTRIM, a Polytechnique spin-off that commercialized an invention resulting from his research at Polytechnique, for which he obtained an NSERC Synergy award in 2006.

Yousra Tagmouty
Université de Montréal - Canada

Yousra Tagmouty received an M.Sc. in computer science from the Université de Montréal in 2008. Her thesis describes a meta-model to specify the behavior of the solutions of design patterns and/or of object-oriented programs. She applied this meta-model to specify several design patterns and to automatically generate the corresponding source code in ESys.NET. She works at a software company in Montreal.

Alena Tsikhanovich
Université de Montréal - Canada

Alena Tsikhanovich is a Ph.D. candidate in the Department of Computer Science and Operations Research of the Université de Montréal. She received a bachelor's degree in mathematics from the Belarus State University and a master's degree in computer science from the Université de Montréal. Her current research projects are related to the domains of hardware/software system modeling and timing verification based on temporal constraint analysis.
Julie Vachon
Université de Montréal - Canada

Julie Vachon is an associate professor of computer science at the Université de Montréal. She is a member of the GEODES software engineering laboratory. Her current research interests include formal software specification and verification (model-checking and theorem proving), aspect orientation, feature interaction detection, transaction models, and distributed systems verification.
1
Introduction

Frédéric Rousseau
Laboratoire TIMA UJF/INPG/CNRS - France

James Lapalme
DIRO Université de Montréal - Canada

El Mostapha Aboulhamid
DIRO Université de Montréal - Canada

1.1 Needs of a Complete and Efficient Design Environment
1.2 Design Flow with ESys.NET
1.3 Simulation Flow with ESys.NET
1.4 Observer-Based Verification Flow with ESys.NET
1.5 Conclusion and Book Organization

1.1 Needs of a Complete and Efficient Design Environment
The electronics industry is driven by two opposing forces: the continuously increasing demand for full-featured products and the continuously decreasing time customers are willing to wait to receive these products. This reality has pushed the industry to develop hardware platforms which support greater amounts of parallelism in order to implement an increasing number of product features. Moreover, in order to cope with time-to-market constraints, there is a tendency for manufacturers to design very flexible platforms which facilitate reuse across multiple products. Multiprocessor Systems-on-Chip (MPSoCs) are a perfect example of these powerful and flexible platforms.

MPSoCs cannot be designed with classic design flows and traditional Computer Aided Design (CAD) tools because of the considerable amount of software and complex hardware elements which define them. With each new generation, MPSoCs increase in complexity to meet market demands, with the adverse effect that they become more and more difficult to design and to design for. It has become urgent to decrease the time between system specification and physical implementation. The time required between these two steps depends greatly on the methodology used, on the quality of the supporting CAD tools, and on their integration.
Reducing the time between initial specification and final implementation does not mean that any implementation is acceptable; final system quality is still expected. This supposes that a portion of the design time is spent executing simulations, or more generally in a verification process. The simulation of MPSoC-based systems is a challenge that only grows as we move forward in the design process. Typically, Register Transfer Level (RTL) simulation is not possible for the entire system; it requires emulation or prototyping in conjunction with a significant number of test vectors. An alternative way to ensure the quality of these types of systems is verification. Since formal verification techniques require a significant amount of specialized knowledge on the part of designers and testers, assertion-based verification seems a promising technique for widespread use.

A complete design environment, in our view, must provide a design flow based on an effective design methodology, an efficient simulation approach, and efficient verification techniques (e.g., assertion-based techniques). The design flow should be composed of several refinement steps, mainly carried out with the help of automatic refinement tools; the final physical implementation is obtained by taking an initial specification through this sequence of refinements. The simulation approach should consist of deriving a simulation model from a specification. The complexity of this model and its homogeneity in terms of concepts and languages depend on how far along one is in the design process, as well as on the nature of the initial specification. Finally, an assertion-based verification technique requires that observers monitor the simulation model during execution and validate that it conforms to properties/constraints defined by the designer.

In addition, we believe that a modern design environment should integrate these three aspects (design methodology, simulation, and verification) in a manner that achieves independence between them; this is called separation of concerns. If separation of concerns is achieved, an efficient simulation model would not include any unexpected information coming from the modeling process, for example. Likewise, the verification process should not require any modifications to the system model, since we consider that this process is part of the design environment and not of the system model. This independence supposes that the designer may go from one aspect to another quickly, aided by automation tools and programming techniques provided by the design environment. Within the context of system design, simulation, and verification, it can be said that these three problems are orthogonal.

Design environments are quite difficult to build, and their design has a tremendous impact on their effectiveness, their ease of use, their ability to promote good designs, and their capability to be extended easily. The software community, over the past decade, has invested a great deal of effort in the domain of software design. Moreover, the software industry's conflicting needs (rapid time-to-market, and quick design and implementation solutions that have a low cost of ownership, are flexible, and can easily evolve) have caused the emergence of novel software engineering technologies.
Examples of such software engineering technologies are pattern-oriented software designs, modern development platforms such as .NET, Semantic Web technologies, and modern requirement specification and management.
Despite all the efforts of the community, SoC designers still need new design environment solutions. This is mainly because no single existing environment provides the full set of requirements that is mandatory for an efficient modeling, simulation, and verification solution. Moreover, the design of many available environments has various drawbacks which we believe hinder their evolution towards fulfilling this set of requirements. Our work on Embedded System design with .NET (ESys.NET) is focused on solving the complete design environment problem. The first version of ESys.NET was developed in late 2003 on top of the .NET Framework, and it seems to be a good solution that permits independence between the design, verification, and simulation aspects. It also possesses key characteristics which are required in order to qualify it as a next generation design environment. This introduction presents the reasons behind the need for a new design environment. It also presents how an approach such as ESys.NET, which encompasses a methodology, a set of tools, and a framework all based on .NET, is well positioned to become tomorrow's design environment. This section then explains the properties and characteristics expected from a design environment, gives our methodology and design flow, and describes ESys.NET. Section 1.2 details our design flow. Section 1.3 focuses on an efficient simulation approach and Section 1.4 focuses on assertion-based verification techniques. Section 1.5 discusses how ESys.NET meets most of the required properties and presents the overall book organization. Before delving into the core of the matter, we wish to present an overview of the .NET Framework, which has made our approach possible. The next section presents the .NET Framework with a focus on some key capabilities which it offers and which are central to our approach.
1.1.1 The .NET Framework

Virtual machines, intermediate languages, and language-independent execution platforms are not new; they have been present from UNCOL in the 1950s to the JVM in the 1990s. Researchers have been fascinated with these concepts because they offer an alternative path to native compilers that has several benefits. The .NET core, represented by the Common Language Infrastructure (CLI), is a virtual machine execution platform which was standardized in December 2001 by ECMA and in April 2003 by ISO [5].

1.1.1.1 General Presentation of the .NET Framework
The .NET Framework is a platform that simplifies component-based application development for the highly distributed Internet environment. What sets the .NET Framework apart from its rivals (such as the Java platform) is that its core, the Common Language Infrastructure (CLI), was designed from the ground up to be a multilanguage environment [99]. At the center of the CLI are a unified type system, the Common Type System (CTS), and a Common Intermediate Language (CIL), which supports high-level notions (e.g., classes) and which is platform and programming language independent. The CTS establishes a framework enabling cross-language integration, type safety, and high-performance code execution. The CLI has four main components:

• The Common Type System. The Common Type System (CTS) provides a rich type system that supports the types and operations found in many programming language families. It is intended to support the complete implementation of a wide range of programming languages.

• Metadata. The CLI uses metadata to describe and reference the types defined by the Common Type System. Metadata is stored ("persisted") in a way that is independent of any particular programming language. Thus, metadata provides a common interchange mechanism for use between tools that manipulate programs (compilers, debuggers, etc.). Metadata is also used to enrich the CIL representation of a source code.

• The Common Language Specification. The Common Language Specification (CLS) is an agreement between language designers and framework (class library) designers. It specifies a subset of the CTS and a set of usage conventions. Languages provide their users the greatest ability to access frameworks by implementing at least those parts of the CTS that are part of the CLS. Similarly, frameworks will be most widely used if their publicly exposed aspects (classes, interfaces, methods, fields, etc.) use only types that are part of the CLS and adhere to the CLS conventions.

• The Virtual Execution System. The Virtual Execution System (VES) implements and enforces the CTS model. The VES is responsible for loading and running programs written in CIL. It provides the services needed to execute managed code and data, i.e., automatic memory management (garbage collection), thread management, metadata management, etc. The VES also manages the connection at runtime of separately generated code modules through the use of metadata (late binding).

The CLI also specifies a number of class libraries providing important functionality such as thread interaction and reflection, as well as XML [205] data manipulation, text management, collections, and web connectivity. Alongside the CLI core, the .NET Framework provides a set of classes that add supplementary features such as web services, native and web forms, transactions, scalability, and remote services.
1.1.1.2 The C# Language
The C# [2] language is a simple, modern, general-purpose object-oriented programming language that has become an ECMA and ISO standard [5]. It was intended for developing software components suitable for deployment in distributed environments. Although most C# implementations (Microsoft, DotGNU [3]) use the CLI standard for their library and runtime support, other implementations of C# need not, provided they support an alternate way of providing the minimum CLI features required by the C# standard. In order to give the optimum blend of simplicity, expressiveness, and performance, C# supports many software engineering principles such as strong type checking, array bounds checking, detection of attempts to use uninitialized variables, and automatic garbage collection [18]. C# is intended for writing applications for both hosted and embedded systems, ranging from the very large, which use sophisticated operating systems, down to the very small, which have dedicated functions. Although C# applications are intended to be economical with regard to memory and processing power requirements, the language was not intended to compete directly on performance and size with C or assembly language.
1.1.1.3 Advanced Programming Features
Since the C# language relies on a runtime with the CLI's features, it inherits interesting characteristics such as a unified type system, thread and synchronization support, and automatic memory management, just to name a few. It is sometimes hard to separate the C# language from the CLI because they are quite symbiotic, so .NET/C# or CLI/C# will sometimes be used throughout this document. There are three advanced programming features of .NET/C# that have a considerable impact on software design: reflectivity, attribute programming, and events/delegates.
1.1.1.4 Introspection and Reflectivity
A program that can explicitly see, understand, and modify its own structure is said to have introspective capabilities [71] [144]. Reflectivity is a property that a program may possess which permits its structure to be accessible to itself. The information that is accessible through introspection is called meta-information or metadata. Metadata permits the creation of simple but powerful tools that help the design and development of software, such as debuggers, class browsers, object inspectors, and interpreters. There exist many languages, such as Java and C#, that are said to be reflective because they provide meta-information to programs written with them. Most reflective languages implement the reflection property by means of a supporting runtime like the Java JVM or the .NET CLR, thereby separating the meta-information from the base program. These concepts are illustrated by the reflection capabilities of the C# programming language, where it is possible to query the CLI to know the structure of an object. To such a query, the CLI returns an object that is an instance of a metaclass named Type that fully describes the type, as sketched below.
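The following minimal sketch (the SimpleCounter class is purely illustrative) shows such a query and the enumeration of the discovered members:

```csharp
using System;
using System.Reflection;

public class SimpleCounter
{
    public int Count;                          // a public field
    public void Increment() { Count++; }       // a public method
}

public static class ReflectionDemo
{
    public static void Main()
    {
        object o = new SimpleCounter();

        // Query the runtime: it returns an instance of the metaclass Type.
        Type t = o.GetType();
        Console.WriteLine("Type name: " + t.FullName);

        // Enumerate the members discovered through introspection.
        foreach (MemberInfo m in t.GetMembers())
            Console.WriteLine(m.MemberType + " : " + m.Name);
    }
}
```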
1.1.1.5 Attribute Programming
Both the C# and the CLI standards define a method for adding declarative information (metadata) to runtime entities. Since the .NET Framework has the CLI at its core, it also has metadata support. The mechanism through which metadata may be added to a program is called attribute programming [144]. Attributes can be added to all the elements of a program except the bodies of properties and methods. It is even possible to add declarative information to the assembly, which is a unit of deployment similar to an .exe or .dll file on the Windows platform. As mentioned before, attributes in .NET may be used to add extra information about elements in a program, but they also provide an elegant, consistent approach to adding declarative information to runtime entities that permits a new way of designing software. The mechanism to retrieve these attributes (metadata) at runtime has also been standardized, permitting software components developed by different teams or even companies to interact and discover each other through metadata. Metadata may even be used to control how the program interacts with different runtime entities. It is this capability that we exploit in ESys.NET.
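As a small sketch of the mechanism (the ClockedAttribute name and its use are illustrative, not part of any standard library), a custom attribute can be declared, attached to a method, and retrieved at runtime:

```csharp
using System;
using System.Reflection;

// A custom attribute carrying declarative information about a method.
[AttributeUsage(AttributeTargets.Method)]
public class ClockedAttribute : Attribute
{
    public string ClockName { get; private set; }
    public ClockedAttribute(string clockName) { ClockName = clockName; }
}

public class Producer
{
    [Clocked("clk1")]                  // metadata attached to the method
    public void Run() { /* behavior */ }
}

public static class AttributeDemo
{
    public static void Main()
    {
        // Standardized retrieval of the metadata through reflection.
        MethodInfo run = typeof(Producer).GetMethod("Run");
        var attr = (ClockedAttribute)Attribute.GetCustomAttribute(
            run, typeof(ClockedAttribute));
        if (attr != null)
            Console.WriteLine("Run() is bound to clock " + attr.ClockName);
    }
}
```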
1.1.1.6 Delegates
Callbacks are an important concept in the implementation of event handling. Here is a good informal definition of the concept of a callback: a scheme used in event-driven programs where the program registers a subroutine (a "callback handler") to handle a certain event; the program does not call the handler directly, but when the event occurs, the run-time system calls the handler, usually passing it arguments that describe the event. Most modern programming languages have constructs that permit the implementation of callbacks, such as function pointers in C++ and interfaces in Java [18]. The .NET Framework and C# use delegates [157] to address event handling. The concept of delegates improves upon function pointers by being object-oriented and type-safe, and improves upon interfaces by allowing the invocation of a method without the need for inner class adapters. Also, delegates are inherently multicasting: a delegate contains a list of bound methods that are invoked in sequence when the delegate is invoked. Another interesting difference between a delegate and a function pointer is that the delegate may contain an instance method in its invocation list, not only a static method as with function pointers, because the delegate keeps track of the object on which the method should be called. There are three steps in defining and using delegates: declaration, instantiation, and invocation. Delegates are declared using a dedicated delegate declaration syntax, as sketched below.
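A minimal sketch of these three steps (all names are illustrative):

```csharp
using System;

public static class DelegateDemo
{
    // 1. Declaration: a delegate type for methods taking a string argument.
    public delegate void Notify(string message);

    static void ToConsole(string m) { Console.WriteLine("console: " + m); }
    static void ToLog(string m)     { Console.WriteLine("log: " + m); }

    public static void Main()
    {
        // 2. Instantiation: bind a first method, then add a second one
        //    (delegates are inherently multicasting).
        Notify notify = ToConsole;
        notify += ToLog;

        // 3. Invocation: both bound methods run in sequence.
        notify("simulation step done");
    }
}
```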
1.1.1.7 Delegates and Reflectivity
A powerful combination is the use of reflection in collaboration with delegates. Compared to most languages, .NET/C# permits methods to be bound to a delegate at runtime. For example, in C++ the name of the method that is bound to a function pointer must be known at compile time, but in .NET/C# it is possible to create a delegate object with a static method of the Delegate class; this method takes as parameters an object and the name (or description) of the method which should be bound to the created delegate. With reflectivity, it is possible at runtime to discover the names of the various methods that an object supports, so it is possible to dynamically discover an object's methods and bind them to delegates. Dynamic method discovery and delegate creation are useful because they enable a simple and elegant solution for implementing entry points in a simulation kernel for third-party tools. We also use them to dynamically create the processes of our simulation model.
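For instance (a sketch; the Process delegate type and the Counter class are hypothetical), Delegate.CreateDelegate can bind a method that was discovered by name at runtime:

```csharp
using System;
using System.Reflection;

public delegate void Process();        // a parameterless "process" delegate

public class Counter
{
    public void Step() { Console.WriteLine("counter stepped"); }
}

public static class DynamicBindingDemo
{
    public static void Main()
    {
        object module = new Counter();

        // Discover a method by name through reflection...
        MethodInfo mi = module.GetType().GetMethod("Step");

        // ...then bind it, together with its target object, to a delegate.
        Process p = (Process)Delegate.CreateDelegate(
            typeof(Process), module, mi);

        p();                            // invokes Counter.Step on 'module'
    }
}
```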
1.1.1.8 C#/.NET 2.0 and Generics
At the end of 2005, Microsoft released the next official versions of .NET and the C# programming language, both versioned 2.0. Of the many enhancements made to .NET and C#, the implementation of generic types is especially important. Generic programming, popularized by C++ templates, is a programming paradigm used by statically typed languages in which a piece of software is specified in a way that abstracts type information. When a piece of generic software must be used, a programmer must specify a type binding which specializes the software for a given type. Most often, the compiler will duplicate the original generic code with the type information added in order to enforce static typing. Both Java 1.5 and C++ use this kind of compile-time resolution in order to implement the generic programming paradigm.

The designers of .NET took a different approach when implementing generics. The .NET technology is built from the ground up on metadata: when a piece of software is compiled, it is transformed into a language-agnostic intermediate format called CIL, whose instruction set is based on an abstract stack machine and which contains a lot of metadata about the structure of the compiled software. The concept of generics was implemented as an extension of this metadata and instruction set. Because of this implementation strategy, .NET generics are resolved at runtime and not at compile time, and this makes a big difference. Through the use of reflection, it is possible to determine whether an object is an instance of a generic type, as well as the bound types of a generic instance. It is also possible to dynamically bind a generic type and create instances of that binding. This implementation of generics allows the runtime analysis of generic bindings, the creation of new bindings, and the instantiation of those bindings, which are very powerful features that we shall explore later in this chapter. These capabilities, to our knowledge, are unique for a statically typed programming environment. Moreover, the implementation of generics proposed by .NET permits the definition of constraints in order to restrict the types that may be bound to a generic definition.
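A brief sketch of these runtime capabilities, using standard .NET reflection calls over the built-in List<T> type:

```csharp
using System;
using System.Collections.Generic;

public static class GenericsDemo
{
    public static void Main()
    {
        // Take the open (unbound) generic type definition...
        Type open = typeof(List<>);

        // ...bind it to a type argument at runtime...
        Type bound = open.MakeGenericType(typeof(int));

        // ...and instantiate the new binding reflectively.
        List<int> list = (List<int>)Activator.CreateInstance(bound);
        list.Add(42);

        // Reflection can also inspect generic bindings at runtime.
        Console.WriteLine(bound.IsGenericType);             // True
        Console.WriteLine(bound.GetGenericArguments()[0]);  // System.Int32
        Console.WriteLine(list.Count);                      // 1
    }
}
```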
1.1.2 Characteristics Expected from a Design Environment
In the domain of system-on-chip design, simulation, and verification, many efforts have been invested and several contributions have been proposed. Designers currently have at their disposal efficient standard solutions for design or modeling, simulation, and verification (e.g., VHDL, Verilog, SystemC), but none is close to perfect. Most currently available design/simulation/verification solutions fall into one of two categories: those based on a framework approach and those based on a domain-specific language. The first group may be represented by solutions such as SystemC, ESys.NET, JHDL, and Ptolemy II; the second group by solutions such as SpecC, VHDL, and SystemVerilog.

Because of the huge complexity of the hardware and software components in an MPSoC, the design of such a system relies on CAD tools to give a greater place to component reuse and to speed up the overall design process. Moreover, each step of the design flow may need one or several CAD tools. A design environment is therefore necessary to integrate all these CAD tools in a single environment and to facilitate the interaction between them.

What characteristics are expected of such an environment? First of all, the design environment should support the fastest and best way through the design flow, so as to reach the final implementation as quickly as possible. Such an environment will permit SoC design at a very high level of abstraction and will support automatic refinement through several abstraction levels. This supposes that the best CAD tools are available in the environment. In general, we expect three qualities from a design environment: ease of use, performance, and the ability to extend and enrich the design flow with third-party tools.

Ease of use concerns the environment's ability to facilitate the work of the designer. Such a design environment should provide an efficient way through the design flow, starting from the specification and reaching the final implementation as quickly as possible; of course, the best quality for the final implementation is always expected. First, an efficient design environment supposes that good CAD tools for specification, modeling, refinement, simulation, and verification are part of the flow, and that they are able to work together. Second, it supposes that the designer can use his favorite or best-suited languages to specify or model the system. Indeed, part of the system specification may require specific semantics or a specific language that differs from the rest of the system. For instance, a system specification may need an IP (Intellectual Property) block written in an HDL for the implementation model, or in C++ for its behavioral model.

Performance is expected not only from the final implementation, but also from the design flow itself, in the form of fast refinement and efficient simulation, cosimulation, and verification processes. A model at a low level of abstraction contains so many implementation or architectural details that whole-system simulation is too slow and, in practice, not useful. New techniques with mixed abstraction levels may help speed up the simulation. For example, replacing some components by their Transaction Level Model (TLM) is one solution to increase simulation performance.

The ability to support extensions with third-party CAD tools improves the overall quality of the design flow: new CAD tools that may enrich the flow appear every year in this highly competitive domain. Thanks to the .NET introspection capabilities (see Section II.2.4), connecting new tools to our design environment is not a tedious task. It requires only a small development effort, and, most importantly, the design environment kernel (mainly the simulation kernel) does not have to be modified. Frameworks such as SystemC are more difficult to interact with; the main drawback of SystemC is its lack of introspection, mainly because SystemC is based on C++, which has limited introspection capabilities [71].
1.1.3 ESys.NET: A .NET Framework Based Design Environment
Built on top of the .NET Framework, ESys.NET is a system-level modeling and simulation environment which therefore benefits from the above facilities. In terms of programming, ESys.NET is a class library; in terms of simulation semantics, it is based on an event-driven simulation kernel. The rest of this section presents the features needed to understand our methodology and tools; more details and explanations about ESys.NET can be found in [137].

1.1.3.1 Model Representation
A high-level specification model can be written using any programming language supported by .NET. Since these languages were not intended for SoC design, a software framework which implements specific SoC concepts, such as modules, communication channels, ports, and system clocks, is required. At this level of system specification, .NET brings several advantages, the most important being:

• the thread management functionality, which can be exploited for software component descriptions;
• the metadata, which may be used to annotate models;
• the garbage collector, which alleviates the designer's memory management task.

In terms of modeling, we represent systems as a set of interconnected modules communicating through communication channels by the intermediary of their interfaces. Modules may be hierarchical, i.e., composed of several modules, or they may be leaves of the hierarchy consisting only of an elementary behavior, which may be described with one or several processes. Processes may be methods (which cannot be explicitly suspended), threads (which may be suspended and reactivated), or light threads, called fibers, which cannot be preempted. The same concepts (module, communication channel, and interface) may be represented at different abstraction levels. Interfaces are directly provided by the .NET Framework: an interface is composed of a set of method declarations but provides no implementation for these methods. Our environment unifies the concepts of high-level interfaces and ports; in fact, ports are implemented as predefined interfaces provided by ESys.NET (e.g., inBool, outBool, inoutBool, and inInt), as illustrated by the sketch below.
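As a rough illustration of these concepts (a sketch only: BaseModule and the port interfaces such as inBool/outBool are the ESys.NET names mentioned in this chapter, but the port accessor and process registration details shown here are simplified assumptions, not the exact ESys.NET API):

```csharp
// A sketch, not verbatim ESys.NET code: a leaf module with one input
// port, one output port, and a single process describing its behavior.
public class Inverter : BaseModule       // BaseModule comes from ESys.NET
{
    public inBool  Input;                // ports are predefined interfaces
    public outBool Output;

    // A process method: ESys.NET discovers processes through reflection,
    // and metadata (attributes) can bind them to clocks or events.
    public void Invert()
    {
        Output.Value = !Input.Value;     // hypothetical port accessor
    }
}
```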
1.1.3.2 Metadata and Attribute Programming
The CLI standard defines a way of adding declarative information (metadata). Since .NET has the CLI at its core, it also supports metadata. The mechanism through which metadata may be added to a program is called attribute programming. Attributes can be defined and added to basically all the elements of a program. We believe that a standard mechanism for adding metadata to the description is vital in order to create better Electronic Design Automation (EDA) tools. Indeed, the source code description could be used by different tools (simulators, synthesis tools . . . ), and adding specific information could be useful. The mechanism for retrieving these attributes (metadata) has also been standardized, permitting software components developed by different teams or even
companies to interact and discover each other through metadata. Metadata may even be used to control how the program interacts with different run-time entities; it is this capability that we will exploit.
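The following self-contained C# example illustrates attribute programming using the standard .NET mechanisms. The SynthesisHint attribute and its meaning are invented for the example; the definition and retrieval APIs, however, are the standard ones.

using System;
using System.Reflection;

// A custom attribute carrying tool-specific metadata. The attribute name
// and its interpretation are illustrative, not part of any standard.
[AttributeUsage(AttributeTargets.Method)]
public class SynthesisHintAttribute : Attribute
{
    public string Target { get; }
    public SynthesisHintAttribute(string target) { Target = target; }
}

public class Filter
{
    [SynthesisHint("hardware")]   // declarative information attached to the method
    public int Step(int sample) { return sample >> 1; }
}

public static class MetadataReader
{
    public static void Main()
    {
        // Standardized retrieval: any tool can discover the annotation.
        MethodInfo m = typeof(Filter).GetMethod("Step");
        foreach (SynthesisHintAttribute a in
                 m.GetCustomAttributes(typeof(SynthesisHintAttribute), false))
            Console.WriteLine($"Step is tagged for {a.Target}");
    }
}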
1.1.3.3 Introspection and Reflection
Generally, the capabilities of EDA tools depend heavily on their ability to collect accurate information about the system to be designed and/or the design requirements. The main source of information is the specification or the model source code. Tools may extract information by means of static analysis of a model specification and may even infer further information from this static analysis. However, static analysis may be very tedious and requires the model to be fully determined before an EDA tool can be of use, leaving no room for dynamic model construction and analysis; all dynamic elements of a model may not be determined (like signal values, resolution of polymorphism, etc.). Reflection and automated introspection fill the gap left by static analysis and are regarded as necessary for the development and use of EDA tools. Introspection is the ability of a program to provide information about its own structure during runtime. Automated introspection is possible thanks to reflection mechanisms provided by programming environments [71]. In the system-level design context, various kinds of important information may be reflected. The three main information categories are (i) design information (structural and behavioral), (ii) run-time infrastructure information, and (iii) modeling information provided by attribute programming or other means. When reflected, this information allows one to design EDA tools that navigate, manipulate, compose, and connect components, verify interface compatibilities, and synthesize appropriate interfaces.
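As a small taste of what .NET reflection offers, the sketch below walks the public fields of an arbitrary component object at run time, which is the kind of structural discovery an EDA tool can perform without writing any parser. It uses only standard System.Reflection calls; cycle detection is omitted for brevity.

using System;
using System.Reflection;

public static class Introspector
{
    // Walks an object's public instance fields at run time. Recursion
    // assumes the component hierarchy is acyclic (no cycle detection).
    public static void Dump(object component, string indent = "")
    {
        Type t = component.GetType();
        Console.WriteLine($"{indent}{t.Name}");
        foreach (FieldInfo f in t.GetFields(BindingFlags.Public |
                                            BindingFlags.Instance))
        {
            object value = f.GetValue(component);
            Console.WriteLine($"{indent}  {f.FieldType.Name} {f.Name}");
            // Recurse into non-null sub-components to expose the hierarchy.
            if (value != null && !f.FieldType.IsPrimitive)
                Dump(value, indent + "  ");
        }
    }
}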
1.1.3.4 The Core of ESys.NET
The core of ESys.NET is based on a set of classes which encapsulate the concepts of modules, communication channels, signals, interfaces, and events. User-defined modules are obtained by derivation from the abstract BaseModule class. User-defined communication channels are derived from the abstract BaseChannel class (itself derived from the BaseModule class). The BaseChannel class offers the functionality of updating channels at the end of a simulation cycle. User-defined signals derive from the abstract BaseSignal class. This class provides the functionality of storing information that will be readable only at the next simulation cycle. It also provides a transaction event (indicating that new data is stored in the signal and its value is equal to the previously stored value) and a sensitive event (indicating that new data is stored and its value differs from the previously stored value). ESys.NET is able to detect all instances present within the model and registers them automatically. All modules, channels, events, or interfaces instantiated within the hierarchy of a user module are automatically registered in the simulator’s database. This is possible because of the reflection mechanism provided by .NET. In addition,
it is possible to access information stored in the simulator about the system and its execution status (e.g., the current simulation time and the current module name). One of the important characteristics of ESys.NET is that it offers the designer the possibility to easily specify execution directives by tagging the different concepts in the specification. These directives concern the association of thread or parallel method (MethodProcess) semantics to a class method, the addition of a sensitivity list for a method or a thread, the call of methods before or after the execution of a certain process, and the execution of a class method at a specific moment during the execution. This was implemented by exploiting the attribute programming provided by .NET. Our system specification is compiled into a CIL file, which can be seen as a metadata-oriented textual representation and contains all the information (class hierarchy, metadata. . . ) needed to produce an executable representation that still carries all this information, to be collected by reflection.
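Putting these elements together, here is a hedged sketch of what a user module tagged with execution directives could look like. BaseModule and the MethodProcess notion come from the text above; the attribute class and its SensitiveTo parameter are assumptions introduced for illustration, so minimal stand-ins are defined to keep the example self-contained.

using System;

// Minimal stand-ins: BaseModule is ESys.NET's abstract base class (see
// above); the attribute below is an assumption illustrating how execution
// directives can be tagged onto class methods.
public abstract class BaseModule { }

[AttributeUsage(AttributeTargets.Method)]
public class MethodProcessAttribute : Attribute
{
    public string SensitiveTo { get; set; }   // hypothetical sensitivity list
}

public class Counter : BaseModule
{
    private int count;
    public bool Overflow;

    [MethodProcess(SensitiveTo = "Clk")]  // tagged as a parallel method process
    public void OnClock()
    {
        count = (count + 1) & 0xFF;
        Overflow = (count == 0);
    }
}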
1.1.4 Our Design, Simulation and Verification Flows
Our design methodology resembles a traditional system design flow, starting from a usually non-executable specification with constraints and requirements. Some transformations lead to an executable specification. Then, for complex systems, a partitioning step decides which parts of the system will be implemented in hardware and which in software. Figure 1.1.a), which represents such a flow, hides the design of the communication between these parts. Hardware and software are then designed separately, involving a large number of specialists from different design spheres. A typical software programmer is generally not qualified to design specialized hardware, and vice versa. The integration of components from both design flows can be a difficult task and is one of the major challenges in hardware/software system design. At each step of this flow, the designer may run a simulation in order to validate his refinement or to check whether his model still respects constraints and requirements. To do so, one must build the simulation model. The difficulty of obtaining this model depends on where one is in the design process: running a software application, including its operating system, on a hardware model of a processor is not as trivial as running a high-level algorithm in C. The simulation flow, depicted in Figure 1.1.b), therefore takes all the required models from the design flow and determines what is needed to build the entire simulation model, depending on the concepts and semantics of the different models. Our verification flow (Figure 1.1.c) is based on the simulation model. All the assertions to be verified are transformed into software programs, called observers, and gathered in the verification engine. Taking advantage of .NET capabilities, the design environment may send information to the verification engine at each step of the simulation. Each time the verification engine receives information, it checks whether all the concerned assertions still hold. In this way, neither the simulation model nor the simulation kernel is modified. All the details of this verification flow are given later in Section 1.4.
FIGURE 1.1: Design, simulation and verification flows
1.2 Design Flow with ESys.NET
The main objective of our design flow is to generate an implementation, starting from a specification written in high-level programming languages. The implemented flow has been developed to show the feasibility of such a flow and to highlight that .NET has the capabilities to serve as the base of a design environment. Other research efforts have
tried to achieve the same goal of specifying a system with a high-level programming language and generating a complete synthesizable description. In [96], the C language is used to specify a hardware/software system to be implemented on a NAPA 1000 chip, which is composed of a RISC processor and reconfigurable logic. The hardware/software partitioning is specified using pragma directives. C and structural VHDL code are generated by their tool. In [208], a similar methodology is used, but smaller code fragments are compiled to produce specialized operations that are used to speed up frequently executed instructions. In [154], a large number of high-level programming languages are supported to specify a system, since their tool uses a modified version of the GCC compiler suite to produce code. They present a flexible methodology where the user specifies a hardware/software partitioning; the software is executed by an ARM processor and the hardware is implemented on reconfigurable logic known as RED. The tool is able to give feedback on this partitioning by profiling the system. The user is then able to modify the system accordingly. Finally, in [47], the authors present a tool that supports a Java bytecode specification of a system. Based on the parallelism of the examined code, the tool automatically determines which parts of the system should be implemented in hardware on reconfigurable logic. The software parts of the system, modified to include communications to the reconfigurable units, are compiled for a traditional processor where a JVM is available. Our approach differs from previous work in that it uses the .NET framework, whose capabilities ease the development of the tool and provide a powerful yet simple development environment for users.
1.2.1 Modeling and Specification
System specification has always been a problem. The informal nature of requirements prevents the use of systematic validation techniques, leaving errors that can propagate through the design process. Chapter 2 provides a solution for the modeling and validation of system requirements. Once the requirements have been validated, one can build a system model at high abstraction levels. Transaction Level Modeling (TLM) is one solution. At the higher levels, the system behavior is represented by a network of parallel processes that exchange complex or high-level data structures. The system description is then refined by adding functional or temporal details, leading to the construction of timed or untimed models. This facilitates design space exploration. TLM with .NET is detailed in Chapter 8.
1.2.2 Our System Design Flow
Our proposed design flow is illustrated in Figure 1.2. A system is specified using a high-level programming language supported by the .NET framework. By using attributes to add custom metadata, the designer indicates the parts of the system to be implemented in hardware or software. The system is compiled to CIL by .NET tools. Our tool takes the produced CIL specification as input and automatically generates a software executable, to be run on a processor, and custom hardware blocks, synthesizable on an FPGA board (with restrictions on the features to be implemented).
FIGURE 1.2: Our system design flow
The tool processes the input system in three stages. The first stage, the translation of the CIL to an intermediate format, reads and analyzes the input to build an internal representation (IR) of the system. The software flow produces an executable for the code of the methods labeled as software. The hardware flow generates the custom hardware blocks for the methods identified as hardware. Using the appropriate tools, the whole system can then be synthesized on the FPGA board.
1.2.2.1 Common Intermediate Language (CIL)
Used as the input language of our tool, the CIL is a low-level, platform-independent, object-oriented, stack-based, assembly-like language “executed” by the VES (it is in fact JIT-compiled to native code). The CIL supports a set of basic types defined by the CLI (integers, floating-point numbers, and reference types used to access objects) as well as user-defined types. Our use of the CIL allows us to take as input systems specified in a wide array of languages, and to use introspection mechanisms to easily investigate the structure of the system being processed. Introspection helps to build the internal representation of the input system, including information about classes, methods and fields. In addition, as metadata is used to partition the system, it is easy to identify which methods should be converted to software or hardware.
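The sketch below suggests how such a tool can open a compiled assembly and pull out both the structure and the raw CIL of each method body using standard .NET reflection (Assembly.LoadFrom, MethodBase.GetMethodBody); decoding the returned byte array into opcodes is left out.

using System;
using System.Reflection;

public static class CilInspector
{
    // Enumerates the types and methods of a compiled (CIL) assembly and
    // retrieves the raw IL bytes of each method body, the starting point
    // for building an internal representation.
    public static void Inspect(string assemblyPath)
    {
        Assembly asm = Assembly.LoadFrom(assemblyPath);
        foreach (Type type in asm.GetTypes())
            foreach (MethodInfo m in type.GetMethods(
                         BindingFlags.DeclaredOnly | BindingFlags.Instance |
                         BindingFlags.Static | BindingFlags.Public))
            {
                MethodBody body = m.GetMethodBody();
                if (body == null) continue;          // abstract/extern methods
                byte[] il = body.GetILAsByteArray(); // raw CIL to be decoded
                Console.WriteLine($"{type.Name}.{m.Name}: {il.Length} bytes of CIL");
            }
    }
}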
1.2.2.2 Intermediate Representation (IR) Generation
An intermediate representation of the methods’ code is necessary for an easier translation of methods to hardware or software. The IR we use must be register-based, to simplify the conversion of the CIL (stack-based) to a register-based processor assembly code; it must also expose instruction-level parallelism for an optimal implementation of hardware methods. The IR is built from a Control-Flow Graph (CFG) representation of the code of the methods, where a basic block of the CFG is the basic processing element of the computation. The IR is represented by Data-Flow Graphs (DFG), where the leaves are the input data and the intermediate nodes are the operations executed on the data. Many trees can be produced for a basic block because some operations do not produce any result (like “store” instructions). The trees are sorted according to the order of their operations in the original CIL. The conversion of the stack-based representation to a register-based representation is done using the traditional technique of simulating the effects of each CIL instruction on the stack. Registers are also assigned during the process, although at this stage we assume an infinite number of registers, called symbolic registers. Each node in the IR thus has an assigned symbolic register where the result of its operation is stored temporarily.
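A minimal sketch of this stack-simulation technique follows; the handful of instruction handlers stand in for the full CIL opcode set, and the emitted three-address strings stand in for the actual IR nodes.

using System.Collections.Generic;

// Simplified sketch of the classic stack-simulation technique: each
// CIL-like instruction pops its operands from a simulated evaluation
// stack, and each result is assigned a fresh symbolic register.
public class StackToRegister
{
    private int next;                          // unbounded symbolic registers
    private readonly Stack<int> stack = new Stack<int>();
    public List<string> Code { get; } = new List<string>();

    private int Fresh() => next++;

    public void LoadArg(int index)             // e.g., CIL ldarg
    {
        int r = Fresh();
        Code.Add($"r{r} = arg{index}");
        stack.Push(r);
    }

    public void Add()                          // e.g., CIL add
    {
        int b = stack.Pop(), a = stack.Pop(), r = Fresh();
        Code.Add($"r{r} = r{a} + r{b}");
        stack.Push(r);                         // result stays on the stack
    }

    public void StoreLocal(int index)          // e.g., CIL stloc
    {
        Code.Add($"loc{index} = r{stack.Pop()}");
    }
}

Feeding it the CIL-like sequence ldarg 0; ldarg 1; add; stloc 0 produces r0 = arg0, r1 = arg1, r2 = r0 + r1, loc0 = r2.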
1.2.2.3 Software Design Flow
The software flow takes as input the IR of the code of the software-identified methods and produces an assembly code file (for the Microblaze in our case). From the IR of each basic block in the CFG, the assembly code is generated. This process involves a traversal of the IR graph, where the code for the operands of a node is produced before the code of the node itself. Code is always generated in an order consistent with the original CIL code, because of the order in which an IR node stores its operands. The registers used in the assembly code are the symbolic registers produced earlier. Register allocation is the next phase in the flow; its goal is to assign physical registers, of which there is a limited number, to the symbolic registers. A linear-scan algorithm is currently used, for its ease of implementation and its good results, although a graph-coloring scheme should produce better results. The final step is the generation of the code file.
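For reference, here is a compact sketch of a classic linear-scan allocation over live intervals (in the style of Poletto and Sarkar); interval splitting and smarter spill heuristics, which a production allocator would need, are omitted.

using System.Collections.Generic;
using System.Linq;

public class Interval { public int Start, End, Reg = -1; public bool Spilled; }

public static class LinearScan
{
    // Visits intervals by increasing start point, frees registers of
    // intervals that have expired, and spills when none is available.
    public static void Allocate(List<Interval> intervals, int k)
    {
        var free = new Stack<int>(Enumerable.Range(0, k));
        var active = new List<Interval>();

        foreach (var i in intervals.OrderBy(x => x.Start))
        {
            active.RemoveAll(a => {            // expire old intervals
                if (a.End >= i.Start) return false;
                free.Push(a.Reg);
                return true;
            });
            if (free.Count > 0) { i.Reg = free.Pop(); active.Add(i); }
            else i.Spilled = true;             // naive choice: spill the newcomer
        }
    }
}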
1.2.2.4 Hardware Design Flow
The hardware is described in CASM (Channel-based Algorithmic State Machine), an intermediate-level hardware description language [11]. CASM is based on the ASM concept, but it has been extended to support higher abstraction levels. CASM is capable of handling typed data token transfers and processing over self-synchronized channels. Moreover, state calls and returns allow the easy implementation of methods, and full recursion is supported. The tool automatically generates clean synthesizable RTL VHDL and its associated SystemC model.
CASM’s intermediate abstraction level makes it an ideal candidate to automate the refinement from the high-level languages supported by the .NET environment down to a low-level synthesizable description. From the IR, our tool needs to produce ASMs that achieve the same functionality as described in the high-level specification. ASMs are produced by identifying the inputs, which are the arguments of the method, the output, which is the return value, the local signals, which are the local variables of the method, and the states of the ASM. States are created by scheduling IR instructions from the IR dependency graph. IR instructions are scheduled in steps, where a given IR instruction is only scheduled once its predecessors in the IR dependency graph have been scheduled. That way, instruction-level parallelism is exploited by executing multiple instructions simultaneously. The steps of the schedule become the states of the ASM. For each state, the next state is either the one corresponding to the next step of the schedule, or another one if the state contains a control-flow modifying instruction (a branching instruction); in that case, the next state is the one containing the instruction targeted by the branching instruction. For each step of the ASM, CASM code is easily generated to produce the complete ASM structure, thanks to the high-level syntax and semantics of the CASM language. Finally, states need to be added to receive the arguments of the method and to send the result once computed. A new state is added per incoming argument, in which the argument is assigned to a local variable. One last state is needed to send the result back where it belongs.
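The scheduling rule just described (an instruction becomes ready once all of its dependency-graph predecessors are placed) is essentially an as-soon-as-possible list schedule. A bare-bones sketch, assuming an acyclic dependency graph and ignoring resource limits and branch handling:

using System.Collections.Generic;
using System.Linq;

public class IrNode { public List<IrNode> Preds = new List<IrNode>(); public int Step = -1; }

public static class StepScheduler
{
    // Each returned step groups instructions that execute in parallel;
    // each step then becomes one ASM state. The graph must be acyclic,
    // otherwise this loop would never terminate.
    public static List<List<IrNode>> Schedule(List<IrNode> nodes)
    {
        var steps = new List<List<IrNode>>();
        var pending = new List<IrNode>(nodes);
        while (pending.Count > 0)
        {
            var ready = pending.Where(n => n.Preds.All(p => p.Step >= 0)).ToList();
            foreach (var n in ready) n.Step = steps.Count;
            steps.Add(ready);
            pending.RemoveAll(n => n.Step >= 0);
        }
        return steps;
    }
}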
1.2.2.5 Target Architecture and Prototyping Platform
For the implementation of the system running the software and containing the custom hardware blocks, we use a Xilinx Virtex II Pro FPGA. Using Xilinx’s Embedded Development Kit (EDK) [8], a system can be easily designed and implemented on the target board. Our system implements a soft-core Microblaze processor running the software. The custom hardware blocks are synthesized in the logic cells near the processor. The hardware blocks communicate with the processor through dedicated two-way channels, the Fast Simplex Link (FSL) interfaces. These channels allow 32-bit data to be sent to the hardware blocks with a single assembly instruction. In the middle of each channel, a queue stores data temporarily until the hardware block is ready to receive it. A maximum of eight custom hardware blocks can be connected to the processor through the FSL interfaces. The existing FSL interfaces greatly simplify the communication synthesis between the hardware and software parts.
1.2.3 Analysis of the Design Flow
More experiments are being conducted to better judge the efficiency of this flow and its associated tool. However, our primary goal is not to prove that the tool can lead
to great optimizations of a system, but rather to show the feasibility of our approach by providing a working version of the tool. As of now, our tool already supports a large subset of the CIL, including but not limited to arithmetic and logic operations on many data types, control-flow modifications and object-oriented features. We intend to extend this subset to accept a wider range of system specifications. In our case, hardware/software interfacing has not been a problem. The main reason is that communication mechanisms are integrated in the Microblaze architecture, which greatly simplifies communication refinement and synthesis, and a simple bus was used as the communication network. We are currently working on a heterogeneous multiprocessor architecture, based on Microblaze and PowerPC processors, as well as on an efficient communication system. Also, the generation of the software executable uses traditional, but simple, methods. The code produced by our tool could surely benefit from better generation techniques to run faster. The same can be said of the hardware generation, where simple scheduling algorithms are used for the creation of the ASM states. Yet none of the existing industrial tools addresses this problem, even though a synthesizable VHDL model is targeted afterwards. An original way of using this design flow is based on the design pattern concept. A design pattern represents a general solution to a recurring problem in object-oriented software design. Design patterns capture design expertise in a reusable form for both hardware and software design flows. Applying the design pattern concept in our flow is described in Chapter 4.
1.3 Simulation Flow with ESys.NET

1.3.1 Building the Simulation Model
ESys.NET is an event-based simulator. The simulation kernel is an important aspect of the environment because it lies at its heart and performs important functions such as the elaboration (and instantiation) of the simulation model and process scheduling. Elaboration is the activity that takes a specification model as input and generates a simulation model. Process scheduling is the core activity of the simulator, which executes the simulation model. Chapter 9 and Part II of this book provide much more detail about the simulation flow.
1.3.2 Separation of Concerns between Models and Simulation
The intermediate format of .NET (CIL) is used as a neutral and public format between tools. The CIL model is complete in the sense that all of the metadata present within the model description, as well as the code structure and organization, is kept. We also say that the CIL model is clean because very little non-model-dependent information
is present. To illustrate these two points, we take SystemC as an example. SystemC, based on a C++ library, has inherited all the wonders of the C++ language, such as speed, power, and flexibility, but has also inherited all its evils: it is error-prone, complex, and, most importantly, lacking run-time features like memory management, type verification, and introspective capabilities. When C++ IP blocks are compiled, a lot of information is lost: structures are flattened, abstract data structures are minimized, and it is not possible to obtain precise information on the elements found in the model. Thus, even though C++ IP blocks are fast to simulate, this reduced observability makes them hard to manipulate. In regard to “cleanness,” SystemC source code has a fairly clean layout that can be used for static analysis, but when it is compiled, all the macros are expanded, adding a lot of code that has nothing to do with the model but is necessary for SystemC’s elaboration phase and for binding the model to the simulator kernel. Tools that analyze SystemC IP blocks must therefore deal with a polluted model description. Our methodology offers a fairly clear separation of concerns between description models and the simulation kernel. We believe that this strong separation makes the environment better suited for component-based approaches. The work done in SoCML [140] provides a view of the next generation of ESys.NET, where complete separation of concerns is achieved.
1.3.3 Towards a Multi-Level Simulation Model
A System Level Description Language (SLDL) is ideal for hardware description at different abstraction levels, but a gap still remains for software modeling and simulation. An illustrative example is SystemC, which lacks features to support embedded software modeling. In SystemC, software and hardware threads are scheduled in the same way. The SystemC scheduler is tailored to hardware and is not preemptive, which means it is not possible to simulate complex multi-task software applications. Even in the upcoming version of SystemC, there are still difficulties with process control (suspend, resume, kill. . . ), dynamic task creation, scheduler modeling, dynamic creation of primitives (mutex, semaphore. . . ) and preemption [95].
1.3.3.1 Behavioral Model Simulation
The simulation model building presented previously mainly concerns the simulation of the behavioral specification. All .NET threads, representing the methods of modules and channels, are scheduled following the ESys.NET algorithm in the .NET environment. At this level, the hardware and software parts of the system have not yet been defined, and only the design functionality can be verified. The scheduling mechanism is launched by starting each created thread. The .NET scheduler takes care of scheduling the different threads, and time constraints are added by using the .NET Thread libraries. Since there is no hardware or software distinction, no communication mechanism is required. By an iterative process and using profiling tools, the designer is able to create
different threads simulating the behavior of his design, identify functionality errors, critical sections, performance deficiencies and the required synchronization resources, and thereby get a better idea of the partitioning of the application.
1.3.3.2 Hardware and Software Model Simulation
After hardware/software partitioning, the system model is composed of a hardware part and a software part. At this level, the operating system for the software part has not yet been chosen, and the hardware architecture executing the software is still completely abstracted. The software application calls OS functions defined by an OS-independent API. Presently, we have an initial library programmed to simulate the behavior of an OS, which contains tasks, a scheduler, semaphores, mailboxes and mutexes. Hardware components, still represented as .NET threads, are scheduled using the ESys.NET algorithm. The interface between hardware and software is simple since they share the same environment and several variables. Consequently, we use a high-level native model for software components. This model helps developers explore and validate different OS services, as well as explore different OSes, without the effort of learning how their different APIs work. Since the target architecture is still completely abstracted, developers do not need to find an OS port, as they are able to simulate its behavior. For OS exploration, our platform has to provide simulation libraries consisting of functional models of different OSes.
1.3.3.3 Hardware and Software with Specific OS Simulation
At this level, hardware components are still scheduled using the ESys.NET algorithm. Software components are supposed to run under a selected OS. For our simulation, a win32 port of the chosen OS is required, and it allows different optimizations. The scheduling mechanism and time constraints are activated by calling the functions of the selected OS. The application is tested with the selected OS, but the final target architecture is still abstract. The interface between the hardware and the software differs from the previous level. Mainly for simplicity, we chose to keep ESys.NET for the hardware description, as ESys.NET provides fast interoperability techniques (PInvoke, Callback. . . ) between the different parts. The hardware model may interrupt the software at any time by using the OS interrupt mechanism.
1.3.3.4 Cosimulation
The environment is flexible enough to test the hardware components with an SLDL/HDL like SystemC or VHDL. For cosimulation, hardware and software components communicate by using .NET interoperability techniques [8]. Since unmanaged code, i.e., code that is not managed by the .NET environment, is used for the software aspect (usually described in C, C++ or assembly language) and managed code for the hardware (encapsulated in ESys.NET threads), some
interoperability techniques offered by the .NET environment must be used. Given that one is now in the .NET environment, COM technology can be used, making it possible to interact with tools like ModelSim. Another cosimulation solution, allowing the cosimulation of multi-language descriptions, is given in Chapter 7.
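These interoperability mechanisms (P/Invoke and unmanaged callbacks) are standard .NET features; the sketch below shows their typical shape, with the library name sw_model.dll and its two entry points invented for the example.

using System;
using System.Runtime.InteropServices;

// Standard .NET interoperability between managed and unmanaged code.
// The library name and functions are placeholders standing in for an
// actual C/C++ software model.
public static class NativeSoftwareModel
{
    // P/Invoke: call an unmanaged entry point from managed code.
    [DllImport("sw_model.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern int sw_step(int input);

    // Callback: let the unmanaged side call back into managed code,
    // e.g., to raise an interrupt in the hardware model.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void InterruptHandler(int irq);

    [DllImport("sw_model.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void sw_register_interrupt(InterruptHandler handler);
}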
1.4 Observer-Based Verification Flow with ESys.NET

1.4.1 Overview of the Observer-Based Verification Flow
The verification layer in ESys.NET is based on the observer paradigm. The state of the model under simulation is verified by the verification engine (the observers), which observes the runtime evolution of the system and checks it against a set of formal properties. Figure 1.3 shows the links between the simulation and verification flows. Figure 1.3.A depicts the actual ESys.NET simulation process and Figure 1.3.B describes our verification process. The properties to be verified (Figure 1.3.B) are expressed using Linear Temporal Logic (LTL) with a properties editor. LTL formulae are stored in a text file apart from the model (the properties file). Each formula is then transformed into an automaton (i.e., an observer) to be executed in parallel with the simulation. An event of the system model, such as the rising edge of a clock signal for instance, is used to synchronize the system model under simulation with the observer automata. When a given event occurs, each observer automaton executes the transitions whose labels match the new current state of the model. If an automaton has no such transition, the execution fails and the property observed by this automaton is declared “invalid.” Each property to be verified is thus transformed into one automaton, and then into one observer. Observers are gathered in a verification engine. At the end of each simulation step, the verification engine receives a synchronization signal from the simulator, updates the required information, and runs the observers. The implementation of the LTL-based verification tool using reflection is described in Chapter 9. As illustrated, the simulation flow of ESys.NET remains independent of the verification process. It is important to mention that the verification process modifies neither the model nor the simulator. This is possible thanks to the introspection capabilities of .NET. Introspection is the ability of a program to provide information about its own structure during runtime. Automated introspection is possible thanks to reflection mechanisms provided by programming environments [71]. All the information is extracted from the compiled version of the model, no matter the language used, as long as it is supported by .NET. In fact, the source code of the system model is not needed to perform verification, since introspection on a standard intermediate format can retrieve all the required information.
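A minimal sketch of such an observer automaton follows; the class layout is an assumption for illustration (the LTL-to-automaton translation is not shown), but it captures the execution rule just described: on each synchronization, take a transition whose guard holds, and declare the property invalid when none is enabled.

using System;
using System.Collections.Generic;

public class Observer
{
    public class Transition
    {
        public int From, To;
        public Func<bool> Guard;   // evaluated against the simulated model
    }

    private readonly List<Transition> transitions = new List<Transition>();
    private int state;
    public bool Valid { get; private set; } = true;

    public void Add(int from, int to, Func<bool> guard) =>
        transitions.Add(new Transition { From = from, To = to, Guard = guard });

    // Called by the verification engine at each synchronized simulation step.
    public void Step()
    {
        if (!Valid) return;
        foreach (var t in transitions)
            if (t.From == state && t.Guard())
            {
                state = t.To;
                return;
            }
        Valid = false;             // no enabled transition: property violated
    }
}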
FIGURE 1.3: Simulation and verification flows
1.4.2 Building and Binding the Verification Engine to the Simulation Model
The construction of the observers needs the structure of the simulation model, mainly to check whether the signals or events used in the properties to be verified really exist in the model (Figure 1.3, step 1). To do that, and to avoid redundant operations, a design structure tree is built, which represents the module hierarchy and contains the signals, events and observable variables of the model. This information is available in the type structure that defines the model (the system model after compilation in A). The tree construction is greatly facilitated by the use of reflection; it does not require the tedious implementation of a parser. The design structure tree gives access to static information about the model. Other information, for example the value of a
signal, can only be found dynamically, again with the help of introspection capabilities. The verification engine keeps the paths to model objects to obtain this dynamic information during simulation. While the observers are built, the model is instantiated and elaborated by the ESys.NET kernel. The resulting executable model is browsed (Figure 1.3, step 2) to retrieve data that will be used during simulation to evaluate the state of the model. Indeed, observers need to be connected to the elaborated model before the start of the simulation (Figure 1.3, step 2). Until then, observer operands only contain paths to model objects but are not bound to an instance of the model. The actual reference to the object is required to get the value of the operands and thus evaluate the property. Once again, introspection is used: given a reference to an object and the name of one of its fields, .NET offers a mechanism to obtain the value of that field. Since a reference to the top level and the paths of the operands are available, a reference to the object pointed to by each path can be obtained. Each observer is synchronized by an event of the model. This defines the execution step of the automaton and a sampling of the tested variables and signals. When the synchronization event occurs, the properties of the executable transitions are evaluated and the valid transitions are performed. Observers are executed synchronously with the simulator, in order to check the model when it is stable. As simulation and verification are executed on a sequential machine, we chose to suspend the simulation during observer evaluation, so their execution time adds to the simulation time. With a distributed execution of the verification engine on another machine, the simulation time would remain unchanged. The ESys.NET simulator offers a method to bind a procedure to any event of the model. To do so, a reference to the event object is needed. When the synchronization event occurs, the observer is flagged to be executed once the model is stable. The state of the model during a simulation cycle is difficult to evaluate since it highly depends on the scheduling of processes. The ESys.NET simulator provides a cycleEnd hookpoint triggered when the model is stable. The callback methods registered within the ESys.NET simulator are called when the verification engine applies the observers to the simulated model (Figure 1.3, step 3).
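The field-by-path resolution mentioned above maps directly onto standard .NET reflection; a small sketch follows, where the dotted-path convention is our own illustration.

using System.Reflection;

public static class ModelProbe
{
    // Resolves a dotted path such as "top.cpu.status" against a model
    // instance and reads the current value of the final field, the
    // mechanism observers use to evaluate their operands at run time.
    public static object GetValue(object root, string path)
    {
        object current = root;
        foreach (string name in path.Split('.'))
        {
            FieldInfo f = current.GetType().GetField(
                name, BindingFlags.Public | BindingFlags.NonPublic |
                      BindingFlags.Instance);
            if (f == null) return null;   // path does not exist in the model
            current = f.GetValue(current);
        }
        return current;
    }
}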
1.4.3 Comparison with the Same Verification Flow in SystemC
Other frameworks such as SystemC are more difficult to interact with. This is mainly due to the fact that SystemC is based on C++, which has limited introspection capabilities [71]. Additional libraries are needed to support reflection. In SystemCXML [165], Doxygen’s parser is used to extract an Abstract System Level Description (ASLD) and capture it in an XML file. The PINAPA project [153] adds a patch to the GNU C++ compiler and to SystemC. The compiler is used to get the abstract syntax tree (AST) of SystemC processes. This approach is justified by the fact that the binary format resulting from a compilation highly depends on the compiler used and on the architecture of the target computer, and a lot of information about the model is lost during the process. Developing a tool to directly explore the object file would be tedious. Thus, it is almost inevitable to use the source code files to get the information. However, the SystemC
core library allows basic introspection thanks to the get_child_objects method [160], which returns all sub-components of a given element in a design. This approach only allows retrieving SystemC objects (modules, signals, clocks. . . ); internal variables thus remain hidden. Besides this, the design of SystemC’s simulator makes it difficult to integrate external tools. The modification of the simulator source code is often the only solution [165]. Basic callbacks are available inside the design but are not usable from external libraries. The SystemC Verification (SCV) standard provides an API to develop test-benches at the transaction level [7]. It is black-box verification, since the internal behavior of the system is not taken into consideration, contrary to the approach presented in this chapter. Nonetheless, introspection is used on input data to support arbitrary data types. The strategy exploits the template mechanism to extend data types with introspection capabilities. The drawback is that user-defined types must be described manually. As noted, significant effort is required to provide introspection and extend SystemC. By contrast, developing a verification tool with ESys.NET is facilitated, since introspection and hooks are directly available in the libraries.
1.4.4 Towards a Powerful Verification Flow
Timing analysis and verification are part of system design. Until now, there has been no methodology for timing specification in TLM, so a model of timing specification is presented in Chapter 8. Our verification tool could also be enriched with other assertion techniques; to this end, we discuss in Chapter 9 the extension of ESys.NET with SystemVerilog Assertions (SVA).
1.5 Conclusion and Book Organization
We have shown in this introduction that the .NET Framework, and more specifically ESys.NET, is able to support an entire system-level design flow, from specification to software and hardware implementation, with specific features such as observer-based verification. As said in Subsection 1.2.3, the hardware design flow is simple but at the same time, in our opinion, powerful, since it is based on very advanced software techniques and environments. It has the potential to encompass all the aspects of EDA at different levels of abstraction. The rest of this book details different aspects of the system design flow and often shows what an environment such as .NET can bring to solve the problem at hand. It is divided into three main parts: Modeling and Specification, Simulation and Validation, and Practical Use of ESys.NET. Part I is composed of three chapters. Chapter 2, entitled High-Level Requirements Engineering for Electronic System-Level Design, considers what the first steps in designing a complex system should be. We present a methodology relying on the
application of requirements engineering principles to the hardware/software electronic systems co-design cycle. This methodology provides a solution for the modeling and validation of hardware/software electronic system requirements. By mixing linguistic and formal techniques, it provides a language that is both accessible to designers and formal enough to permit automatic validation. Modeling consists in describing the system’s functionality using formally structured natural language constructs. These are then automatically translated into a pure formal representation, thus permitting automatic analysis. Validation is achieved using consistency rules. Finally, an elicitation of missing functionality is performed using Boolean algebra. Chapter 3, entitled The Semantic Web Applied to IP-Based Design: A Discussion on IP-XACT, first assesses the current state of IP reuse in EDA, showing that the sharing of data in the EDA industry often relies on the XML technology stack for the management of metadata and its exchange between tools. It then presents the added advantages of Web semantics for such a task. With the advent of the Semantic Web technology stack, XML is no longer the best way to achieve metadata sharing. The Semantic Web stack is solely focused on the definition and exchange of unambiguous semantics, whereas the XML stack is focused on data serialization. The benefits of the Semantic Web stack over XML are illustrated with IP-XACT as a case study. A discussion on how the various SPIRIT standards could benefit from its use is given. Chapter 4, entitled Translating Design Pattern Concepts to Hardware Concepts, observes that mixed hardware/software systems raise the level of generality of the hardware part and the level of abstraction of the software part of the systems. Thus, mainstream software engineering techniques and good practices, such as design patterns, could be used by system designers to design and implement their mixed hardware/software systems. The chapter then presents a proof of concept on translating the solutions of design patterns into hardware concepts to alleviate the system designer’s work and, thus, to accelerate the design of mixed hardware/software systems. In our opinion, this chapter opens the path towards a new kind of hardware synthesis. Part II of this book is related to simulation and validation. It contains four chapters. Chapter 5, entitled Using Transaction-Based Models for System Design and Simulation, explores the concept of transactions as a concurrency control mechanism. This concept has been widely studied in the domains of distributed programming and database transaction management. The chapter shows how this concept can be leveraged to answer the methodological needs of microsystems design. Inadequacies of SystemC for parallel simulation and for system-level untimed modeling are illustrated. Ways to alleviate these inadequacies by the use of the transaction-based model are presented. Finally, an implementation of a Software Transactional Memory (STM) using the .NET framework in order to simulate a transaction-based model is given. Chapter 6, entitled Simulation at Cycle Accurate and Transaction Accurate Levels, focuses on the simulation of multiprocessor SoCs, including hardware and software parts, and more precisely on the Cycle Accurate Bit Accurate (CA) level, in which the exact behavior of the communication protocol in terms of signaling is detailed, and the Transaction Accurate (TA) level, in which the communication is
based on transactions that hide the details of the protocol behind function calls. While no .NET implementation is given, this chapter presents important concepts to anyone interested in system-level design methodologies. The targeted levels of abstraction will also be addressed in the following chapters. Chapter 7, entitled An Introduction to Cosimulation and Compilation Methods, first explores different mechanisms of cosimulation, mainly between SystemC and ESys.NET; it then moves to ways of accelerating this simulation at the Register Transfer and Transaction levels of modeling. An approach to simulating models by using an internal representation and templates is proposed. Templates are created to respect the semantics of simulation at a given level of abstraction and with a desired accuracy (cycle accurate, transactions. . . ). Chapter 8, entitled Timing Specification in Transaction Level Models, explores ways of expressing timing and annotating models with timing information. The focus is put on algorithms for the verification of timing, their complexity, and the way they make it possible to avoid inconsistent integration of components due to incompatible timings in communication. Finally, Part III contains one chapter (Chapter 9) about the practical use of ESys.NET. Examples are given to illustrate how a system designer would use ESys.NET for system-level design and verification. In addition, the design and architecture of the modeling, simulation and verification implementations are presented.
Part I
Modeling and Specification
2
High-Level Requirements Engineering for Electronic System-Level Design

Nicolas Gorse
Université de Montréal - Canada
2.1 Introduction
2.2 Background
2.3 Proposed Solution
2.4 Experimental Results
2.5 Linking to a UML-Based Methodology
2.6 Conclusion
This work would not have been done without the great help of Pascale Bélanger, for her competence in linguistics [97, 98], and Alexandre Chureau, for his work on the UML-based methodology [98]. It was supervised by Yvon Savaria and El Mostapha Aboulhamid.
2.1 Introduction
Hardware/software electronic systems co-design relies on a well-established cycle, through which designers employ mature development frameworks [188][36][161][207] that have been deeply supported and improved over the years by industry. Despite such frameworks, hardware/software co-design is currently on the verge of a major crisis: the ever-growing size and complexity of current electronic systems deeply affect the validation phases of the development process; formal techniques hit irreducible combinatorial explosions, while simulation-based procedures require painfully large execution times [132]. The community generally agrees that the design process must deal with descriptions expressed at higher levels of abstraction [83]. We especially believe that a key part of the solution resides in the validation of requirements, i.e., ensuring that they respect certain quality standards such as consistency and completeness. Although multiple validation methods are available along the design cycle, no methodology is provided for the validation of requirements. This relates notably to
the lack of formalism in their expression. Indeed, the informal nature of requirements formulation typically prevents the use of systematic validation techniques, thus leaving them at the mercy of sporadic, manual, ad hoc validation, or even no validation at all. Consistency errors can thus propagate through the design cycle and be detected only much later, at significant additional cost. Moreover, the gap between the methods used to capture requirements and those used in the modeling phase is very large; designers usually jump directly from unstructured textual documentation (when it exists) to executable specifications written in languages such as Verilog [161]. Errors can thus easily be introduced during the production of the first executable model of a system. Validating requirements may enhance their quality and, by transitivity, the quality of the modeling, which could result in a remarkable reduction of error detection costs. In addition, despite the multiple validations occurring throughout the design cycle, some consistency errors may not be easily detected by traditional techniques, while an appropriate method could avoid them at the requirements stage. The development of adequate validation techniques for hardware/software requirements first necessitates a suitable formalism. This formalism must be accessible to designers: it must be light enough and avoid complex mathematical notations. At the same time, it must be formal enough to reduce the gap between requirements and executable models, as well as to permit automating the validation. In this chapter, we present a methodology relying on the application of requirements engineering principles to the hardware/software electronic systems co-design cycle. This methodology provides a solution for the modeling and validation of hardware/software electronic system requirements. By mixing linguistic and formal techniques, it provides a language that is both accessible to designers and formal enough to permit automatic validation. Modeling consists in describing the system’s functionality using formally structured natural language constructs. These are then automatically translated into a pure formal representation, thus permitting automatic analysis. Validation is achieved using consistency rules. Finally, an elicitation of missing functionality is performed using Boolean algebra. Our solution is meant to be grafted onto the high-level and system-level electronic systems development cycle, as shown in Section 2.5. It targets embedded systems composed of both software and hardware, such as systems on a chip (SoCs). Even if designated by the term “electronic systems,” the level of abstraction at which such systems are designed is far above gate-level electronic specifications. Our team is, to our knowledge, the first to propose and develop such concepts to formalize and validate requirements in the hardware/software co-design field. The approach we propose allows designers to represent requirements using a common formalism. Moreover, it permits fast prototyping and early detection of consistency errors. We also target the automatic derivation of test scenario skeletons in order to bridge the gap between the requirements and modeling levels. The main contribution of this chapter is a requirements engineering methodology that can easily be grafted on top of existing methodologies such as ESys.NET [139]. This chapter is organized as follows.
Section 2.2 introduces the framework on which this research relies, including a review of existing works,
in comparison to our approach. Section 2.3 presents our methodology, followed by Section 2.4, which presents relevant experimental results. Last but not least, Section 2.5 presents ongoing research that links our work with an existing UML-based methodology. Section 2.6 concludes this chapter.
2.2 Background
We start by introducing the research framework, the hardware/software co-design cycle, with a foreword on the problem of requirements validation. We then present a review of the software engineering methods used to address the problem of requirements validation in the software field. Last, we introduce the proposed solution, which is detailed in the remainder of this chapter.
2.2.1 Framework
Hardware/software co-design targets the production of software applications and systems that are mapped onto precise hardware platforms depending on specific needs. The development cycle is a top-down approach. As illustrated by Figure 2.1, it consists of an amalgam of both the software and hardware development cycles. Starting from preliminary ideas, documents providing a complete overview of the system components and functionalities are written. Such documentation, also called informal requirements, consists of natural language documents, possibly garnished with tables, algorithms, finite state machines, and sometimes even implementation considerations. This documentation is the input of the functional and transactional modeling stages [70], where a first high-level model of the system is produced. The components of the model are then partitioned to be implemented in hardware or software depending on constraints such as timing, scalability and modularity. The remainder of the cycle consists of two parallel branches, through which both hardware and software parts are refined and verified via co-simulation and other well-established techniques [192]. The target product can consist of hardware components or of platforms on which the software parts run. These can be intellectual property blocks, device drivers, systems on chips, processors, application-specific integrated circuits, systems on hardware programmable boards such as FPGAs, or any other suitable combination [192]. Various development frameworks and tools are accessible to hardware and software engineers along the cycle. Design and refinement processes are partly, if not completely, automated, thus permitting a dramatic gain in productivity. A wide range of methods address the validation of hardware and software models. These methods are also backed by reliable tools. They range from simulation-based to pure formal approaches [32][81][132] and are accessible at the different levels of the cycle, thus permitting the elimination of as many errors as possible along the development process.
FIGURE 2.1: The hardware/software co-design cycle
Despite these considerations, the hardware/software co-design cycle presents shortcomings and limitations in three respects:
1) The development and validation of current systems are degraded by their complexity and size. Both formal verification and simulation-based techniques need too much time and memory to complete their respective tasks.
2) The gap between requirements and the first model, with respect to the level of detail and the rigor applied, is very large. Hence, not only is the refinement of requirements into an executable model a tedious task, but errors can easily be introduced during this transition.
3) Validation of requirements, if performed at all, is manual. Hence, consistency errors are often detected only much later in the cycle, when the executable specification is being validated. This slows the development; in the worst cases, engineers may have to regress to the requirements stage when consistency issues are detected.
These issues are already very challenging at the level of complexity supported by today’s technologies, but that level of complexity is not static; as Moore’s law promises, technologies should keep growing in capacity at an exponential pace, doubling in transistor count every two years over the next decade [83] and possibly even further when nanoscale technologies emerge. There is definitely a need for new
approaches acting at higher levels of abstraction, e.g., at the requirements level. We believe that elements of the solution rely on the formalization and consistency validation of requirements. Our first observation is that formal requirements constitute a step towards the functional and transactional level models, between the requirements and modeling phases. Our second observation is that formal requirements are suitable for automatic analysis, permitting the detection of at least part of the possible consistency errors and thus preventing their propagation along the development process. Various requirements engineering approaches exist in the software community; some may be suitable to tackle the problem addressed here. The next section reviews these approaches and concludes with a summary that positions our approach with respect to each of them.
2.2.2 Software Engineering Approaches
Methods addressing requirements can be partitioned into distinct classes:
1) Scenario-based approaches represent requirements in terms of behaviors modeled in the form of scenarios.
2) UML-based techniques express the structure and functionalities of a system using UML.
3) Linguistic-based approaches target the direct extraction and analysis of requirements from natural language documentation.
4) Formal methods consist of expressing requirements using process algebras or other mathematical formalisms.
Scenario-based approaches use scenarios to express the functionalities of a system in the form of its possible behaviors. As a preliminary step for the documentation and expression of requirements, scenarios can be used to derive an executable model, as is the case with Message Sequence Charts (MSC) [121] for SDL [80] models. Although some very high-level notations such as Use Case Maps [21] have been proposed, most scenario-based approaches found in the literature are based on the principles of MSC. Validation of scenarios is an important concern. Several contributions, such as [211] and [113], propose valuable tools and techniques where requirements are formalized as scenarios on which tools perform automated consistency validation. Such approaches directly address the problem of requirements validation. We however believe that using scenarios requires good early knowledge of the modules and functionalities of the system under development. Indeed, even if they are expressed at a high level, scenarios detail components’ behaviors, which may not even be suspected at the early stages of development; this reflects the large gap between systematic and textual specifications. A more abstract representation of functionalities, later refined into scenarios, is mandatory for hardware/software electronic systems development. UML-based approaches use the UML [12] language to express and document designs. UML provides use-case, class, sequence, state-chart, activity, component, and deployment diagrams, as well as a standard for technical exchange. UML is starting to be adopted by the hardware community and has recently been successfully applied to system-on-chip design [55]. On the software engineering side, new
validation capabilities are being added to existing UML frameworks. An example is VIATRA [63], a tool targeting formal verification by means of automated visual transformations. It is based on graph manipulations and permits validation of consistency, completeness, and dependability requirements in a UML model. Another example is the approach presented in [182], where validation is based on abstract state machines. However, although work on requirements validation is being performed [12], we believe that, as for scenarios, UML is not abstract enough to address the kind of requirements we target, namely, almost pure textual specifications.

Linguistic-based approaches seem more suitable to tackle the problem of textual requirements. Although somewhat informal, natural-language-based documents still follow grammatical and syntactical rules, and several linguistic techniques are available to perform relevant validations on documents. Using them to verify requirements therefore seems perfectly natural. Indeed, recent papers present linguistic applications for the analysis of use-cases. Papers [82] and [177] both propose to capture requirements in the form of natural language use-cases and respectively offer semantic analysis and future development of validation methods. Toolkits are also under development. GOOAL [169] provides automatic transformation of natural language descriptions of a model into a static view. More validation oriented, PROSPER [68] reads formulas written in a restricted subset of English, translates them into CTL [81], and later checks them against a CTL specification using the SMV model checker [81]. Even though the approaches enumerated above are promising, the application of such pure linguistic principles is limited by the imprecision of natural language. The authors of PROSPER, who use a restricted set of English, report ambiguity problems. It is possible to apply linguistic methods to pure natural language documents, but this leads to more ambiguities and thus requires human intervention during the process. We therefore believe that, when applied to anything related to computer science or electronics, linguistic methods must be used within a strict structural formalism and restricted to very simple sentences. The PROSPER project [68] is heading in this direction, and its research clearly shows that the linguistic corpus must be restricted in some way.

The class of formal approaches gathers various mathematics-based formalisms such as process algebras [132], CTL [81], and theorem proving [132]. Formal techniques have been experimented with, over more than two decades, for various validation purposes, and requirements validation is yet another application of these techniques. Lightweight formal models for interpreting informal specifications exist [84]. For instance, the formal frameworks ARCADE [28] and TROPOS [85] are dedicated to the capture and analysis of requirements. The strength of these approaches is undeniable; indeed, the formalisms on which the models rely permit their effective analysis. However, such methods require designers to acquire deep mathematical knowledge, often leading to great adoption resistance from users.

Despite the value of the contributions summarized above, we believe that major gaps still persist. First, existing approaches are gathered into three orthogonal views, namely UML and scenarios, linguistic, and formal.
This forces designers to adopt one particular view, while combining them might be a better solution. Second,
scenario- and UML-based approaches are too detailed and are thus separated by a large gap from the typical level of abstraction at which requirements are expressed. Third, linguistic-based methods are appropriate to bridge this gap, but their informal nature prevents them from being used extensively for validation in general and for validation of textual requirements in particular. Fourth, although formal methods provide strong analysis possibilities, their formal nature provokes major resistance from users. Last, but not least, there is, to our knowledge, no methodology providing the kind of missing-functionality elicitation we propose for hardware/software systems. The proposed solution, introduced in the next sub-section, mixes linguistic and formal techniques to tackle the problems enumerated above.
2.3 Proposed Solution
We propose to intersect the three orthogonal views enumerated above by defining a unified methodology, thus taking advantage of their respective strengths and capabilities. We consider the behavior of a model as a composition of actions, denoting functionalities provided by the system. The description of an action is totally abstracted from the way it is carried out; it only focuses on the state in which the system is before and after the action is performed. This is formally expressed by guarding an action with pre- and post-conditions, respectively denoting the state of the system before and after it is carried out. Such formal requirements are situated between informal requirements and specification. The formalization and validation processes are gathered in a top-down iterative sequence, illustrated by Figure 2.2. Informal requirements are first arranged in a formally structured natural language representation. The system is formalized with its structure broken down into modules; its behavior is represented by actions that can be performed by modules, and states of the system are represented by sets of properties. The representation of the whole system is thus a collection of properties, actions, possible contradictions between properties, and constraints over the composition of actions. All these constructs are formally structured while being expressed with natural language sentences. Requirements are then fully formalized using a linguistic-based pre-processing that transforms natural language sentences into predicates, thus generating a set of fully formal requirements. Consistency validation and elicitation of missing functionalities can then be performed. Depending on the feedback, designers may decide to iterate again or go to the next phase of the development cycle.
2.3.1 Formalism
The proposed formalism relies on properties, actions, contradictions, and constraints. A property is bound to a module and characterizes a state in which the latter can be. An action is also bound to a module and represents some functionality of the latter.
[FIGURE 2.2: Proposed requirements engineering methodology. Requirements / system's textual specifications are arranged as formally structured natural language functional requirements, transformed by linguistic pre-processing into fully formal functional requirements, then submitted to consistency validation and missing-requirements elicitation (the requirements engineering phase, with a feedback loop); the resulting fully formal and verified functional requirements feed functional and transactional level modeling.]
A contradiction is a 2-tuple of properties considered as being contradictory with respect to each other. Finally, constraints are restrictions over properties and over compositions of actions. To ease the capture of requirements, the formalism is designed to be seen as a simple structure for describing functionalities. This is enabled by the fact that names of properties, actions, and modules are represented using a restricted set of English. Thus, designers do not have to elaborate complex names, acronyms, or logic predicates to name the entities, the states in which they can be, and the actions they can perform. Automatic analysis is enabled by translating these sentences into a predicative form using the linguistic pre-processing presented in Section 2.3.2.

2.3.1.1 Properties
The set of all exploitable properties is denoted Π. A property is denoted π ∈ Π. It can be used to symbolize some state and can also be negated; the negation of a property thus belongs to the set of exploitable properties: ∀π ∈ Π, ∃¬π ∈ Π. A property is a composition of elements formally defined as π = [ν · µ · λπ · θ], with N and M respectively denoting the sets of names and modules, and where: ν ∈ N is the name of the property, µ ∈ M is the module to which the property is bound, λπ ⊆ Π is a set of sub-properties, and θ is textual information. Depending on whether π is atomic or not, the set λπ is defined as λπ = ∅ or λπ = {πm, ..., πn}, where ∀i | m ≤ i ≤ n : πi ∈ Π. Sub-properties permit encapsulation, thus
allowing different levels of granularity in requirements. Although it may limit analysis performance, the depth of recursion that can be used to define sub-properties is unrestricted. The example below formalizes the idle states of a fictitious producer-consumer system.

π0 = [idle · sys · {π1, π2} · "system is idle"].
π1 = [idle · prod · ∅ · "prod is idle"].
π2 = [idle · cons · ∅ · "cons is idle"].
2.3.1.2 Actions
The set of all actions is denoted A. An action is denoted α ∈ A and is a composition of elements formally defined as α = [ν · µ · ϕ · ψ · λα · θ], where: ν ∈ N is the name of the action, µ ∈ M is the module that can perform the action, ϕ ⊆ Π and ψ ⊆ Π denote the sets of pre- and post-conditions, λα ⊆ A is a set of sub-actions, and finally, θ is additional textual information. Pre- and post-conditions respectively refer to conjunctions of properties that must hold before and after the execution of the action α. They are respectively defined as ϕ = {πk, ..., πl} and ψ = {π′m, ..., π′n}, where ∀i, j | k ≤ i ≤ l, m ≤ j ≤ n : πi, π′j ∈ Π. λα is an ordered set of sub-actions that must be carried out within the execution of α. This set can be empty, thus defined as λα = ∅, or it can contain sequential or parallel sub-actions, thus defined either as λα = {αm, ..., αn}seq or as λα = {αm, ..., αn}par, where ∀i | m ≤ i ≤ n : αi ∈ A. As for properties, sub-actions provide encapsulation mechanisms and their depth of definition is unrestricted. The example below formalizes a message exchange in a fictitious producer-consumer system.

α0 = [snd-rcv · system · {π0} · {π0} · {α1, α2}seq · ""].
α1 = [snd · prod · {π0} · {¬π1} · ∅ · "sending"].
α2 = [rcv · cons · {¬π1, π2} · {π0} · ∅ · "reception"].
2.3.1.3 Contradictions
Contradictions between properties cannot be inferred automatically and must thus be enumerated by designers. The fact that two properties π1, π2 ∈ Π are contradictory is expressed by a 2-tuple denoted χ(π1, π2) ∈ X, where X denotes the set of all contradictions. As an example, let us consider two properties π1, π2 ∈ Π defined as π1 = [idle · prod · ∅ · "producer is idle"] and π2 = [sending · prod · ∅ · "sending data"]. χ(π1, π2) is thus equal to:

χ([idle · prod · ∅ · "prod is idle"], [sending · prod · ∅ · "sending data"]).

Using variables permits stating, with a single expression, that the same module cannot be idle and sending at the same time: χ([idle · µ · ∅ · θ], [sending · µ · ∅ · θ′]), where µ ∈ M and θ and θ′ are variables. The ability to automatically detect such contradictions can thus result in an important improvement of productivity.
2.3.1.4 Constraints
Deontic constraints denote restrictions over compositions of actions. The operators are the obligation, permission, and interdiction operators of deontic logic [22]. The set of all deontic constraints is denoted ∆. Obligations, permissions, and interdictions are
respectively denoted δo(λα), δp(λα), δi(λα) ∈ ∆, where o, p, and i refer to the type of restriction and λα ⊆ A refers to the set of actions to which the constraint applies. For instance, δi({α0, α1}seq) expresses that the sequence of actions α0, α1 is forbidden. Similarly, temporal constraints denote restrictions over the presence of properties in post-conditions. The operators used are the always and eventually operators of Linear Temporal Logic [81]. The set of all temporal constraints is denoted T. Temporal constraints are respectively denoted τa(π), τe(π) ∈ T, where a and e refer to the type of restriction and π ∈ Π refers to the property to which the constraint applies. τa(π) expresses that π must always hold, while τe(π) signifies that π must hold at some point (eventually).
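Since the prototype described in Section 2.4 is implemented in Prolog, the formalism maps naturally onto Prolog facts. The sketch below is only an illustration of such a mapping under our own assumptions: the predicate names (property/4, action/6, contradiction/2, deontic/2, temporal/2), the Name-Module pairs, and the two constraint facts are hypothetical and do not reproduce the actual implementation.

    % Hypothetical encoding of the producer-consumer example as Prolog facts.

    % property(Name, Module, SubProperties, Info).
    property(idle, sys,  [idle-prod, idle-cons], 'system is idle').
    property(idle, prod, [],                     'prod is idle').
    property(idle, cons, [],                     'cons is idle').

    % action(Name, Module, Pre, Post, SubActions, Info).
    % A negated property pi is written not(pi).
    action('snd-rcv', system, [idle-sys], [idle-sys],
           seq([snd, rcv]), '').
    action(snd, prod, [idle-sys],                  [not(idle-prod)], seq([]), 'sending').
    action(rcv, cons, [not(idle-prod), idle-cons], [idle-sys],       seq([]), 'reception').

    % contradiction(P1, P2): P1 and P2 cannot hold together (for any module M).
    contradiction(idle-M, sending-M).

    % Hypothetical constraints: no simultaneous send and receive, and the
    % system must eventually return to its idle state.
    deontic(interdiction, par([snd, rcv])).
    temporal(eventually, idle-sys).

Such a fact base is what the consistency rules of Section 2.3.3 can be run against, as sketched later in this chapter.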
2.3.2 Linguistic Pre-Processing
The linguistic pre-processing translates English sentences expressing names of properties, modules, and actions into predicative forms. The pre-processing of sentences is divided into three steps: 1) parsing using Definite Clause Grammars (DCG); 2) representation as dependency relations; 3) translation into Predicate Calculus (PC). Properties are first extracted from the sentences by a simple linguistic processing involving a DCG, which is a built-in Prolog [57] mechanism based on the context-free grammar model. A DCG allows the expression of a well-defined subpart of the English syntax and lexicon with a set of restricted grammar rules; DCG rules are used to translate controlled English sentences into a syntactic tree representation [168]. A sentence is minimally defined as the conjunction of a noun phrase and a verb phrase. Each sentence component is then defined following the described sublanguage restrictions. For example, two grammar rules are required to express an intransitive verb phrase with or without a single circumstantial complementation. Each rule decomposes a syntactic sublevel of the sentence, thus getting down into the sentence's syntactic structure. From a complete sentence, the parsing rules extract components down to the lowest graphically independent linguistic unit: the word. Words are encoded as entries of a predefined English lexicon. In order to reduce the size of the lexicon, a simple morphological analyzer, inspired by [168], has been added to the parser. Such an analyzer allows the recognition of an inflected verbal variation using its stem and the expected English verbal inflections; nominal derivations are also detected using the analyzer. This mechanism permits a substantial reduction of the lexicon's size. Once found, each significant end-unit allowing the satisfaction of a rule is propagated back into the preceding rule, up to the initial sentence rule. Once the whole sentence is successfully parsed, the sentence "A signal arrives on port a" is reduced to a simple syntactic structure, such as:

    sentence(
        noun_phrase(a, noun(signal)),
        verb_phrase(verb(arrive),
                    prep_phrase(on, noun(port_a)))).
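To make the parsing step concrete, a self-contained toy grammar can be written. The sketch below relies on our own simplifying assumptions (a five-word lexicon, no morphological analysis, and no reference numbers); it is far smaller than the grammar the chapter describes, but it accepts the example sentence and builds the tree shown above.

    % Minimal illustrative DCG for "a signal arrives on port_a".
    sentence(sentence(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).

    noun_phrase(noun_phrase(a, noun(N))) --> [a], noun(N).
    verb_phrase(verb_phrase(verb(V), PP)) --> verb(V), prep_phrase(PP).
    prep_phrase(prep_phrase(on, noun(N))) --> [on], noun(N).

    noun(signal) --> [signal].
    noun(port_a) --> [port_a].
    verb(arrive) --> [arrives].   % the real parser derives the stem morphologically

    % ?- phrase(sentence(T), [a, signal, arrives, on, port_a]).
    % T = sentence(noun_phrase(a, noun(signal)),
    %              verb_phrase(verb(arrive), prep_phrase(on, noun(port_a)))).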
That representation is not directly usable to obtain PC clauses, but the sentence's syntactic structure gives us the basic clues required to infer dependency relations between the units. Dependency-based representation basically consists in the distinction between head units and dependent units. The relations between a head and its dependent units are described in terms of semantically motivated dependency relations. Eight universal syntactic relations were considered, according to the three main types of accepted linguistic deep syntactic dependencies: complementation, modification, and coordinative relations [150]. The complementation relations correspond to the actants that characterize a predicate. For example, "to arrive" is a verb known to have two actants: (1) the thing/person that arrives, and (2) the location: X arrives [at/on] Y. These two actants correspond to the verb's arguments in PC. The modification relation applies to units that modify the meaning of their head, usually by restricting it. For example, adjectives modify nouns, adverbs modify verbs, nominal complements modify their head noun, etc. Finally, the coordinative relation applies to coordinative constructions, such as "a signal arrives on port a AND port b." The requirements are expected to express events or states (the predicates); variables and other objects depend on these predicates, and modifiers refine these arguments. Only complementation and modification relations are used. A complete grammar would also allow coordinative structures, but these may introduce ambiguity into the requirements; in order to avoid possible ambiguity, no complex syntactic structure is allowed. Once a dependency relation is induced from the parsing tree, a Prolog assertion is made. That assertion, named arc, has three arguments: the head unit, the relation's type, and the dependent. The head unit and its dependent are identified by reference numbers found in the controlled lexicon during parsing:

    arc(HeadUnit#Ref1, Rel, DependentUnit#Ref2)

For the example sentence, the following dependency relations are asserted:

    arc(arrive#1, 'I', signal#2).
    arc(arrive#1, 'II', port_a#3).

Distinct analyses are possible for a single sentence; if the grammar is not ambiguous, only one parsing may succeed. The parsing rule below specifies that a sentence consists of a noun phrase followed by a verb phrase; the first parsed nominal unit is presumed to be the first argument of the sentence's head verb.

    sentence(sentence(NP, VP)) -->
        noun_phrase(NP, N, CarN),
        verb_phrase(VP, N, CarN, V#A, CarV),
        { ( arc(V#1, 'I', N) ; asserta(arc(V#1, 'I', N)) ) }.

The following parsing rule expresses that if a complex verb phrase composed of a verb (arrives) and a prepositional phrase (on port a) is parsed, then the head noun of the prepositional phrase is expected to be the verb's second argument; an arc is thus inserted.
    verb_phrase(verb(V#A, S), SubjectNP, V#A) -->
        verb(V#A),
        prep_phrase(NP, N),
        { ( arc(V#1, 'II', N) ; assertz(arc(V#1, 'II', N)) ) }.

Using the reference numbers in the assertions, the dependency structure of "a signal arrives on port a" can be printed:

    arrive [class:verb, tense:present]
        I  : signal [class:common_noun, number:sg]
        II : port_a [class:common_noun, number:sg]
That dependency representation easily allows the translation of the original English sentence into a Predicate Calculus representation of its meaning, such as: arrive(signal, port_a). Sentences are then replaced by the properties extracted. The internal representation thus fulfills the formalism presented in Section 2.3.1, hence permitting automatic validation. Such pre-processing allows easy expression of requirements and saves considerable time in the formalization process.
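The final translation step can likewise be sketched in a few lines. The op/3 declaration and the to_predicate/2 helper below are our own illustrative assumptions, not the chapter's code; they merely show how the asserted arcs yield the predicate-calculus term.

    :- op(200, xfx, '#').                 % lets terms such as arrive#1 be written

    % Dependency arcs produced by the parser for the example sentence.
    arc(arrive#1, 'I',  signal#2).
    arc(arrive#1, 'II', port_a#3).

    % Build the predicate-calculus term for a head verb from its arcs.
    to_predicate(Verb#Ref, Pred) :-
        arc(Verb#Ref, 'I',  Arg1#_),
        arc(Verb#Ref, 'II', Arg2#_),
        Pred =.. [Verb, Arg1, Arg2].

    % ?- to_predicate(arrive#1, P).
    % P = arrive(signal, port_a).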
2.3.3 Consistency Validation
The validation is based on a set of consistency rules. These rules characterize basic notions of consistency that must be respected by the model. They are gathered in different sets according to their concerns. The validation process consists of ensuring that each and every rule is satisfied; finding a counterexample of a rule signifies that the latter is violated, thus denoting that some inconsistency is present in the model. The definition of the rules conforms to the formalism presented in Section 2.3.1. The number of rules presented here is not fixed, and further rules, for general or specific purposes, will be added as the methodology is improved. In addition, different severity levels should be considered, depending on the rules.

2.3.3.1 Definition Rules
These rules perform validation on the definitions of properties and actions. The issues addressed are:

1. Each property used in sub-properties is defined: ∀π ∈ Π : ∀πx ∈ λπ : πx ∈ Π
2. Each property used in pre/post-conditions is defined: ∀α ∈ A : ∀πx ∈ ϕ, πy ∈ ψ : πx, πy ∈ Π
3. Each action used in sub-actions is defined: ∀α ∈ A : ∀αx ∈ λα : αx ∈ A
4. No action has empty pre/post-conditions: ∀α ∈ A : ϕ ≠ ∅, ψ ≠ ∅
5. Each property has a unique name: ∀πx, πy ∈ Π, πx ≠ πy : νx ≠ νy
6. Each action has a unique name: ∀αx, αy ∈ A, αx ≠ αy : νx ≠ νy

2.3.3.2 Enumeration Rules
These rules apply to properties found in composed properties and in the pre- and post-conditions of actions.

1. No property has contradictory sub-properties: ∀π ∈ Π : ∀πx, πy ∈ λπ : ¬χ(πx, πy)
2. Pre/post-conditions do not contain contradictions:
   ∀α ∈ A : ∀πx, πy ∈ ϕ : ¬χ(πx, πy)
   ∀α ∈ A : ∀πx, πy ∈ ψ : ¬χ(πx, πy)
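Since every rule is a universally quantified statement over finite sets, checking it in Prolog amounts to searching for a counterexample. The sketch below, reusing the hypothetical action/6 and contradiction/2 facts introduced in Section 2.3.1, shows how definition rule 4 and enumeration rule 2 might be expressed; the prototype's actual rules are not published in this form.

    % Definition rule 4: an action with an empty pre- or post-condition set.
    violates_nonempty(Name) :-
        action(Name, _, Pre, Post, _, _),
        ( Pre == [] ; Post == [] ).

    % Enumeration rule 2: an action whose pre-conditions contain two
    % contradictory properties.
    violates_enum(Name, P1, P2) :-
        action(Name, _, Pre, _, _, _),
        member(P1, Pre),
        member(P2, Pre),
        contradiction(P1, P2).

    % The model satisfies both rules when the queries
    % ?- violates_nonempty(A).  and  ?- violates_enum(A, P1, P2).
    % fail, i.e., no counterexample can be derived.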
2.3.3.3 Equivalence Rules
This set is dedicated to the detection of redundancy in the definitions of properties and actions:

1. No identical actions with different names exist: ∀αx, αy ∈ A, νx ≠ νy : ϕx ≠ ϕy ∨ ψx ≠ ψy ∨ λαx ≠ λαy
2.3.3.4 Inter-Enumeration Rules
These rules address inconsistencies between actions and the sub-actions of which they are composed, as well as inconsistencies between the sub-actions themselves:

1. The pre- and post-conditions of an action are not contradicted by the pre- and post-conditions of the sub-actions it involves:
   ∀α ∈ A, ∀αz ∈ λα : ∄πx ∈ ϕ, πy ∈ ϕz | χ(πx, πy)
   ∀α ∈ A, ∀αz ∈ λα : ∄πx ∈ ψ, πy ∈ ψz | χ(πx, πy)
2. In a set of sequential sub-actions, denoted λαseq, the pre-conditions of an action do not contradict the post-conditions of the preceding action:
   ∀αx, αx+1 ∈ λαseq : ∄πx ∈ ψx, πx+1 ∈ ϕx+1 | χ(πx, πx+1)
3. Actions of a set of parallel sub-actions, denoted λαpar, do not have contradictory pre/post-conditions:
   ∀αx, αy ∈ λαpar : ∄πx ∈ ϕx, πy ∈ ϕy | χ(πx, πy)
   ∀αx, αy ∈ λαpar : ∄πx ∈ ψx, πy ∈ ψy | χ(πx, πy)
2.3.3.5 Deontic Rules
This set of rules performs a validation of the satisfaction of deontic constraints:

1. Forbidden compositions of actions are respected: ∀α ∈ A : ∄λαx ⊆ λα | δi(λαx)
2. Similarly for obligatory compositions: ∀δo(λαx) : ∃α ∈ A | λαx ⊆ λα
3. Absence of contradiction in deontic constraints:
   ∀δo(λαx) : ∄δi(λαx)
   ∀δi(λαx) : ∄δo(λαx), δp(λαx)
2.3.3.6 Temporal Rules
Similarly to the preceding sets of rules, temporal rules verify the satisfaction of temporal constraints:

1. Properties that must eventually hold: ∀τe(π) : ∃α ∈ A | π ∈ ψ
2. Properties that must always hold: ∀τa(π) : ∀α ∈ A : π ∈ ψ
3. Absence of contradiction in temporal constraints:
   ∀τe(π) : ∄τa(π)
   ∀τa(π) : ∄τe(π)
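The deontic and temporal rules translate just as directly. Under the same hypothetical fact base, the two checks below search for an obligation/interdiction clash (deontic rule 3) and for an eventually-constraint that no action can ever satisfy (temporal rule 1):

    % Deontic rule 3: the same composition is both obligatory and forbidden.
    violates_deontic(Comp) :-
        deontic(obligation, Comp),
        deontic(interdiction, Comp).

    % Temporal rule 1: a property that must eventually hold appears in no
    % action's post-conditions, so it can never be established.
    violates_eventually(P) :-
        temporal(eventually, P),
        \+ ( action(_, _, _, Post, _, _), member(P, Post) ).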
2.3.4 Elicitation of Missing Functionalities
As we defined it, a set of requirements consists of a collection of actions, properties, contradictions, and constraints. Given such a set, it is important to be able to know whether it is complete or not. As a first step towards this validation, we propose a technique for the elicitation of missing actions, i.e., functionalities. A set of pre-conditions is a conjunction of properties denoting a state of the system where some action must be undertaken. The collection of actions can be transposed into a Boolean function F expressed as a sum of products, where the products are the actions and the variables are the properties. Let us consider a door controller system S0 modeled using three properties, namely π0, π1, π2, and two actions, namely α0 and α1:

π0 = [someone · sys · ∅ · "someone in front of door"].
π1 = [door opened · sys · ∅ · "door is opened"].
π2 = [tmr expired · sys · ∅ · "closing timer expired"].
α0 = [open · sys · {π0, ¬π1} · {π1, ¬π2} · ∅ · "open"].
α1 = [close · sys · {¬π0, π1, π2} · {¬π1} · ∅ · "close"].
As ϕα0 = {π0, ¬π1} and ϕα1 = {¬π0, π1, π2}, the corresponding Boolean function is: F = π0·π̄1 + π̄0·π1·π2. The extraction of missing functionalities then consists in the computation of the complement of F, denoted F̄, which expresses all the cases for which no action has been defined. The complementation of a Boolean function relies on its cubic representation. Such a representation of a function is a table in which each line, called a cube, represents one product of the sum; the whole table denotes the sum of products. Variables are organized in columns, where cells are initialized with a bit representing that the property is either present (1), present and negated (0), or absent (x, don't care) in the product denoted by the line. Section 2.4.1 presents the details of this process through a case study. Elicitation of missing functionalities is an important process. The solution we propose is valuable, especially in the sense that it is automated and thus permits computing missing functionalities on very large sets of requirements. However, it is limited in the sense that the elicitation relies on the conjunction of variables; the set of missing functionalities is only the complement of the set of defined functionalities. The latter may be complete, while a whole different part of the system may not be defined. In other words, the elicitation cannot be performed from nothing. Despite this consideration, it is a useful first step towards the elicitation of missing functionalities that enhances requirements completeness.
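The complementation itself can be made concrete with a brute-force sketch. The chapter's implementation is in C and relies on Espresso for cube minimization; the illustrative Prolog below is our own code, correct only for small property sets since it enumerates all 2^n valuations, but it reproduces the door-controller complement detailed in Section 2.4.1.

    % A cube is a list over {1, 0, x}: property present, negated, or don't care.
    covers([], []).
    covers([x|Cs], [_|Vs]) :- covers(Cs, Vs).
    covers([B|Cs], [B|Vs]) :- B \== x, covers(Cs, Vs).

    % valuation(N, V): V is one of the 2^N assignments of N properties.
    valuation(0, []).
    valuation(N, [V|Vs]) :- N > 0, member(V, [0, 1]), N1 is N - 1, valuation(N1, Vs).

    % missing(Cubes, N, V): valuation V is covered by no pre-condition cube.
    missing(Cubes, N, V) :-
        valuation(N, V),
        \+ ( member(C, Cubes), covers(C, V) ).

    % Door controller pre-conditions over (pi0, pi1, pi2):
    % ?- findall(V, missing([[1,0,x], [0,1,1]], 3, V), Vs).
    % Vs = [[0,0,0], [0,0,1], [0,1,0], [1,1,0], [1,1,1]]
    % These minterms minimize to the cubes beta0 = (x,1,0), beta1 = (0,0,x),
    % and beta2 = (1,1,x) of Table 2.2.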
2.4 Experimental Results
Linguistic pre-processing, consistency validation, and elicitation of missing functionalities were implemented together in the form of an experimental prototype. Due to the logical nature of the linguistic pre-processing and the consistency validation, both were implemented in Prolog, thus taking advantage of the inference engine of its interpreter without excessive investment in programming considerations. For performance purposes, the complementation algorithm is implemented in C. Pre- and post-reductions are performed using Espresso [43] to lighten the computation and minimize the results. In addition, our implementation performs on-the-fly reduction to prevent excessive memory usage. Three case studies have been performed. The first one concerns an automatic door controller. In the second, the method was applied to an industrial set of requirements. The last one addresses the RapidIO interconnection protocol.
2.4.1 Automatic Door Controller
Our first case study targets the modeling of an automatic door controller. The behavior of the latter consists of opening a door automatically when someone arrives. Once opened, the door must close automatically after a certain amount of time. We identified
two mutually exclusive actions: opening and closing. The first step consisted in expressing the two actions in our structured natural language representation (Table 2.1):
TABLE 2.1: Automatic door controller actions

         Opening                                      Closing
    ν    open                                         close
    µ    door                                         door
    ϕ    "someone is at door," "door is not opened"   "someone is not at door," "door is opened," "timer is expired"
    ψ    "door is opened," "timer is not expired"     "door is not opened"
    λα   ∅                                            ∅
    θ    "opening of the door"                        "closing of the door"
This model was then automatically formalized by the linguistic translation process, which also took care of the extraction of properties. This led to a set of formal requirements composed of the two actions accompanied by the definition of the three properties used in their pre- and post-conditions:

π0 = [is(at(door, someone)) · door · ∅ · "someone is at door"].
π1 = [is(opened, door) · door · ∅ · "door is opened"].
π2 = [is(expired, timer) · door · ∅ · "timer is expired"].
α0 = [open · door · {π0, ¬π1} · {π1, ¬π2} · ∅ · "opening of the door"].
α1 = [close · door · {¬π0, π1, π2} · {¬π1} · ∅ · "closing of the door"].

Temporal and deontic constraints were then added:

1. A temporal constraint stipulating that the door must eventually close: τe(¬π1).
2. A deontic constraint stipulating that it cannot open and close simultaneously: δi({α0, α1}par).

The consistency validation process did not detect any inconsistency, which is what we expected from such a tiny set of requirements. The elicitation of missing functionalities, however, allowed us to complete the set of requirements. The set of actions was translated into a cubic representation where lines represent actions and columns represent the set of all properties. Cells are initialized with the appropriate value, i.e., 1, 0, or x, depending on whether the property is present, present and negated, or absent in the pre-conditions of the corresponding action. Table 2.2 illustrates the cubic representation of the automatic door system (left) and its complement (right).
TABLE 2.2: Cubic representation of the automatic door system (left) and its complement (right)

          π0   π1   π2                π0   π1   π2
    α0     1    0    −          β0     −    1    0
    α1     0    1    1          β1     0    0    −
                                β2     1    1    −
The computation of the complement leads to a table that is converted back into a set of pre-conditions representing the cases for which no action has been defined. The designers can thus be made aware of the undefined actions, which may be undefined on purpose or may denote missing functionalities for which some behavior must be specified. Given the results of the elicitation process, designers can then simply complete the functionalities by adding the necessary post-conditions to the extracted actions. The complement of system S0, illustrated on the right part of Table 2.2, is translated back into sets of pre-conditions that illustrate the cases for which no functionality is defined:

1. β0: The door is opened and the timer did not expire
2. β1: The door is closed and nobody is in front
3. β2: The door is opened and someone is in front

Although the first two cases denote situations where nothing more than waiting is to be done, the third case must be taken into account, since the action to perform differs depending on whether someone is in front of the door or not. Automatically translated back into a human-readable form by our tool, the set of extracted functionalities showed us that action definitions were missing for the following specific situations:

1. What action should be taken by the controller if the door is opened and the timer has not expired yet? β0 = "door is opened," "timer is not expired." In that case, we added an action which consists of waiting to close the door:
α2 = [w close · door · {π1, ¬π2} · {π1, ¬π2} · ∅ · "wait to close door"].

2. Similarly, what must be done if nobody is present in front of the door and the latter is closed? β1 = "someone is not at door," "door is not opened."
The system must simply wait, as stipulated by the following action:
α3 = [w open · door · {¬π0, ¬π1} · {¬π0, ¬π1} · ∅ · "wait to open door"].

3. Last, but not least, what does the system have to do when someone is at the door and the latter is opened, whatever the state of the timer (expired or not)? β2 = "someone is at door," "door is opened." In that case, it is vital to specify that the controller must wait until it can close the door:
α4 = [w close · door · {π0, π1} · {π0, π1} · ∅ · "wait to close door"].
An additional iteration of the validation and extraction of functionalities was performed after those actions were added; it did not detect any inconsistency or missing functionality. The application of our methodology allowed us to point out potential problems very early in the development cycle.
2.4.2 Industrial Router
Our second case study is based on a set of requirements provided courtesy of an industrial partner. This set is a sub-part of the complete requirements of a cross-connect switch. Such a switch processes packets and routes data received from various sources to a specific destination. Our work targets the routing functionalities of the switch. The latter are gathered in textual documents and presented in the form of lists where each item, called a requirement, is described in plain English, as shown in Table 2.3. For confidentiality reasons, names and descriptions are fictitious.
TABLE 2.3: Industrial partner's requirements

    Req #   Description
    R3.22   A synchronization request with no synchronization timer running shall force the start of the synchronization timer and immediate sending of an acknowledgement.
    R3.23   When the TGV timer expires and no input is configured, the system should switch to the last known source.
    ...     ...
A total of 27 individual requirements constitutes this benchmark, which specifies the functionality of a small module embedded in a complex industrial ASIC. In addition, a recapitulative table is provided: its columns represent the triggering conditions of a particular functionality (pre-conditions), while its lines represent the functionality and the results of its triggering (post-conditions). Those documents were used by
the engineers of our industrial partner to directly derive executable specifications written in SDL [80]. We were able to structure the whole set of requirements in the form of pre- and post-conditions expressed in restricted English and then to automatically formalize them using our linguistic pre-processing layer. From this, we performed consistency validation, which did not reveal errors or inconsistencies. This is due to the fact that the list of given functionalities had already been refined and thoroughly verified by our industrial partner.
TABLE 2.4: Full coverage by complement

          π1   π2   π3   π4   π5   π6   π7   π8   π9
    α28    x    x    x    x    0    x    0    0    1
    β3     x    x    x    x    0    x    0    0    1
Extraction of missing functionalities, however, revealed several functionalities that were either missing or in need of refinement. Bringing those results back to our industrial partner, we received confirmation that they were relevant, especially since the set of functionalities that was given to us was incomplete: the set of missing functionalities extracted by our tool corresponded to those the industrial contact had not given to us. As an example, Table 2.4 shows that the line β3 of the complement fully covers the missing functionality α28. The example of Table 2.5 shows that β4 covers α31 with a generalization on property π5; this signifies that the complement represents both values that π5 can take. Our industrial partner had only considered the case where π5 is negated, since the other case was not pertinent to them.
TABLE 2.5: Example of generalization

          π1   π2   π3   π4   π5   π6   π7   π8   π9
    α31    x    x    1    x    0    1    0    x    0
    β4     x    x    1    x    x    1    0    x    0
Each extracted functionality covered one or two of the effectively missing functionalities. Some of them presented a specialization on one or two pre-conditions, while others presented generalizations. A single missing functionality was not detected because, being already present in the set of given requirements, it was redundant.
As Table 2.6 recapitulates, 73% of the missing functionalities were fully covered, with generalization and/or specialization. The generalization ratio is only 8%, which means that the average coverage was 92% per missing functionality. 20% of them were only partially covered, meaning that engineers had to refine what was extracted by our tool.
TABLE 2.6: Summary

                        β extracted   Ratio   Generalization   Specialization
    Fully covered            11        73%          8%               0%
    Partially covered         3        20%          0%               1%
    Redundant                 1         7%          N/A              N/A
The generalization ratio oscillates between 0% and 12.5%, while the maximum specialization rate is only 1%, meaning that even in cases of partial coverage, 99% of the functionality is covered; this confirms the relevance of the process.
2.4.3 RapidIO
The third case study is based on the collision rules of the RapidIO [6] cache coherence specification. RapidIO aims at providing a standard protocol for very high rate input/output data exchange at the physical level, i.e., in hardware devices. The complete specification consists of English documents augmented with tables, figures, and FSMs. The cache coherence part elaborates on the protocol for high-speed shared-memory data exchange. We focused on the coherence validation of the set of collision rules. The latter does not contain any constraint but specifies 266 rules, each addressing a specific collision case as well as the way it must be handled by the protocol. Obviously, this set must be complete and free of errors. The consistency validation did not detect problems. The elicitation of missing functionalities identified 3 cases where no action is specified; these specific cases refer to states of the system where no collision occurs. Execution times collected along this case study confirm that consistency validation takes polynomial time to complete: the Prolog implementation of the consistency validation relies on pair-wise combination of actions, thus leading to a quadratic execution time. Despite this polynomial complexity, the execution times remain fairly reasonable, especially due to the abstraction level, which permits working on reduced sets of requirements. This limitation might be reduced by using appropriate mathematical computation techniques such as theorem proving [132] instead of the Prolog implementation. Figure 2.3 illustrates the increase of execution time along with the number of requirements considered.
[FIGURE 2.3: Execution time for consistency analysis (x-axis: number of functionalities, 15 to 266; y-axis: execution time).]

The elicitation of missing functionalities, for which results are reported in Figure 2.4, presents experimental linear execution times. Such results depend on the nature of the considered requirements. First, only 17 properties are used to characterize the states of the system. Second, all pre-conditions are composed of at most 3 properties, thus leading to at most 3 cells initialized for each line.

[FIGURE 2.4: Execution time for complementation (x-axis: number of functionalities, 15 to 266; y-axis: execution time).]

2.5 Linking to a UML-Based Methodology

Our methodology can be linked to UML-based methodologies following a top-down process. As an example, we mention here preliminary work on the integration of our approach with a virtual prototyping methodology based on UML [55].
Virtual prototyping proposes high-level modeling augmented with cross-abstraction interface mechanisms, hence permitting incremental development of system specifications. The integration of requirements engineering with UML-based virtual prototyping results in a robust methodology that provides early detection of errors, offers iterative refinement mechanisms, fills the gap between those two levels, and strengthens both requirements and functional modeling.
2.5.1 Integrated Methodology
The unifying concept between the two approaches is the contract [55]. A contract, also called a requirement, represents an action that must be performed by the system under specific circumstances. Contracts are guarded by pre- and post-conditions, thus specifying the state in which the system is supposed to be before and after the fulfillment of the contract. Our approach consists of the extraction of contracts from requirements without internal considerations. In other words, a contract C is represented as {pre} C {post}, where {pre} and {post} respectively represent the pre- and post-conditions of the contract and C represents the name of the contract. This notation completely abstracts the internal behavior of the contract. One step later in the development flow, virtual prototyping consists of modeling a system and its contracts, including the internal behavior of the latter, using encapsulation and interface-based design in a UML framework. As Figure 2.5 depicts, the first phase consists of formalizing and analyzing requirements using the methodology described earlier in this chapter. During the second phase, the set of formalized requirements is used as an input to virtual prototyping. Pre-verified actions and properties can then be embodied in detailed scenarios expressed in UML. State machines are best suited to this end, although message sequence charts and activity diagrams may be used to provide further flexibility, an aspect left for future research. The integration of the requirements validation and UML-based refinement methodologies hence consists of connecting them so that the virtual prototyping process uses the output of the requirements validation as its input. Formalized contracts can be ordered depending on their pre- and post-conditions. The set of contracts can then be mapped to a UML state chart diagram, where states represent pre- and post-conditions and transitions represent the contracts themselves. From this input, virtual prototyping can refine contracts by defining their internal behavior using fine-grained Message Sequence Charts and other structural and behavioral representations. The translation between a set of formalized requirements and a UML state machine is based on the concordance of pre- and post-conditions. Indeed, given two actions α and β where the post-conditions of α correspond to the pre-conditions of β, one can infer that β may be triggered after α has been executed. Based on this, pre- and post-conditions are first translated into states. Then, transitions between states are inferred from the pre- and post-conditions of actions, i.e., an action is a transition between the two states which represent its pre- and post-conditions. Pseudo-states represented by state-exit and state-entry actions enable finer pre- and post-condition modeling.
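A rough version of this inference is easy to prototype. The sketch below is our own illustration over the hypothetical action/6 facts from Section 2.3.1; it uses a deliberately loose matching criterion (shared, non-conflicting conditions) rather than the exact correspondence a production tool would need.

    % transition(A, B): action B is a candidate successor of action A when
    % A's post-conditions and B's pre-conditions overlap without conflicting.
    transition(A, B) :-
        action(A, _, _, PostA, _, _),
        action(B, _, PreB, _, _, _),
        A \== B,
        intersection(PostA, PreB, Common),
        Common \== [],
        \+ ( member(P, PostA), member(Q, PreB), conflicting(P, Q) ).

    conflicting(not(P), P).
    conflicting(P, not(P)).

    % In the producer-consumer example, ?- transition(snd, B). yields B = rcv:
    % snd establishes not(idle-prod), which rcv requires, so the state reached
    % after "snd" becomes the source state of the "rcv" transition.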
[FIGURE 2.5: The integrated methodology. Requirements / system's textual specifications feed the requirements engineering phase; its output is refined into a virtual prototype comprising functional modeling, transactors, a test infrastructure, an executable model, a hardware component model, and partitioning.]
Diagrams built from pre-verified requirements ensure an accurate capture of the application requirements since consistency errors and missing functionalities might have already been detected. UML then provides the means for implementation-independent modeling of both structure and behavior. A virtual prototype built with software-centric UML is a more neutral, yet executable, specification of the system. Prototype coherence is ensured by a transaction mechanism that extends UML models towards several levels of abstraction and other models of computation. This mechanism, called model level transactors, is used to couple asynchronous components, described as high-level UML state machines, with synchronous components, e.g., VHDL or Simulink models. By resolving interface heterogeneity, model level transactors enable executing the resulting prototype and iteratively refining it towards a particular platform. Proper requirement specification can not only help build more accurate UML models, but can also be used to validate execution scenarios gathered from prototype execution.
2.5.2 Case Study
The efficiency of the approach was tested on the development of the control module of an adaptive equalizer. A specification of this module had first been developed directly with UML. The goal was thus to verify whether or not we could obtain similar, or even better, results with our integrated methodology, thus justifying its existence.
[FIGURE 2.6: UML state machine of the filter controller. States include Waiting For Filter Init (with parameters mode, hold, mu, ntapF, ntapB), Waiting For Filter Ready, DFE Processing, and Waiting For Second Update Done; transitions are triggered by init, config_complete, FFF_ready, FBF_ready, filters_ready, first_update_done, and second_update_done, with choice points on true/false conditions.]
In normal operation, the control module first performs the initialization of some equalizer components using parameters provided by an external source (the system CPU for instance). During the operating phase, after initialization, the module monitors the internal state of the equalizer and relays control and status signals to the outside world. The functionalities of this module have been captured from a textual documentation and formalized as a set of actions guarded by properties. The consistency analysis of this set did not reveal any error. However, the elicitation of missing functionalities allowed us to detect some that were not represented in the textual documentation. After appropriate improvements, the set of formalized requirements has been automatically refined into a UML finite state machine, illustrated by Figure 2.6. Such a translation allowed us to derive a precise representation of the behavior of the initialization module, which was as detailed as the one previously developed directly with UML, with, however, better knowledge of the documentation gaps, thus confirming the potential of such an integrated methodology. The model can then be refined and included in a complete UML prototype. The current UML structure and state diagrams of the prototype under development are publicly available [14].
2.6 Conclusion
This chapter proposes to graft a new requirements engineering methodology between the requirements and modeling phases of the hardware/software co-design cycle. This methodology targets the formalization of requirements using a formally structured natural language representation, later translated into a full formal representation by means of linguistic techniques. From this formalization, the methodology provides automatic consistency validation and elicitation of missing functionalities. Experimental results show that despite the complexity of the problem, the addressed level of abstraction permits keeping reasonable execution times.
We believe that this chapter presents several valuable contributions. First, it enhances the hardware-software design cycle by adding a complete methodology for the formalization and validation of requirements. Second, it illustrates an application of software requirements engineering concepts to the hardware-software field. Third, it demonstrates that formal requirements can be raised to higher abstraction levels, thus filling the gap between textual and usual formal requirements. Fourth, it proposes a technique for the elicitation of missing functionalities in requirements, which is, to our knowledge, totally new. Last, but not least, it shows how the presented approach can be easily grafted on top of UML-based methodologies, as shown by the on-going research presented in Section 2.5. We hence believe that such a methodology can be used as a guide for designers, thus improving the quality of the final product. As for future research, we first aim at extending the on-going work presented in Section 2.5 by implementing automatic derivation of test scenarios and observer skeletons. This will fill the gap between the requirements and specification levels and permit fully linking our approach to E-Sys.Net. As a second target, knowing the current limitations of the method, we aim at reducing its execution complexity, which may be achieved using appropriate programming languages.
3
The Semantic Web Applied to IP-Based Design: A Discussion on IP-XACT

James Lapalme
Université de Montréal - Canada

El Mostapha Aboulhamid
DIRO, Université de Montréal - Canada

Gabriela Nicolescu
École Polytechnique de Montréal - Canada

3.1 Introduction
3.2 Models of Architecture and XML
3.3 SPIRIT
3.4 The Semantic Web
3.5 XML and Its Shortcomings
3.6 Advantages of the Semantic Web
3.7 Case Study – SPIRIT
3.8 Cost of Adoption
3.9 Future Research
3.10 Conclusion

3.1 Introduction
The Semantic Web [35] was first envisioned by Tim Berners-Lee in 1999 as the next evolution of the Web, a Web that was self-describing and easily consumable by machines, not just humans. The Semantic Web is all about knowledge sharing. The sharing of knowledge cannot be achieved solely through the sharing of data and encoding formats but through the sharing of unambiguous metadata and meaning. The Semantic Web vision has already helped the domains of life sciences and pharmaceutics [26, 156]. The Semantic Web has allowed researchers to infer new knowledge and understanding by creating a collective knowledge base. This sharing of knowledge has permitted the discovery of new genes and drug interactions. The Semantic Web has not only helped the sciences but also the field of resource management. NASA has implemented a successful project [101] to manage the expertise profiles of human resources. In this project the Semantic Web was a key enabler to integrate information across multiple systems.

In the Electronic Design Automation (EDA) domain, the management of data and metadata plays an important role in the context of system-level design. Many design flow methodologies use a pattern which consists of assembling reusable sub-systems and/or Intellectual Property blocks (IP) to construct complex system platforms. This design process relies heavily on metadata. The formalisms used to describe system architectures within this process are often referred to as models of architecture. Over the last decade, special architecture-oriented languages have been developed in order to support these formalisms; we call them architecture description languages (ADLs) [149]. Many projects such as Colif [50] and MoML [143] have built the syntax of their ADLs on the XML [66] technology stack as a means for metadata management and manipulation. From the perspective of syntax, XML-based solutions are very effective. A vast number of free open source solutions are available which support the XML technology stack. However, XML was never intended for the management of metadata and its semantics. Hence solutions which rely on XML for the exchange of semantically unambiguous data must deal with many shortcomings.

Since the Semantic Web is the next step in metadata representation technologies, it can be a valuable tool for the EDA industry. Metadata management based on the Resource Description Framework (RDF) and the Web Ontology Language (OWL) [20], both technologies of the Semantic Web, benefits from a very robust foundation focused on semantics. Work such as [190] demonstrates that the Semantic Web technologies can be effectively applied to the domain of embedded systems. The former presents how the Semantic Web technologies can be used in conjunction with service-oriented technologies in order to create a flexible integrated development environment for embedded system design.

Another project focused on the definition of a development environment for embedded system design is SPIRIT [133]. The SPIRIT project has gained a lot of momentum over the last couple of years. The project's principal objective is the definition of standards for the design of an open Integrated Development Environment (IDE) tool for IP-based system design. Currently, the SPIRIT project has taken a more traditional approach based on XML than [190] for the management of IP-related metadata and data. This approach has overly complicated the standards of the project, as well as the management of their versioning.

This chapter continues the discussion started in [190] by presenting the benefits of using the Semantic Web over XML for metadata and data management. It will also discuss how the SPIRIT project could benefit from using the Semantic Web. The objectives of this chapter are to introduce the Semantic Web to the EDA community, as well as to discuss how initiatives such as the SPIRIT project can greatly benefit by adopting OWL instead of XML as a medium for metadata and data. The chapter is organized as follows: Sections 3.2 through 3.4 present background information on models of architecture, SPIRIT and IP-XACT, and the Semantic Web technologies; Section 3.5 discusses the semantic shortcomings of XML; Section 3.6 discusses the advantages of the Semantic Web over XML; and Section 3.7 presents how SPIRIT could benefit from using the Semantic Web technologies.
3.2 Models of Architecture and XML
Models of Architecture (MoA) [189, 128] are closely related to Models of Computation (MoC) [45]. In the same way that MoCs define semantic frameworks which permit the definition of computation-oriented applications, MoAs define semantic frameworks which allow the definition of system platform architectures. However, the state of the art of MoCs is far more mature. Many MoCs have been formally defined and studied in order to understand their inherent characteristics as well as those of the applications which are defined with them. A mature concept of MoA has only begun to emerge in the last couple of years. Although no generally accepted definition of the term exists, two major approaches to MoAs can be identified:
1. describe platform components using existing MoCs;
2. specify platforms using dedicated formalisms.
Some examples of the first approach are SystemC [100], Metropolis [27] and the automated design flow based on Synchronous Dataflow in [185]. Each of these uses the same modeling constructs to define applications and platforms. MoA is a key concept of some modern system design flow paradigms such as the Y-Chart approach [128]. In the Y-Chart approach, application models are defined using MoCs and platform models are defined using MoAs. A mapping is then defined between both models in order to specify how the application components are executed on the resources offered by the platform. MoAs and ADLs are usually focused on describing the resources of architectures (platforms), their properties (area, energy, etc.) as well as their interconnection topology. There are many examples of ADLs which use XML as a meta-language for their definition. Through XML, these ADLs capture definitions of various architectural resources such as computational, communication and storage resources. They also allow the capture of system designs which are defined as interconnections of these resources. Hence, these ADLs manage both the metadata about resources as well as data about the designs which use the resources. The remainder of this section presents three key examples of ADLs which use XML.
3.2.1 GSRC and MoML
MoML is an XML modeling markup language [143]. It is a concrete syntax for the Gigascale Silicon Research Center (GSRC) abstract syntax, which was developed at UC Berkeley in the context of the Ptolemy project. The GSRC abstract syntax can be perceived as a MoA. This abstract syntax, hence MoML, allows the specification of interconnections of parameterized, hierarchical components. MoML is extensible in that components and their interconnections can be decorated with data specified in some other language (such as another XML language). Figure 3.1(a) illustrates the key elements of the GSRC abstract syntax semantics, hence the elements which are encoded using MoML. The main concepts are entities, ports and links.
FIGURE 3.1: MoML Concepts and example. (a) MoML concepts: entities, ports, links and relations; (b) MoML example: entities A, B and C connected through their ports by links and a relation.
Figure 3.2 is the MoML representation of the example illustrated in Figure 3.1(b).
FIGURE 3.2: MoML XML example (the XML listing is not reproduced here)
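Since the original listing is missing, the following is a minimal sketch of what the MoML description of Figure 3.1(b) might look like. It follows the published MoML element vocabulary (entity, port, relation, link), but the actor class names and attribute details are illustrative assumptions, not the book's original figure:

<?xml version="1.0"?>
<entity name="top" class="ptolemy.actor.TypedCompositeActor">
  <!-- entities A, B and C, each exposing the ports shown in Figure 3.1(b) -->
  <entity name="A" class="SomeActorClass">
    <port name="out" class="ptolemy.actor.TypedIOPort"/>
  </entity>
  <entity name="B" class="SomeActorClass">
    <port name="out" class="ptolemy.actor.TypedIOPort"/>
  </entity>
  <entity name="C" class="SomeActorClass">
    <port name="in" class="ptolemy.actor.TypedIOPort"/>
  </entity>
  <!-- a relation joins the ports; links attach each port to the relation -->
  <relation name="r" class="ptolemy.actor.TypedIORelation"/>
  <link port="A.out" relation="r"/>
  <link port="B.out" relation="r"/>
  <link port="C.in" relation="r"/>
</entity>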
3.2.2 Colif and Middle-ML
In a similar fashion to the GSRC abstract syntax, Colif [50] defines a MoA with a focus on the description of application-specific multiprocessor SoC architectures. A key objective of Colif is to model on-chip communication at different abstraction levels while separating component behavior from the communication infrastructure. The concrete syntax for Colif is Middle-ML, an intermediate language defined with XML. Middle-ML offers simple constructs in order to layer other languages on top; Colif is one such language. Both Colif and Middle-ML have been developed by a research group at TIMA.
FIGURE 3.3: Colif and Middle-ML. (a) Colif concepts: modules, tasks, black boxes, nets and ports; (b) Middle-ML example (the XML listing is not reproduced here).
Figure 3.3(a) illustrates the main semantic elements of Colif. The core concepts of Colif are modules, ports and nets. Figure 3.3(b) illustrates the use of Middle-ML to describe a Colif-based architecture.
3.2.3 Premadona
Premadona [189] offers a tool for generating abstract performance models of Network-on-Chip based Multi-Processor Systems-on-Chip (MPSoCs), which are expressed with the Parallel Object-Oriented Specification Language (POOSL) [193]. POOSL is an object-oriented system specification language which is based on a formal mathematical model. Moreover, it is the specification language for the Software/Hardware Engineering (SHE) methodology developed at the University of Eindhoven [193]. The Premadona tool follows the Y-Chart paradigm. Figure 3.4 illustrates the design flow with Premadona. Application models are defined using an XML language which uses the SDF3 specification language. SDF3 supports the description of application models using one of three MoCs: synchronous dataflow, cyclo-static dataflow and scenario-aware dataflow. The specification language for platforms is also an XML-based language which supports MoA constructs such as processors, storage, etc. The Premadona tool generates POOSL models which can be simulated in order to evaluate performance. Figure 3.5 illustrates a simple platform model composed of 2 MIPS processors, 1 ARM7 processor and 1 TriMedia processor.

FIGURE 3.4: Premadona tool flow. Application, platform and mapping descriptions (.xml), together with trace and analysis settings (.xml), are combined by premadona into a POOSL model (.p4r) that is executed by rotalumis, producing analysis results (.log) and trace results (.trace).

FIGURE 3.5: Platform model (the XML listing is not reproduced here)
FIGURE 3.6: SPIRIT Design Environment. IP-XACT compliant object descriptions (bus definitions, abstraction definitions, components, designs, abstractors, generator chains and design configurations) are brought into an IP-XACT compliant design environment through IP import/export, and IP-XACT compliant generators (e.g., component and design generators) interact with the design environment through the IP-XACT TGI.
3.3 SPIRIT
The SPIRIT Consortium [133] is a non-profit organization dedicated to the development of standards to empower the vision of IP-based development. At the heart of the SPIRIT vision is an open Design Environment (DE) which can support an IP-based design flow for the elaboration of embedded systems. The necessity of the SPIRIT vision has emerged because of the absence of standards for the packaging of IP descriptions and their related metadata. Currently, there is no design environment which can support IP descriptions across all vendors as well as integrate the necessary tools to support them. Figure 3.6 illustrates the architecture of the design environment which is part of the SPIRIT vision. In order to realize its vision, the consortium has defined a specification called IP-XACT which defines three main sub-specifications: the IP-XACT metadata format, the Tight Generator Interface (TGI) and the Semantic Consistency Rules (SCRs). There are two obvious interfaces expressed in Figure 3.6: from the DE to the external object description libraries and from the DE to the generators. The IP-XACT metadata format is used for the interface between the DE and the object description libraries. The TGI is used between the DE and the generators.
3.3.1 IP-XACT Metadata Format
The IP-XACT metadata format specification is a metadata description for documenting IPs. The metadata format is an XML schema which creates a common and language-neutral way to describe IPs, compatible with automated integration techniques and enabling integrators to use IPs from multiple sources with IP-XACT enabled tools.
TABLE 3.1: IP-XACT object description types
Bus Definition Description: Defines the type attributes of a bus.
Abstraction Definition Description: Defines the representation attributes of a bus.
Component Description: Defines an IP or interconnect structure.
Design Description: Defines the configuration of an interconnection between components.
Abstractor Description: Defines an adaptor between interfaces of two different abstractions.
Generator Chain Description: Defines the grouping and ordering of generators.
Design Configuration Description: Defines configuration information.
FIGURE 3.7: AHB bus definition (an IP-XACT busDefinition identified by the VLNV amba.com/AMBA/AHB/v1.0; the surviving values include false, true, 16, 16, ahb_clk and ahb_reset, but the XML markup itself is not reproduced here)
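A plausible reconstruction from the values that survived: the element names follow the IP-XACT 1.4 busDefinition schema, but the mapping of false/true/16/16 to directConnection, isAddressable, maxMasters and maxSlaves is our assumption, as is the placement of ahb_clk and ahb_reset under systemGroupNames:

<spirit:busDefinition>
  <!-- VLNV identification of the bus type -->
  <spirit:vendor>amba.com</spirit:vendor>
  <spirit:library>AMBA</spirit:library>
  <spirit:name>AHB</spirit:name>
  <spirit:version>v1.0</spirit:version>
  <!-- assumed mapping of the visible boolean and integer values -->
  <spirit:directConnection>false</spirit:directConnection>
  <spirit:isAddressable>true</spirit:isAddressable>
  <spirit:maxMasters>16</spirit:maxMasters>
  <spirit:maxSlaves>16</spirit:maxSlaves>
  <spirit:systemGroupNames>
    <spirit:systemGroupName>ahb_clk</spirit:systemGroupName>
    <spirit:systemGroupName>ahb_reset</spirit:systemGroupName>
  </spirit:systemGroupNames>
</spirit:busDefinition>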
IP-XACT enabled tools are able to interpret, configure, integrate and manipulate IP blocks that comply with the proposed IP metadata description. The current version is 1.4. The XML schema which defines the metadata format is composed of seven top-level schema definitions. Each schema definition can be used to create object descriptions of the corresponding type. Table 3.1 gives an overview of the main concepts defined with the IP-XACT metadata format. Figure 3.7 is an example of the AHB portion of the AMBA specification [10] described using the IP-XACT metadata format.
3.3.2 Tight Generator Interface (TGI)
The second interface of the SPIRIT Design Environment Architecture is one which defines the interaction API between the DE and Generators. This interface is defined by the Tight Generator Interface (TGI) portion of the IP-XACT specification.
Generators are an important part of the design environment architecture. They are executable objects (e.g., scripts or binary programs) which may be integrated within a design environment (referred to as internal) or provided separately as an executable (referred to as external). Generators may be provided as part of an IP package (e.g., for configurable IP, such as a bus-matrix generator) or as a way of wrapping point tools for interaction with a design environment (e.g., an external design netlister, an external design checker, etc.). An internal generator may perform a wide variety of tasks and may access IP-XACT compliant metadata by any method a DE supports; IP-XACT does not describe these protocols. The DE and the generator communicate with each other by sending messages utilizing the Simple Object Access Protocol (SOAP), with the interface specified in the Web Services Description Language (WSDL). SOAP provides a simple means for sending XML-format messages using the Hyper Text Transfer Protocol (HTTP) or other transport protocols.
3.3.3 Semantic Consistency Rules (SCR)
Since the schema of the IP-XACT metadata format is defined using the XML schema technology, it is bound by the expressive limits of this technology. There are a certain number of consistency rules that are important for the coherence of the metadata schema and conforming documents which cannot be expressed with XML schemas. In order to define these rules, the IP-XACT specification defines a list of consistency rules called the Semantic Consistency Rules (SCR) which complements the IP-XACT metadata schema. The intent is that tools implement these rules in order to validate the coherence of documents which use the IP-XACT metadata schema. As with the TGI operations, the rules are organized by elements of focus.
3.4 The Semantic Web
The core technology for knowledge representation in the Semantic Web is the Resource Description Framework (RDF). Figure 3.8(b) illustrates the position of RDF and all the other Semantic Web technologies relative to the XML technologies. The Semantic Web is often seen as a layer above XML but, as Figure 3.8(b) illustrates, it can be layered on other standards such as the N3 notation [34]; hence it is independent of XML. Figure 3.8(b) also illustrates how the various IP-XACT standards can be implemented with the Semantic Web and positions a design environment as an application which uses the stack. Figure 3.8(a) serves to illustrate how the Semantic Web approach compares to the current XML approach of implementing the IP-XACT standards.

FIGURE 3.8: Semantic Web stack. (a) The current XML based stack: Unicode character set, URI identifiers, XML syntax and data interchange, XSD schema (IP-XACT schema), Web Services querying (IP-XACT TGI) and rules in English (IP-XACT SCR), with the IP-XACT design environment as the application. (b) The N3 based (Semantic Web) stack: Unicode character set, URI identifiers, XML or N3 syntax, RDF data interchange, RDFS taxonomies, OWL ontologies (IP-XACT schema), SWRL rules (IP-XACT SCR) and SPARQL querying (IP-XACT TGI), with the IP-XACT design environment as the application.
3.4.1 Resource Description Framework
RDF [20] perceives the world as a collection of resources. A resource can be anything (a web page, a fragment of a web page, a person, an object, etc.) and is referred to in the Semantic Web with a Uniform Resource Identifier (URI). RDF is built on three concepts: resources, properties (relations) and statements. As mentioned earlier, a resource can be anything which is referred to by a URI. A property or relation is a resource which gives information on an aspect of a resource. Since a property is a resource, all properties have a unique URI. A statement is a triple of the form (subject, property, object) which puts a resource (the subject of the statement) in relation with another resource (the object of the statement). The property (relation) of a statement indicates the aspect which the statement is giving information about. A set of statements forms an RDF graph. RDF defines a small set of standard resources. The most important is the type property (relation), which expresses an "is a" relation. A triple which uses the type relation, such as (hw:IXP45X, rdf:type, hw:NetworkProcessor), generally indicates that the subject of the triple is an instance and the object is the conceptualization it instantiates. Figure 3.9 is a simple RDF graph which describes some common hardware conceptualizations and instances of those conceptualizations which relate to the IXP45X Intel network processor [119]. Each oval depicts an RDF resource which represents a conceptualization. The arrows depict property resources. All resources are identified with a URI (a URI prefix is used for conciseness). The model states that an IXP45X is a network processor (a special kind of processor) which is made by a specific manufacturer called Intel and supports a specific communication bus called PCI v2.2. The RDF specification defines a standard serialization format called RDF/XML for its abstract syntax. As mentioned earlier, other serialization formats exist such as N3 [34]. These serialization formats are not typically consumed directly but through tools. Figure 3.10 illustrates the serialization of the example presented in Figure 3.9. In the remainder of this chapter we shall use the N3 notation. The N3 notation expresses each triple on a single line.

FIGURE 3.9: RDF example. Concepts: hw:Processor, hw:NetworkProcessor, hw:ProcessorManufacturer and hw:CommunicationBus. Instances: hw:IXP45X (rdf:type hw:NetworkProcessor), hw:Intel (rdf:type hw:ProcessorManufacturer) and hw:PCIv2.2 (rdf:type hw:CommunicationBus), with hw:IXP45X hw:madeBy hw:Intel and hw:IXP45X hw:supports hw:PCIv2.2.
FIGURE 3.10: RDF serialization. (a) RDF/XML serialization (the XML listing is not reproduced here); (b) N3 serialization:
hw:IXP45X hw:madeBy hw:Intel.
hw:IXP45X hw:supports hw:PCIv2.2.
hw:IXP45X rdf:type hw:NetworkProcessor.
hw:Intel rdf:type hw:ProcessorManufacturer.
hw:PCIv2.2 rdf:type hw:CommunicationBus.
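Since panel (a) did not survive reproduction, here is a minimal sketch of how the same five triples could be serialized in RDF/XML; the namespace URI chosen for the hw: prefix is an assumption, as the book does not show it:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:hw="http://example.org/hw#">
  <!-- typed node element: encodes the rdf:type triple for hw:IXP45X -->
  <hw:NetworkProcessor rdf:about="http://example.org/hw#IXP45X">
    <hw:madeBy rdf:resource="http://example.org/hw#Intel"/>
    <hw:supports rdf:resource="http://example.org/hw#PCIv2.2"/>
  </hw:NetworkProcessor>
  <hw:ProcessorManufacturer rdf:about="http://example.org/hw#Intel"/>
  <hw:CommunicationBus rdf:about="http://example.org/hw#PCIv2.2"/>
</rdf:RDF>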
3.4.2 RDF Schema
RDF Schema (RDFS) [20] extends RDF along two axes: it defines a precise list of resources (meanings and URIs) and a set of entailment rules which allow the inference of new triples from RDFS graphs. RDFS allows the definition of classes of resources as well as their organization. The concept of a class in RDFS must be understood as a set. Table 3.2 summarizes the main concepts found in RDFS. Figure 3.11 is an extension of Figure 3.9 which defines relations between conceptualizations. For example, Figure 3.11 expresses that the set of all network processors is a subset of the set of processors. We often refer to networks of conceptualizations as schemas, hence the name RDF Schema. Based on the semantics of RDFS and the underlying entailment rules, two implicit triples should be understood: (hw:IXP45X, rdf:type, hw:Processor) and (hw:Intel, rdf:type, hw:Manufacturer). The previous implicit triples may be inferred because of the semantics of the rdfs:subClassOf property. Since rdfs:subClassOf expresses the relation of parent set and subset between two sets, all individuals in the subset are necessarily individuals of the parent set. Since RDFS and OWL are languages defined with RDF, both can be serialized to RDF/XML or N3.
TABLE 3.2: Main RDFS concepts
Class: This is the class of resources that are RDF classes. rdfs:Class is an instance of rdfs:Class.
Property: rdf:Property is the class of RDF properties. rdf:Property is an instance of rdfs:Class.
Range: The property rdfs:range is an instance of rdf:Property that is used to state that the values of a property are instances of one or more classes.
Domain: The property rdfs:domain is an instance of rdf:Property that is used to state that any resource that has a given property is an instance of one or more classes.
SubClassOf: The property rdfs:subClassOf is an instance of rdf:Property that is used to state that all the instances of one class are instances of another.
SubPropertyOf: The property rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related by one property are also related by another.
FIGURE 3.11: RDFS example. Extends Figure 3.9 with schema-level statements: hw:Processor, hw:Manufacturer, hw:NetworkProcessor, hw:ProcessorManufacturer and hw:CommunicationBus are instances (rdf:type) of rdfs:Class; hw:NetworkProcessor rdfs:subClassOf hw:Processor; hw:ProcessorManufacturer rdfs:subClassOf hw:Manufacturer. The instance-level triples are as in Figure 3.9.
3.4.3 Web Ontology Language (OWL)
OWL [20] is a knowledge representation language which can be used to represent the terminological knowledge of a domain in a structured and formally well-understood way. More specifically, OWL is a description logic language. Description logics express conceptual descriptions with first-order predicate logic. OWL is defined on top of RDFS in the same way RDFS extends RDF. OWL defines a list of resources (meanings and URIs) and a set of entailment rules which allow the inference of knowledge in the form of new triples. In particular, OWL adds to RDFS the capability to express the admissibility criteria of a given class (set). It is possible to define a class not only by defining its relationship with other classes but also by defining the criteria which a resource must respect to be classified as an individual of the class. There are many semantically rich elements in the OWL specification but for the context of this chapter we will focus on the concept of a Restriction. The concept of a Restriction defines a resource of type owl:Class. This class defines the set of all individuals which express restriction (criteria) specifications which are used to define other owl:Class resources. These criteria are typically on the existence of relations (property statements) which an individual resource must be part of. Figure 3.12 defines in greater detail the conceptualization of a Processor and a Network Processor. The example defines the class Processor as all resources which are part of exactly 1 relation madeBy as well as at least 1 relation supports. The class NetworkProcessor is defined to be the subset of all Processors which are part of exactly 1 relation madeFor whose object is "networking." The example reuses the IXP45X resource of the previous examples but does not express anything about its association with a class. Based on the semantics of OWL, reasoning over the example would conclude that the resource hw:IXP45X is a Processor and a NetworkProcessor because it fulfils all the criteria for both sets.
hw:Processor rdf:type owl:Class.
hw:Processor rdfs:subClassOf _:processorRestriction1.
_:processorRestriction1 rdf:type owl:Restriction.
_:processorRestriction1 owl:onProperty hw:madeBy.
_:processorRestriction1 owl:cardinality "1"^^xsd:int.
hw:Processor rdfs:subClassOf _:processorRestriction2.
_:processorRestriction2 rdf:type owl:Restriction.
_:processorRestriction2 owl:onProperty hw:supports.
_:processorRestriction2 owl:minCardinality "1"^^xsd:int.
hw:NetworkProcessor rdf:type owl:Class.
hw:NetworkProcessor rdfs:subClassOf hw:Processor.
hw:NetworkProcessor rdfs:subClassOf _:processorRestriction3.
_:processorRestriction3 rdf:type owl:Restriction.
_:processorRestriction3 owl:onProperty hw:madeFor.
_:processorRestriction3 owl:hasValue "networking"^^xsd:string.
_:processorRestriction3 owl:cardinality "1"^^xsd:int.
hw:IXP45X hw:madeBy hw:Intel.
hw:IXP45X hw:madeFor "networking"^^xsd:string.
hw:IXP45X hw:supports hw:PCIv2.2.

FIGURE 3.12: OWL example in N3
The OWL specification defines three subsets of the language, each extending the previous one: OWL Lite, OWL DL and OWL Full. Each subset strikes a balance between expressivity and the computational complexity of reasoning over a model defined with the subset. Since OWL builds upon RDFS and RDF, it uses the same serialization formats.
3.4.4 SPARQL
SPARQL Protocol and RDF Query Language (SPARQL) [174] is a query language for RDF graphs. SPARQL is very much to RDF what SQL [65] is to relational databases. SPARQL is based on a pattern matching paradigm like XPath [33]. In the same way that an XPath expression describes an XML pattern which is usually hierarchical, a SPARQL query describes a graph pattern. A basic SPARQL query has two portions: a SELECT portion that defines a list of variables returned by the query and a WHERE portion that defines a list of triple statements used to match. The variables in the SELECT portion are used as unbound elements (subject, property or object) in the statements. The query defined in Figure 3.13 searches for all NetworkProcessors which support PCIv2.2. If executed on the example illustrated in Figure 3.11, the result set would contain hw:IXP45X. If the query had requested all Processors which support PCIv2.2, the result set would have been empty because there is no explicit relation (rdf:type) between hw:IXP45X and Processor. A SPARQL engine will only search for matches based on what is explicitly present in the queried graph. As discussed previously, the OWL language has precise semantics which include entailment rules. By processing an OWL model with an inference engine such as Pellet [183], new statements can be added to the model based on the entailment rules. If we apply entailment rules to the OWL example, the RDF statement (hw:IXP45X, rdf:type, hw:Processor) will be added. The execution of the query would now return the expected result. This capability to infer new knowledge from a given model is a great added value over typical approaches.

Select Query:
SELECT ?x
WHERE {
  ?x rdf:type hw:NetworkProcessor.
  ?x hw:supports hw:PCIv2.2.
}
Result: hw:IXP45X

FIGURE 3.13: Select query

The SPARQL specification also defines three other query types:
1. The CONSTRUCT query form returns a single RDF graph specified by a graph template. The result is an RDF graph formed by taking each query solution in the solution sequence, substituting for the variables in the graph template and combining the triples into a single RDF graph by set union.
2. The ASK query is used to test whether or not a query pattern has a solution. No information is returned about the possible query solutions, just whether or not a solution exists.
3. The DESCRIBE form returns a single result RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor. The query pattern is used to create a result set. (A sketch of this form follows at the end of this section.)

Figure 3.14 gives examples of a CONSTRUCT and an ASK query as well as the results of the queries if executed on Figure 3.11.

Construct Query:
CONSTRUCT { ?x rdf:type ?c1. }
WHERE {
  ?c2 rdfs:subClassOf ?c1.
  ?x rdf:type ?c2.
}
Result: hw:IXP45X rdf:type hw:Processor.

Ask Query:
ASK { hw:IXP45X hw:madeBy hw:Intel. }
Result: yes

FIGURE 3.14: Construct and Ask queries

In conjunction with the SPARQL specification, the SPARQL Protocol specification [56] was established in order to define a communication interface over HTTP for remote SPARQL query execution. In addition to the operators defined by the official SPARQL standard, others have been proposed and implemented by tools. A specification called SPARQL/Update [174] has been proposed to the W3C for standardization. This specification defines two operators which enable the insertion and deletion of triples, hence providing write, update and delete capabilities. Figure 3.15 illustrates the use of the SPARQL/Update extension in order to change the madeBy property of all NetworkProcessors from "Intel" to "AMD."

Update Query:
DELETE { ?x hw:madeBy hw:Intel. }
INSERT { ?x hw:madeBy hw:AMD. }
WHERE {
  ?x hw:madeBy hw:Intel.
  ?x rdf:type hw:NetworkProcessor.
}

FIGURE 3.15: Update query
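The chapter's figures do not include a DESCRIBE example; a minimal sketch against the graph of Figure 3.11 (the exact contents of the returned graph are left to the query processor) might be:

DESCRIBE ?x
WHERE { ?x rdf:type hw:NetworkProcessor. }

Executed on Figure 3.11, a typical processor would return the triples known about hw:IXP45X.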
3.4.5 Tools for the Semantic Web: Editors and Jena
The Jena framework [48] is a Java-based open source toolkit for the Semantic Web. The current version implements a programmatic framework for RDF, RDFS, OWL and SPARQL. Jena also provides some interesting features such as a rule-based inference engine and a persistent storage framework for large RDF graphs. Jena also provides some very powerful extensions to the SPARQL language such as free-text searches and property functions. In the context of this chapter, the most important extension is the SPARQL/Update specification.

Semantic Web development is usually done using an editor. Multiple commercial and academic editors are available, such as Protégé [129] and TopBraid Composer (see http://www.topbraidcomposer.com/). These editors, in addition to facilitating model editing, commonly support visualization of semantic models, integration with inference engines, SPARQL integration and various kinds of analysis. Figure 3.16 is a screen shot of TopBraid Composer.

FIGURE 3.16: TopBraid Composer (screen shot not reproduced)
3.4.6 SWRL and Jena Rules
The Semantic Web Rule Language (SWRL) [118] has been a W3C member submission since 2004. It is based on a combination of the OWL DL and the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language. SWRL extends OWL with Horn-like rules [117]. As such, SWRL defines a high-level abstract syntax for the definition of rules as well as a formal semantic definition for the interpretation of these rules in the context of an OWL ontology. SWRL rules take the form of an implication between an antecedent (body) and a consequent (head). The intended meaning can be read as: "Whenever the conditions specified in the antecedent hold, then the conditions specified in the consequent must also hold." Both the antecedent (body) and consequent (head) consist of zero or more atoms. An empty antecedent is treated as trivially true (i.e., satisfied by every interpretation), so the consequent must also be satisfied by every interpretation; an empty consequent is treated as trivially false (i.e., not satisfied by any interpretation), so the antecedent must also not be satisfied by any interpretation. Multiple atoms are treated as a conjunction. Atoms in these rules can be of the form C(x), P(x,y), sameAs(x,y), differentFrom(x,y) or built-in(x,y,z,...) where C is an OWL description, P is an OWL property and x,y are either variables, OWL individuals or OWL data values. The specification proposes a library of functions which reuses the existing built-ins of XQuery [37] and XPath. The list of built-ins may be extended by users. Figure 3.17 illustrates the use of SWRL to define an entailment rule for the rdfs:subClassOf property of RDFS. The rule states that if x is a subset of y and y is a subset of z, then x is also a subset of z. Figure 3.18 illustrates the concrete syntaxes of the example of Figure 3.17.

rdfs:subClassOf(?x,?y) ∧ rdfs:subClassOf(?y,?z) -> rdfs:subClassOf(?x,?z)

FIGURE 3.17: Rule example
FIGURE 3.18: SWRL serialization examples. (a) XML syntax; (b) RDF/XML syntax (neither listing is reproduced here; both encode the subclass-transitivity rule over the variables x1, x2 and x3).
The Jena framework also includes a general purpose rule-based reasoner which is used to implement both its RDFS and OWL reasoners; it is also available for general use. This reasoner supports rule-based inference over RDF graphs and provides forward chaining, backward chaining and a hybrid execution model. The syntax of the rules is very similar to the abstract syntax of SWRL. Jena also defines a built-in library similar to the one defined by SWRL. Tools such as TopBraid Composer support the editing of both SWRL and Jena rules. Some inference engines such as Pellet support the execution of SWRL rules. When using TopBraid Composer, the editor translates SWRL rules to Jena rules and then uses the Jena reasoner; this approach means that the editor only supports the subset of SWRL which maps to Jena rules.
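For comparison, a sketch of how the rule of Figure 3.17 might be written in Jena's rule syntax (the rule name is our own; Jena rules are bracketed, with the body terms before -> and the head terms after, and the rdfs prefix is predefined by the rule parser):

# transitivity of rdfs:subClassOf, as a Jena forward rule
[subClassTransitivity:
  (?x rdfs:subClassOf ?y) (?y rdfs:subClassOf ?z)
  -> (?x rdfs:subClassOf ?z)]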
3.5 XML and Its Shortcomings
Pre-XML data exchange was characterized by a vast number of proprietary file formats, most of which were either binary or flat (comma-delimited, tab-delimited, etc.). Consumption of these files came at a high cost: each software system had to implement a parser and interpreter for each file format, and very little reuse was possible because of the diversity of data encodings and data structures. Hence, enabling M software systems to exchange information bi-directionally with one another required M*(M-1) parser/interpreter bridges (for example, 10 systems would require 90 bridges). This high number of software data exchange bridges made data exchange and data interoperability between systems a fairly challenging and expensive endeavor. The advent of the XML technology stack (XML, XSD, XPath, XSLT and XQuery) [66] has democratized the exchange and consumption of data because XML-based data exchanges use a predefined data encoding scheme (Unicode) as well as a meta-structure for syntax. This has enabled the development of generic file parsers which tools can embed and reuse. Moreover, since all the technologies in the XML stack are based on open standards, many free implementations are available, which has greatly lowered the cost of data exchange and interoperability. Current data exchanges based on XML only require parties to define a precise XML data model using the XML schema standard. Once a data model has been defined, software systems only have to implement an interpreter which extracts data fragments (using XPath or XQuery) and consumes them in a fashion which is coherent with the meaning or intent of the fragments. Even though the XML stack has solved a vast number of problems with regards to data exchange, it has some very important shortcomings.
3.5.1 Multiple Grammars
XML allows multiple valid syntaxes for a particular semantic model. An XML data model consists of two aspects: one is syntax (the syntax model) and the other is semantics (the semantic model); hence achieving a consensus around a data model requires a consensus on both models as well as on the mapping between them. To be more precise, when defining an XML-based format, it is necessary to define the schema that specifies the structure which all XML files based on the schema must respect: the syntax model. The syntax model consists of the element definitions, the attribute definitions and the nesting rules for elements, just to name a few. The consensus with regards to semantics is the definition of "what" the data contained in the file structure means, independently of how it is expressed in the file. This consensus is important because even if a data structure is clearly defined and accessing the data in the structure is simple, if multiple parties interpret the meaning (semantics) of the information differently, data interoperability has not been achieved. For example, if a data model defines an element called POWER which contains an integer value, it is possible that one party interprets the value in kilowatts and another party in megawatts; this is called a semantic gap. The problem with having multiple syntax models is that it complicates achieving a consensus on the syntax model, for multiple schools of thought exist which advocate different styles. Also, as the need for data exchange evolves, the structuring of the information will change for maintenance reasons in order to facilitate the integration of new data in the exchange. This causes incompatibilities between syntax models, which requires software systems to be modified even when the semantic model remains fully compatible because new data concepts were merely added or refined without invalidating earlier interpretations of the data. For example, between SPIRIT 1.2 and 1.4 some attributes have become elements. The consortium has published a set of XSLT and Perl scripts which manage the conversion from 1.2 to 1.4. We must consume meaning, not encoding and syntax. Figure 3.19 is an example of multiple syntaxes for the same meaning, namely the statement "IPX45X, a network processor, supports PCIv2.2, a communication bus."

FIGURE 3.19: Multiple possible syntaxes. Examples A and B encode the same IPX45X/PCIv2.2 statement with different element and attribute structures (the XML listings are not reproduced here).
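As the listings of Figure 3.19 did not survive reproduction, here is a hypothetical pair of encodings of the kind the figure contrasts; the element and attribute names are our own illustration:

<!-- Example A: attribute-heavy encoding -->
<processor name="IPX45X" kind="network">
  <supports bus="PCIv2.2" busKind="communication"/>
</processor>

<!-- Example B: element-heavy encoding of the same statement -->
<networkProcessor>
  <name>IPX45X</name>
  <supportedCommunicationBus>PCIv2.2</supportedCommunicationBus>
</networkProcessor>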
3.5.2 Documentation-Centric
By design, XML is intended for simple message-oriented data exchanges; it is document-centric. Hence it is not possible to store data in multiple files (files which have no knowledge of one another but whose data is complementary) and then easily combine them dynamically in order to query the consolidated data set as a virtual file. For example, it is difficult to have an XML document which defines the structure of a system composed of modules, have multiple documents which contain detailed information on each module, and then combine all of the documents and query across them. Another problem with the document-oriented nature of XML is that if one wishes to use XPath, XQuery or XSLT, it is necessary to know the physical location of a file in order to load it in memory and then manipulate it. Hence software systems which consume XML documents must have inherent knowledge of the physical file names and locations in order to consume their contents. This makes XML consumption brittle from a physical data exchange point of view. For example, just changing the name of a file or its location can easily break data exchanges. Software systems are tied to the physical location of files even though it is the consumption of their content semantics which is important. Typically, both the consolidation and the location transparency problems are avoided by using a consolidation repository which stores all the information in the XML documents. This repository is usually based on relational database systems because of the omnipresence of the technology and supporting tools. This is a very effective way to consolidate data and to create mostly transparent access to the information (one must still know the connection string of the database) but it brings a new problem: it is necessary to define a new data model (a relational data model) which brings back the same two consensus aspects we described with XML data models: grammar model and semantic model consensus. Moreover, a new technology stack must be learned and exploited, which raises the cost of software systems. SPIRIT uses this approach. Again, in order to avoid this new problem, a typical solution is to hide the data store behind a Web Services layer which exposes an XML data model. By reusing the initial data model, no new consensus must be achieved, only the communication and the maintenance of the APIs, which is not necessarily an easy task. Moreover, hiding the consolidated documents behind an API typically greatly narrows the consumption capabilities of software because APIs usually only expose specific consolidated fragments, so querying can only be done on those fragments and not on everything in the data store.
3.5.3 Biased Grammar Model
Database technologies have evolved over the last 30 years from tree-based data models (hierarchical databases) to network-based models (network databases) to relational data models; the XML data model brings us back to the beginning. The meta-model of XML data models is biased toward tree-like structures; the nesting of elements is the principal mechanism which allows grammar model definitions. Using graph structures, such as in UML [38] class models, and table structures, as in relational models, is a more natural way of modeling the world. Since an XML data model requires a single top-level root, it is necessary either to promote a concept in the data model to the top element or to create an artificial one which has no semantic meaning and is present only for syntactic reasons.
Through the use of ID/IDREF from XSD 1.0 or key/keyref pairs from XSD 2.0, it is possible to create implicit graph-like structures by attributing identifiers to nodes in an XML document and allowing nodes to refer to other nodes by using their IDs. However, these are implicit graphs, and querying through implicit graphs is not trivial. With XPath and XQuery, it is not possible to directly request the XML node which is referenced by another node. It is necessary to explicitly search for the referred node by using query predicates on the keys which identify the node.
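A minimal sketch of the pattern (all names are illustrative): one module refers to another through an identifier attribute, and XPath must join on the key explicitly instead of following the reference:

<modules>
  <module id="m1">
    <!-- implicit edge: m1 refers to m2 by its id -->
    <uses ref="m2"/>
  </module>
  <module id="m2"/>
</modules>

The referred node must then be fetched with an explicit predicate on the key, e.g.:

//module[@id = //module[@id='m1']/uses/@ref]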
3.5.4 Limited Metadata
XML offers more semantics than flat files because metadata is present in the form of element names and attribute names; however, the semantic expressiveness is limited. For example, the nesting of an element in another element has no semantics; i.e., if element a has a nested element b, does it mean that a possesses b, or that b belongs to a, etc.? File formats such as comma-delimited files possess no inherent metadata; only the data is present. Hence, the consumption of these files requires an intrinsic knowledge of the content of the file and its meaning. XML documents, by the presence of element and attribute names, deliver metadata as well as data. Moreover, the scoping of elements and attributes in namespaces makes them unique because namespaces are based on URIs. A consensus on an XML data model implies that a consensus on the mapping of the semantic model to the grammar model has been achieved, which implies that a consensus on each XML element (which is unique because of the URI scheme) and its meaning has been reached. This allows a software system to consume a previously unknown XML file because it may search the file for elements and attributes with specific URIs which the software system knows how to correctly interpret. Having said this, XML does not convey all the necessary metadata which is implied by an XML data model. The mechanism of nesting has no semantics; an XML element may be nested under another XML element in order to represent very different meanings. For example, given a hardware module which is represented as an XML element, suppose we wish to express the following two lists of modules which are related to a specific module:
1. the list of modules which this module is backward compatible with;
2. the list of modules which the module is not backward compatible with.
We cannot simply list each module in both of these lists under the module of discussion because it will not be possible to distinguish the modules in the compatible list from those in the incompatible list. This problem is typically solved by creating two XML elements that respectively represent both of these lists, under which we nest the appropriate module entries. We then nest these two XML elements under the module under discussion, as sketched below. In this solution nesting still does not have any semantics, but we have created an unambiguous data structure which can be interpreted by a software system.
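A minimal sketch of that workaround (the element names are our own illustration):

<module name="m1">
  <!-- wrapper elements disambiguate the two kinds of nesting -->
  <compatibleWith>
    <module name="m0"/>
  </compatibleWith>
  <incompatibleWith>
    <module name="m2"/>
    <module name="m3"/>
  </incompatibleWith>
</module>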
UML class models and Entity-Relationship data models do not have this issue, for in both, associations between classes or entities are named. Moreover, in a UML class model it is possible to define roles for each class which participates in an association. Hence, in UML class models and ER models, the semantics of associations are precise and clear. If we wished to represent the above problem in a UML class diagram, we would have only one class definition and two recursive associations, one called isCompatibleWith and the other isIncompatibleWith. The cardinality of both associations would probably be many-to-many.
3.6 Advantages of the Semantic Web
The advent of the XML technologies enabled a simpler data manipulation paradigm than the one supported by flat-file approaches. It also brought data manipulation to a higher level of abstraction: data manipulation was no longer focused on encoding and parsing but on grammar and data exchange. The Semantic Web technologies do the same with respect to the XML technologies. These new technologies bring many important benefits over their predecessor with regards to data manipulation and again raise data manipulation to a higher level of abstraction. The focus now is on semantics, data integration and queries over consolidated information sources. Table 3.3 summarizes this section.
3.6.1 Richer Semantic Expressivity
OWL offers a rich, formal and non-ambiguous language to describe information and knowledge. In many regards, its expressive capabilities go well beyond those of XML. By defining information using OWL, it is possible to infer additional information, hence creating knowledge; this is not possible with XML. This capability is enhanced when combining OWL with SWRL rules. Another distinctive advantage that OWL, RDFS and RDF have over XML is that the schemas are themselves information to which inferencing may be applied. As discussed earlier, most modern data model schemas, such as UML and relational schemas, are inherently graph-oriented; XML is tree-oriented, which can often be an inconvenience. By the very nature of the triple basis of RDF, semantic data models are directed graphs. In addition, the meaning of a model which uses the semantic technologies is unambiguous. Each element of data that is in a relation with another element of data is related through a predicate; hence the meaning of the relation is unambiguous. RDF-based models do not suffer from the semantic gaps which XML does, such as element imbrications which define ambiguous relations between data elements.

TABLE 3.3: XML and OWL Comparison
XML:
1. Syntax and grammar focused
2. Informal semantics
3. Document and centralized oriented
4. Supports federation, but not transparently
5. Hierarchical data model with implicit graph structures
6. Multiple grammars for a specific semantic model
7. Supports syntax transformation
8. Doesn't support entailment
OWL:
1. Semantic focused, which abstracts syntax
2. Formal semantics
3. Distributed oriented
4. Supports federation by design and transparently
5. Explicit graph data model
6. Single grammar for a specific semantic model
7. Supports semantic integration
8. Supports entailment
3.6.2 Separation between Semantics and Encoding
The Semantic Web stack is focused on the modeling and expression of semantics, not on encoding and data structure. The serialization of OWL to an RDF/XML encoding and grammar is defined in the OWL specification; hence only one grammar model exists for a specific semantic model in that particular encoding/structuring scheme. Even though multiple encoding/structuring schemes exist for OWL, consumption is always done at the semantic level through tools such as SPARQL and Jena. Since the consumption of OWL is at the semantic level when using Jena, software applications do not know which file format was used to express the OWL model; hence, they are not affected by encoding changes, file relocation, etc. Moreover, software applications are also isolated from changes to the semantic models which are backward compatible. As long as semantic definitions are not removed and meanings are not changed (a semantic element should never change meaning), software applications will not break.
3.6.3 Federated Data Model
The semantic technology stack is based on the premise that "anybody can say anything about anything" on the Web, which implies that data is scattered throughout the Web. Because of this premise, the technologies of the semantic stack have been developed in order to federate this information and allow its consumption in a manner which is agnostic of this distribution. Through federation, consumers have the perception that all the information is local and stored in a single "container." Tools such as Jena and SPARQL engines are designed to stitch RDF statements from multiple sources together in order to achieve a global picture, a single graph structure. This focus on federation is the opposite of XML, which is mostly focused on data exchange patterns in the context of peer-to-peer communication, where distribution is not a concern. Hence, semantic technologies have a distinct advantage in scenarios which require data consolidation (file consolidation), even when the consolidation is not on the Web. The newest generation of XML technologies (XQuery 1.0, XPath 2.0 and XSLT 2.0) has the capability to manipulate data from multiple files, hence offering a certain level of federation capabilities. However, the physical location of files has not been abstracted, and the consolidated view is achieved by the user defining the necessary joins between data elements. With the Semantic Web technologies, file location is not important because it is managed by SPARQL engines, which also stitch data elements together based on semantics without human intervention.
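A small sketch of this federation from the query side: standard SPARQL dataset clauses merge several RDF documents into one default graph before matching, so the query below sees a single graph regardless of how the triples are split across files (the file URLs are illustrative):

SELECT ?x
FROM <http://example.org/vendorA-components.n3>
FROM <http://example.org/vendorB-components.n3>
WHERE {
  ?x rdf:type hw:NetworkProcessor.
  ?x hw:supports hw:PCIv2.2.
}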
3.6.4 Simpler Data Manipulation
Data manipulation can often be reduced to three simple areas of focus: data queries, data updating and data transformation. In the context of the XML technologies, the XQuery and XSLT languages support these areas. Even though they are simpler than previous technologies, these technologies still have various aspects which are overly complicated. XQuery offers querying capabilities over XML documents. It has a syntax which is similar to a combination of SQL and XPath. It is not a very complicated language, but because the underlying data model is XML, queries must always take into consideration the tree-oriented structure of the data, which is not as simple as a graph structure. Moreover, as discussed earlier, it is difficult to navigate the implicit graph structure of an XML document. Both of these aspects are simpler with the Semantic Web technologies, because all nodes have an explicit identifier which is universally unique and which is independent of the relations in which the node takes part. It is this identifier which is used to define the explicit graph structure of the data model. Because of this explicit graph structure based on simple identifiers, it is simple to navigate along a relation by means of the name of the relation and the identifier of the starting node. An extension of the XQuery language has been proposed in order to manage update operations. Prior to this, updates could only be achieved either by using a transformation to generate a new document from an older one with the updates applied, or by using a coding library which supports the DOM, a standard API for XML document manipulation. Both of these older approaches were overly complicated for a simple attribute value update. The new XQuery extension offers a number of update operators which allow the insertion, modification and deletion of XML content. This approach is much simpler than the older approaches, but it still exposes users to the tree structure requirements of the manipulation (see the sketch after this paragraph), such as:
1. under which element another element must be added;
2. if other elements are already present, between which elements the insertion should be done.
With the semantic technology, additions are as simple as adding statements to the "cloud" of existing statements because each statement is independent of the others. Removal or modification of statements is just a matter of finding the statements which must be manipulated and applying the manipulation; all the operations are done at the semantic level.
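A minimal sketch of an XQuery Update insertion and the tree positioning it forces the user to specify (the document URI and element names are illustrative):

insert node
  <parameter>
    <name>width</name>
    <value>16</value>
  </parameter>
as last into doc("component.xml")/component/parameters

The "as last into" clause is exactly the kind of tree-structure concern described above: the author must decide under which element, and at which position among its siblings, the new node lands.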
With regards to the area of transformation, transformations which are focused on massaging various data sources into a canonical format are where the Semantic Web is at its best. Because semantic data models use a formal model to define their structure and their meaning, it is possible to use declarative axioms to define the semantic equivalence between various sources. Examples of such axioms are (a small N3 sketch of their use closes this section):
1. equivalent class axioms;
2. subClassOf axioms;
3. equivalent property axioms;
4. subPropertyOf axioms.
These types of transformations are referred to in the Semantic Web as data or semantic integrations; [20] discusses in great detail the capabilities of the Semantic Web to fulfill this task. The execution of the declarative statements is achieved with an inference engine. SWRL rules can also augment the expressive capabilities available for defining transformations. The Semantic Web is only focused on semantics; hence it is only interested in transformations with regards to semantics in order to either:
1. do semantic alignment (semantic integration);
2. define rules which will help deduce new knowledge in a semantic model from knowledge in another semantic model.
The Semantic Web does not address transforming information in a certain encoding format into another encoding format. This capability is XSLT's strong point. This is to be expected because the XML technologies were designed to facilitate manipulations at the syntax and grammar level. However, this same focus on encoding makes semantic data integration more complex because it must be achieved at the syntax and grammar level, which is a lower level of abstraction. Because this chapter is mainly concerned with semantics, the advantage goes to the semantic technologies.
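As a sketch of such declarative integration, suppose two IP vendors publish their metadata with different vocabularies (the vendorA: and vendorB: vocabularies are our own illustration); a handful of N3 axioms lets a reasoner treat both sources as one model:

# align the two vendor vocabularies declaratively
vendorA:Processor owl:equivalentClass vendorB:CPU.
vendorA:madeBy owl:equivalentProperty vendorB:manufacturedBy.
vendorB:NetworkCPU rdfs:subClassOf vendorA:Processor.

After inferencing, a query phrased against the vendorA: vocabulary also returns the matching vendorB: resources.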
3.7 Case Study – SPIRIT
SPIRIT, as discussed earlier, uses XML as its information representation and sharing technology. IP vendors use the IP-XACT metadata format for the definition of metadata which describes their IPs. The SPIRIT development environment uses the IP-XACT metadata format schema as a MoA formalism in order to define system-model designs based on IP aggregation. If the SPIRIT consortium had based the IP-XACT specification on the Semantic Web technologies, IP-XACT would benefit in a number of ways, with very few disadvantages. This section discusses these advantages.
3.7.1 Advantages Applied to Version Management (SPIRIT 1.2 to SPIRIT 1.4)

TABLE 3.4: Breaking semantic changes summary
Attribute removal (autoConfigure.xsd): The spirit:choiceStyle and spirit:direction attributes have been removed from all configurable elements.
Element removal (busInterface.xsd): The spirit:maxMasters and spirit:maxSlaves elements have been removed from the spirit:channel element.
Splitting of concepts (busDefinition.xsd): The busDefinition was split into two concepts: busDefinition and abstractionDefinition.
Cardinality changes (busInterface.xsd): The number of spirit:busInterface elements under spirit:busInterfaces has changed from 1..n to 0..n.
Type changing (port.xsd): The type serviceTypeDef has changed from xs:Name to xs:string.
Like most specifications and standards, the IP-XACT specification is revised over time. This evolution process generally comes at the cost of incompatibilities between version increments. On many occasions, these incompatibilities are unavoidable because changes are made to the semantics of the specification. These changes require consuming tools to be modified in order to interpret the new version correctly. For example, changes to the IP-XACT metadata format have been made between versions 1.2 and 1.4. Many of these changes are semantic in nature; hence, modifications to both the semantics and the syntax of the metadata format have been made. Table 3.4 summarizes the semantic-breaking changes between versions 1.2 and 1.4 of IP-XACT. These changes cause incompatibilities which are unavoidable and thus independent of the specification technology which is used. On the other hand, Table 3.5 summarizes changes between both versions which do not break semantics but only syntax; if semantic-level data exchange had been used, these would have been avoided.
3.7.2
Advantages Applied to Modeling
Semantic modeling technologies such as RDF, RDFS and OWL have an advantage over XML when modeling, because many encoding details which have no semantic significance are abstracted away. This abstraction of encoding details allows for a simpler modeling experience. A common practice in XML is to use container-style elements in order to organize file content. An example of this technique is the use of a spirit:parameters element which contains spirit:parameter elements in the IP-XACT specification. These container tags add no semantic meaning; they only facilitate human readability. From a modeling and
TABLE 3.5: Non-breaking semantic changes summary

Change Type                      | Examples                                                                                                           | Impacted Files
Attribute renaming               | The spirit:signalName is renamed to spirit:portMap inside spirit:signalMap                                         | busInterface.xsd
Element renaming                 | The spirit:remapSignal element has been renamed to spirit:remapPort                                                | busInterface.xsd
Change from attribute to element | The spirit:name attribute of the spirit:adHocConnection element has become a sub-element                           | subInstances.xsd
New collection tags              | A container element called spirit:parameters has been created in order to organize multiple following spirit:parameter elements | Global
computational processing perspective, these tags only add "noise" to the model.

A similar subject of great debate, most often stylistic in nature, is the use of elements vs. attributes to encode data. As discussed earlier, XML supports the use of either elements or attributes for the encoding of properties. When an entity can be associated with multiple values for the same property, it is often necessary to use elements, because only elements may be repeated. In almost all other situations, from a modeling perspective, there is no semantic difference between the two approaches. Hence, this again just complicates the modeling process and leaves room for unnecessary debates.

Another area where XML adds complexity is the management of element cardinality when using nesting. XML Schema offers two options for defining nesting rules: sequence and all. The sequence option allows nesting an unlimited sequence of elements (order is important); the number of times an element may be present can be specified using minimum and maximum cardinality constraints. The all option allows nesting of an unlimited set of elements (order is not important), but each element can appear at most once. There is no option which allows the nesting of a set of elements (order not important) while specifying occurrences using minimum and maximum constraints. As a consequence, XML schemas which use the sequence option in order to manage cardinality become sensitive to element reordering and addition. This added complexity exists only because of encoding concerns, not semantic concerns. Since the IP-XACT specification uses sequences, it is overly sensitive to the addition of elements to a sequence and to the re-ordering of elements in a sequence, both of which do not break semantic compatibility but do break grammar compatibility.

The concept of unique identification, within and across files, is at the core of the Semantic Web technologies. RDF has implemented this requirement with URIs, a very simple but effective construct. Since XML is document oriented, it does not have any construct to manage unique references across files. Moreover, even if XML Schema offers capabilities to manage identifiers and references within a file, it is not common
practice to use them. The IP-XACT specification does not use the key capabilities of XML Schema; it has defined its own concept called VLNV, a four-part identifier formed by the vendor, the library, the component name and the component version. Using a custom identifier scheme adds unnecessary complexity. Moreover, tools require added development to guarantee the integrity and good management of the identifiers. The four-part scheme of IP-XACT could easily be encoded within a URI.
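For illustration only, one hypothetical URI encoding of a VLNV (the host name and path layout below are assumptions, not part of any standard) could be:

http://ip.example.com/ipxact/acme/peripherals/uart/1.4

where acme is the vendor, peripherals the library, uart the component name, and 1.4 the version. Such URIs would be globally unique by construction and directly usable as RDF resource identifiers.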
3.7.3
Impact on TGI
The TGI portion of the SPIRIT standard exemplifies the Web Services approach to the XML location and consolidation problem. By using the Semantic Web technologies, the entire API could be eliminated, because federated SPARQL queries can be executed over a collection of RDF files. Almost the entire TGI API is composed of "getter" and "setter" operations. There are two types of "getter" operations in the API: those that return values contained in the model and those that return computed values based on values in the model. In the current version, all of the operations that return computed values are test operations which return Boolean values. The first class of "getter" operations can be substituted with simple SPARQL queries using the SELECT construct; the second class can be substituted with simple SPARQL queries using the ASK construct. getAbstractionDefPortLogicalName and getAbstractionDefPortIsAddress are respective examples of the two types of "getter" operations. Figure 3.20 illustrates the implementation of these two operations. The TGI API also has a number of "setter" type operations which allow the modification of the data contained in the data store. These methods cannot be implemented with basic SPARQL queries. Two main options are possible. The first is to use the Jena framework library and its API to modify data. The other
spirit:AbstractDefinition rdf:type owl:Class.
spirit:Port rdf:type owl:Class.
spirit:hasPort rdf:type owl:ObjectProperty.
spirit:hasIdentifier rdf:type owl:DatatypeProperty.
spirit:hasLogicalName rdf:type owl:DatatypeProperty.
spirit:isAddress rdf:type owl:DatatypeProperty.
spirit:AbstractDefinitionInstance rdf:type spirit:AbstractDefinition.
spirit:PortInstance rdf:type spirit:Port.
spirit:AbstractDefinitionInstance spirit:hasPort spirit:PortInstance.
spirit:PortInstance spirit:hasIdentifier "myID".
spirit:PortInstance spirit:isAddress true.
spirit:PortInstance spirit:hasLogicalName "myName".

SELECT ?logicalName
WHERE {
  ?aPort rdf:type spirit:Port.
  ?aPort spirit:hasIdentifier "myID".
  ?aPort spirit:hasLogicalName ?logicalName.
}

ASK {
  ?aPort rdf:type spirit:Port.
  ?aPort spirit:hasIdentifier "myID".
  ?aPort spirit:isAddress true.
}
FIGURE 3.20: SPARQL implementation of TGI example
DELETE { ?aPort spirit:hasLogicalName ?oldName. }
INSERT { ?aPort spirit:hasLogicalName "newName". }
WHERE  { ?aPort spirit:hasIdentifier "myID". }
FIGURE 3.21: SPARQL Update implementation of TGI example

option is to use the SPARQL/Update extensions which the Jena SPARQL engine supports. Figure 3.21 is an example of using SPARQL/Update for the setAbstractionDefPortLogicalName operation.
3.7.4
Implications for SPIRIT Semantic Constraint Rules (SCRs)
The SPIRIT 1.4 specification contains a list of SCRs which define constraints that cannot be expressed, or not easily, with XML Schemas. By using semantic technologies such as SWRL or Jena rules, the SCRs which can be expressed with these technologies could be used to verify the consistency of designs. Verification would be achieved by processing the design with an inference engine which supports rules. This use of the semantic technologies would bring two key benefits:

1. Unless specified using controlled vocabularies, rules expressed in natural languages may be ambiguous. This ambiguity may result in different interpretations of a rule, and hence in different validations of it. By using a formal rule language such as SWRL, rules can be specified in an unambiguous fashion.

2. Rules described in a specification document, hence not as constraints in a schema language such as XML Schema, must be translated into a program in order to be applied to models for validation purposes. By using a formalism which is executable, this extra translation step is eliminated.

The remainder of this section presents a number of SCRs which may be expressed using OWL and/or SWRL. We also present rules which cannot be expressed formally, in order to illustrate possible limitations. The objective of this section is not to discuss all the SCRs thoroughly, but rather to demonstrate that some of the rules which cannot be expressed with XML Schema can be defined formally with SWRL. Even if only a fraction of the SCRs can be expressed formally, this capability is a benefit over the current XML implementation, which cannot express them at all.

The specification contains a certain number of rules which pertain to referential integrity between design elements. The main objective of these rules is to express
spirit:BusInterface rdf:type owl:Class.
spirit:AbstractionDefinition rdf:type owl:Class.
spirit:BusDefinition rdf:type owl:Class.
spirit:hasBusType rdf:type owl:ObjectProperty.
spirit:BusInterface rdfs:subClassOf _:BusInterfaceRestriction1.
_:BusInterfaceRestriction1 owl:onProperty spirit:hasBusType.
_:BusInterfaceRestriction1 owl:allValuesFrom spirit:BusDefinition.
spirit:AbstractionDefinition rdfs:subClassOf _:AbstractionDefinitionRestriction1.
_:AbstractionDefinitionRestriction1 owl:onProperty spirit:hasBusType.
_:AbstractionDefinitionRestriction1 owl:allValuesFrom spirit:BusDefinition.
FIGURE 3.22: Implementing SCR 1.4 using OWL
that a design element which references another design element must reference a valid element. The XML Schema 1.0 standard allows the definition of element attributes as identifiers by using the ID type. These identifiers must be unique within the context of a document, and they may be used as values for element attributes which are declared as type IDREF. Using this feature, it is not possible to declare that a certain element may refer, by means of its ID attribute, to another element of a specific type; it is only possible to declare that an element may refer to another element by means of its ID. With OWL, all resources must have a unique identifier (their URI), and resources may be associated with a specific owl:Class. By means of "allValuesFrom" axioms, it is possible to define precise class criteria on the ranges of relations. SCR 1.4 is defined as follows: The VLNV in a busType element in a bus interface or abstraction definition shall be a reference to a bus definition. Figure 3.22 illustrates how this rule could be implemented using OWL. The model defines that all instances related to a busInterface or an abstractionDefinition instance through the hasBusType property must be busDefinition instances. The owl:allValuesFrom axiom declares this constraint on the range of the hasBusType property. This example also demonstrates the substitution of VLNVs by URIs.

SCRs such as SCR 2.4-2.9 express more complex conditional constraints on values. These rules define constraints on the allowed combinations of interfaces that may be connected together using an interconnection. For example, SCR 2.4 states: An interconnection element shall only connect a master interface to a slave interface or a mirrored-master. These rules can be expressed using only OWL and restriction criteria. Figure 3.23 illustrates the implementation of SCR 2.4.

SCRs which express constraints on values related by simple equality can be expressed using OWL and SWRL. For example, SCR 2.10 states: In a direct master to slave connection, the value of bitsInLAU in the master's address space shall match the value of the bitsInLAU in the slave's memory space. Figure 3.24 illustrates the implementation of SCR 2.10 using OWL and SWRL.

As stated earlier, not all rules may be expressed using OWL and/or Jena rules. An example of such a rule is SCR 3.3, which states: A channel can be connected to no more mirrored-master busInterfaces than the least value of maxMasters in the busDefinitions referenced by the connected busInterfaces. The main issue is that since
spirit:MasterInterface rdf:type owl:Class.
spirit:SlaveInterface rdf:type owl:Class.
spirit:MirroredMasterInterface rdf:type owl:Class.
spirit:MirroredSlaveInterface rdf:type owl:Class.
spirit:MirroredSystemInterface rdf:type owl:Class.
spirit:DirectInterface rdf:type owl:Class.
spirit:MasterInterconnection rdf:type owl:Class.
spirit:hasMainInterface rdf:type owl:ObjectProperty.
spirit:hasSecondaryInterface rdf:type owl:ObjectProperty.
spirit:MasterInterconnection rdfs:subClassOf _:MasterInterconnectionRestriction1.
_:MasterInterconnectionRestriction1 owl:onProperty spirit:hasMainInterface.
_:MasterInterconnectionRestriction1 owl:allValuesFrom spirit:MasterInterface.
spirit:MasterInterconnection rdfs:subClassOf _:MasterInterconnectionRestriction2.
_:MasterInterconnectionRestriction2 owl:onProperty spirit:hasSecondaryInterface.
_:MasterInterconnectionRestriction2 owl:allValuesFrom _:Union1.
_:Union1 owl:unionOf (spirit:SlaveInterface spirit:MirroredMasterInterface).
FIGURE 3.23: Implementing SCR 2.4 using OWL
OWL and Jena rules are based on first-order logic, there is no direct way to express rules which require counting; hence we cannot express the fact that the number of relations in which an instance participates must be lower than a value which is itself defined by another relation. Our examples involving cardinality have always used an absolute value which is part of the schema; hence all instances must respect the same cardinality.
3.7.5
Dependency XPath
The IP-XACT metadata specification allows design models to contain values which are defined by mathematical equations based on values present in the design models. These equations are expressed using XPath 1.0 expressions. Figure 3.25 illustrates a typical use of a dependency expression; the example defines the base address of a certain memory map as a function of parameters of another memory map. The specification also defines a list of XPath functions which extend the default library. By using SWRL rules, such dependency expressions may be defined. Also, in the same way that XPath supports user-defined functions, user-defined built-ins may be defined; execution of the SWRL rule by an engine will result in the evaluation of the expressions. Figure 3.25 illustrates the XML-oriented approach defined by IP-XACT as well as the equivalent using SWRL. The SWRL portion defines custom built-ins which have a prefix of spiritb; these functions are not part of the default SWRL built-ins and have the same meaning as their equivalents in the XML version. The main differences between both approaches are:

1. In the SWRL version, because of the predicate nature of the rules, it is not possible to define expressions which use nested built-ins. It is necessary to define a variable for each intermediate calculation.
spirit:AddressSpace rdf:type owl:Class.
spirit:hasBitsInLAU rdf:type owl:DatatypeProperty.
spirit:hasAddressSpace rdf:type owl:ObjectProperty.
spirit:Interface rdfs:subClassOf _:InterfaceRestriction1.
_:InterfaceRestriction1 owl:onProperty spirit:hasAddressSpace.
_:InterfaceRestriction1 owl:cardinality 1.
spirit:AddressSpace rdfs:subClassOf _:AddressSpaceRestriction1.
_:AddressSpaceRestriction1 owl:onProperty spirit:hasBitsInLAU.
_:AddressSpaceRestriction1 owl:cardinality 1.
spirit:MasterInterface rdfs:subClassOf spirit:Interface.
spirit:SlaveInterface rdfs:subClassOf spirit:Interface.

spirit:MasterInterconnection(?connection)
spirit:hasMainInterface(?connection, ?x)
spirit:hasSecondaryInterface(?connection, ?y)
spirit:MasterInterface(?x)
spirit:SlaveInterface(?y)
spirit:AddressSpace(?space1)
spirit:hasAddressSpace(?x, ?space1)
spirit:hasBitsInLAU(?space1, ?bits_x)
spirit:AddressSpace(?space2)
spirit:hasAddressSpace(?y, ?space2)
spirit:hasBitsInLAU(?space2, ?bits_y)
-> swrlb:equal(?bits_x, ?bits_y)
FIGURE 3.24: Implementing SCR 2.10 using OWL and SWRL
2. In the IP-XACT version, it is necessary to use spirit:id attributes on values in order to reference them in calculations.

The SWRL approach is more verbose than the XPath approach, but it has the added advantages of:

1. Not requiring spirit:id attributes to be defined; hence it is possible to use values which were not initially intended to be referred to.

2. Not requiring the implementation of a pre-processing stage for XML models. The IP-XACT approach requires custom code to be written to interpret the embedded XPath expressions.
3.8
Cost of Adoption
Migration of the SPIRIT standards to the Semantic Web would offer many benefits which are important with regards to expressivity, simplicity and flexibility. However,
[Figure 3.25, XML portion: an IP-XACT memory-map fragment whose element tags were lost in extraction. It defines a memory map "mmap" with an address block "ab1" (base address 0, range 786432, width 32, usage memory, access read-write) and a memory map "dependent_mmap" (base address 0, range 4096, width 32, usage register, access read-write) whose base address is computed by a dependency XPath expression.]

SWRL equivalent:

spirit:MemoryMap(?mm1)
spirit:hasName(?mm1, "mmap")
spirit:MemoryMap(?mm2)
spirit:hasName(?mm2, "dependent_mmap")
spirit:hasRange(?mm1, ?range)
spirit:hasBaseAddress(?mm1, ?baseAddr)
spiritb:decode(?decRange, ?range)
spiritb:decode(?decBaseAddr, ?baseAddr)
swrlb:add(?tmpRange, ?decRange, 1)
spiritb:log(?tmpLog, 2, ?decBaseAddr)
swrlb:floor(?floorTmp, ?tmpLog)
swrlb:pow(?powTmp, 2, ?floorTmp)
swrlb:add(?dependent_baseAddr, ?powTmp, ?tmpRange)
-> spirit:hasBaseAddress(?mm2, ?dependent_baseAddr)
FIGURE 3.25: Dependency XPath example

nothing comes without a price. The migration of the SPIRIT standard, as well as of the tools developed by vendors which have adopted the standard, would consist of three primary tasks:

1. Develop an OWL and SWRL based model for the IP-XACT standards;
2. Migrate all IP descriptions and design models to the new ontology;
3. Replace the XML consumption portion of current tools with an implementation based either on SPARQL or on the Jena toolkit.

The first task would not be very difficult, because the most demanding portion of creating an OWL model is determining the required semantics and achieving consensus, and this has already been achieved through the development of the IP-XACT metadata format. The second task will probably be the most demanding; however, by using a combination of XSLT and Perl scripts, it would probably be possible to automate a large portion of the migration. The third task, despite being straightforward
once an OWL model has been established, will require a fair amount of development; however, we believe that the mid- to long-term benefits outweigh the cost by far. Some of the secondary tasks would be the selection of an inference engine as well as the development of the necessary extension functions for SWRL rules.
3.9
Future Research
This chapter discusses a possible path for the use of the Semantic Web technologies in the context of EDA. Based on this work, many other aspects are left to be explored. The development of a complete ontology for the IP-XACT standards would offer many interesting challenges with regards to semantic modeling. It would also be very interesting to apply the ideas in this chapter to other work such as Colif and MoML. A discussion of the quantity of code required to implement a Semantic Web approach versus a traditional XML approach would be useful in order to guide further implementations. The whole aspect of performance benchmarking also remains to be explored and discussed. Moreover, comparing the effectiveness of modeling with regards to time and complexity would be interesting in order to measure designers' comfort with this approach compared to approaches based on XML.
3.10
Conclusion
The XML technology stack has significantly helped the EDA industry over the last decade by simplifying the exchange of information between tools. It has also given developers an effective means for the creation of simple markup-based languages. We believe that the Semantic Web technology stack is the next step. The next generation of EDA tools will benefit in multiple ways by adopting a technology which is focused solely on semantics and not on syntax. This chapter has presented the major benefits of the Semantic Web technologies over XML in general. It also discussed key benefits for the IP-XACT standard if it adopts these new technologies. We believe that the benefits of adopting the Semantic Web technologies far outweigh their inconveniences.
4
Translating Design Pattern Concepts to Hardware Concepts

Luc Charest
Université de Montréal - Canada
Yann-Gaël Guéhéneuc
Université de Montréal - Canada
Yousra Tagmouti
Université de Montréal - Canada

4.1 Introduction ............................................................ 89
4.2 Object-Oriented Translations ........................................... 92
4.3 Constraint and Assumptions for Design Pattern Synthesis ................ 98
4.4 Design Pattern Mappings ................................................ 100
4.5 Operational Description of Design Patterns ............................. 103
4.6 Related Work & Background .............................................. 111
4.7 Conclusion .............................................................. 113

4.1 Introduction
For half a century, hardware systems have become increasingly complex and pervasive. They are not only found in satellite navigation systems or automated factory machinery, but also in everyday cell-phone, parking-meter, and car control-and-command systems. This increase in the use of hardware systems led to a (r)evolution in their design and implementation: the chips are becoming more and more powerful, and their logic is implemented as software systems executed by the chips, thus helping system designers to cope with their complexity. These mixed hardware–software systems raise the level of generality of the "hardware part" and the level of abstraction of the "software part" of the systems. Thus, they suggest that mainstream software engineering techniques and good practices, such as design patterns, could be used by system designers to design and implement their mixed hardware–software systems.
Just as a variable may map to a register, we propose a mapping between design patterns and a hardware implementation. System designers could use this mapping when designing and implementing their mixed hardware–software systems to translate the solution of a design pattern into its appropriate hardware counterpart. Thus, their systems would benefit from the good practices embodied by design patterns from software design. This chapter presents a mapping to "translate" some design patterns into hardware concepts, to ease the system designers' work and, thus, to improve the design time and quality of mixed hardware–software systems. It focuses on interesting and challenging concepts to foster future research, without trying to be exhaustive.

Design patterns are "good" solutions to recurring design problems in software design. We only consider the design patterns originally defined by Gamma et al. [89], because these patterns are well defined, well known, and the subject of much work in the software engineering community. With the beginning of the 21st century, design patterns began to emerge in the system design domain [51, 23]. However, the main challenge of mapping design patterns into hardware systems is that existing design patterns relate to object-oriented systems. Therefore, as a first approach to translating design patterns from software into hardware, we consider that we first build an "object system": a representation of an object in the hardware world. As there are several ways of designing and implementing a processor, there is not only one way to build an object system. This chapter is not a catalog of recurring hardware design patterns, because such a catalog should be compiled by the community based on the hardware designers' experience and is therefore outside the focus of this chapter.

Motivating Example. Consider as an example the problem of a circuit C that performs a computation, as shown in Figure 4.1a. The classic solution to accelerate the given circuit, at the cost of augmenting its latency, is to pipeline this circuit into the sub-circuits {C1, C2, C3, ..., Cn}, each executing a small amount of the computation in parallel,
(a) The problem: a complex circuit taking too much time to execute.
(b) The solution: a pipeline which splits the task into smaller sub-circuits while allowing an increase of the clock speed at the cost of a constant latency.
FIGURE 4.1: Motivating example for getting inspiration from software engineering
as shown in Figure 4.1b. (The clock speed could also be increased, resulting in an overall faster execution.) The pipeline architecture has been applied in software engineering to obtain the flexibility to replace particular components seamlessly. For example, it is used to design compilers, where each phase of the compilation corresponds to a component in the pipeline. We believe that other good practices from software engineering could be applied in hardware–software system design, and we therefore study the feasibility of applying concepts developed in software engineering to the synthesis of hardware systems.

Running Example. In the rest of this chapter, we use the running example of a module ComplexNumber to compute and perform operations on complex numbers. Figure 4.2 describes the whole running example, from the software classes to its implementation and instantiation in hardware. We will describe this example step-by-step in the rest of this chapter. In particular, we will use this example to highlight the translation of object-oriented software concepts, such as inheritance, into hardware concepts.
(a) A UML software definition of a ComplexNumber class using a Number base class (Number: #value: float, +add(p:Number): Number; ComplexNumber: #imag: float, +add(p:ComplexNumber): ComplexNumber).
(b) A possible hardware implementation of object-oriented inheritance.
(c) The module in a "not instantiated" state.
(d) The ComplexNumber module down-cast into (or instantiated as) a Number class.

FIGURE 4.2: Running example of a ComplexNumber software class and its hardware module
Structure of the Chapter. Design patterns are inherently tied to the object-oriented paradigm. Therefore, in Section 4.2, we present a mapping between software object-oriented concepts and hardware concepts. Then, in Section 4.3, we describe the constraints on our mapping. In Section 4.4, we detail our mapping between design pattern concepts and hardware concepts in the form of a catalog of the most interesting patterns. Such a mapping would be incomplete without a means to translate the patterns into hardware concepts concretely; we therefore present an operational description of design patterns and its use to generate hardware "code" in Section 4.5. In Section 4.6, we describe related work, while in Section 4.7 we conclude and introduce future work.
4.2
Object-Oriented Translations
The design patterns in Gamma et al.'s catalog [89] are solutions to recurring object-oriented design problems. We therefore first present a mapping between object-oriented concepts and hardware concepts. This mapping will be used in Section 4.4 to map design patterns to hardware concepts. Essential concepts in object-oriented design and implementation are: objects and their instantiations, methods, inheritance (and casting operations), and polymorphism. We also discuss the cost of generating hardware code from object-oriented code based on our mapping.
4.2.1
Translation of Classes and Their Members
In object-oriented programming, the main concepts of interest are structures or classes, which contain fields and related methods. For example, in C++, the distinction between a struct and a class is the default scope: members of classes are private by default while those of structures are public by default. Class members may be of two kinds: fields, which contain data and define the "state" of a class or of its objects, and methods (or constructors), which provide functionality usually operating on the data contained in the fields. Even if the fields are tied to their methods in the class, they are distinct members. Methods are shared among all the instances of a same class, while fields are not necessarily. Indeed, fields can be instance fields, whose values are unique for a particular object (instance of the class), or class fields, whose values are shared by the class and all of its instances.
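The following minimal C++ sketch (the class and member names are illustrative) shows the distinction between the two kinds of fields:

// "count" is a class field: one copy shared by the class and all its
// instances. "value" is an instance field: one copy per object.
class Number {
public:
    static int count;  // class field
    float value;       // instance field
};
int Number::count = 0; // allocated once, at the start of execution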
4.2.2
Translation of Object Encapsulation
Encapsulation is the means by which a class embeds and hides its members. It requires a protection mechanism for the enclosing class to gain the responsibilities
over its members and to isolate them from the outside world. Considering that we are targeting synthesis, with the software being ready to be transformed into hardware, we can treat a base class and its inherited classes similarly to an enclosing class and its encapsulated member. The encapsulated member can be generated directly into the enclosing class, or it could be referenced with a pointer to another instance.
4.2.3
Translation of Object Instantiation
An instantiation corresponds to the creation of a new object, hence to memory allocation for its instance fields. When either a struct or a class is locally declared, its instance fields are allocated on the execution stack. When a dynamic instantiation is requested (with new or malloc), the instance fields are allocated on the heap. Unless a class has no fields and therefore provides only methods, its fields must be defined in its hardware equivalent. Fields are required to be created in two different situations:

1. At the beginning of the execution of the system, for class fields.
2. At the instantiation of the object, when its constructor gets called.

There are at least four ways of storing instance fields:

As constants, hard-coded as bits. This kind of implementation is interesting for some rare cases of static constant fields, which are initialized at the beginning of the execution and whose values do not change during execution. Constant fields would be generated with the platform in a ROM (or with a static combinatorial circuit) and would have an infinite lifespan.

In registers, distributed in small "modules" that could be generated to correspond to an object.

In a global RAM memory, probably the best way to store the field values, in a centralized unit.

In a local RAM memory, similar to the previous way, but distributed over the system.

Instantiating class fields at the beginning of the execution of the system is straightforward, because the translator can identify such cases, and they can be implemented in hardware as constants or in registers for faster access. The fact that the class fields are shared by all objects makes this easy, because the fields are unique in the whole system for a given class.

The instantiation of an instance field is more challenging. The exact moment of a call to a given constructor is not known in advance if the system is not simulated first. For example, nothing forbids the constructor call to be controlled by something like a random
draw (e.g., Rand()). Therefore, the time at which the construction takes place, and at which the instance fields must be allocated, cannot be taken for granted, and the number of times a given constructor is called could also change between different execution runs. Therefore, for any realistic system, we cannot pre-compute the exact number and type of each object that will be instantiated in the system. We can only estimate this number and design a system that will be able to contain the most appropriate number of instances of each class.
4.2.4
Translation of Object Method Calls
A method call corresponds to a static function call using a hidden this pointer to indicate the object on which to apply the function. For example, the call in Listing 4.1 (Line 13) is the object-oriented equivalent of the function call in Listing 4.2 (Line 12). The method calls in Figures 4.3a and 4.3b can be translated into a signal sent from one module to another, as shown in Figure 4.3c. Parameters can be sent over the signal data lines, and the return values can be sent back as signals as well. If a more elaborate structure is to be exchanged, a memory reference, as shown in Figure 4.3d, could be sent instead of concrete values.
1  // Class definition
2  class List {
3  protected:
4    class Item *head;
5  public:
6    // Object-Oriented prototype (method):
7    void add(Item &item_to_add);
8  };
9
10 /* Object-Oriented call; the "*this" pointer is "hidden," made
11    implicit by "my_list," which is the object onto which to apply
12    the "add" method */
13 my_list.add(Item(42));
Listing 4.1: Object-oriented way; the Add method applied on a List object.
4.2.5
Translation of Polymorphism
Polymorphism allows behavioral variations based on the class of an object. Polymorphism corresponds to a method defined in several specialized classes with a signature identical to the signature of the method in the base class. When the method of an object is called, depending on the class that was used when the object was
instantiated, the method of the appropriate class is called, even if the object was stored beneath a base class reference, as illustrated in Listing 4.3, Lines 6 and 13. In this example, f() is polymorphic because it is redefined in the inherited class and it is marked as virtual. Neither g() nor h() is polymorphic.
1  // Structure definition
2  struct List {
3    struct Item *head;
4  };
5
6  // Procedural prototype (function):
7  void List_add(List *list_to_modify, Item &item_to_add);
8
9  /* Procedural call; here the structure onto which to apply the
10    behavior of "add" must be made explicit by passing the
11    reference to the "List" structure */
12 List_add(&my_list, Item(42));
Listing 4.2: Classic procedural way; the call of the Add function with the List structure passed explicitly.

Polymorphism is usually implemented using a virtual table (also known as a vtable), which is a table that maps classes to pointers to methods, to direct any call to the appropriate method. A virtual table is available at compile time, after parsing and before linking, as illustrated in Listing 4.4, which shows the assembly code of the virtual table, with the name-mangling correspondence shown in Table 4.1.
3
4  class Base {
5  public:
6    virtual void f(void);
7    void g(void);
8  };
9
10
11 class Derived : public Base {
12 public:
13   virtual void f(void);
14   void h(void);
15 };
Listing 4.3: Polymorphic f() method defined in base and derived class.
(a) A UML software definition of a class (ComplexNumber: #real: float, #imag: float, +add(p:ComplexNumber): ComplexNumber).
(b) A software method call (X:ComplexNumber receives add(Y) through the implicit this pointer).
(c) A possible translation into hardware.
(d) Hardware method call using a memory reference.
FIGURE 4.3: Example of a class representing complex numbers, from its software design to its hardware implementation

103 _ZTV4Base:
104   .long 0
105   .long _ZTI4Base
106   .long _ZN4Base1fEv
107   .weak _ZTS7Derived
108   .section .rodata._ZTS7Derived,"aG",@progbits,_ZTS7Derived,comdat
109   .type _ZTS7Derived, @object
Listing 4.4: The vtable, shown as an assembly-language excerpt for the code of Listing 4.3.
The class of an object is not always known at compile time, so the virtual table is persisted in the machine code and preserved for the methods called dynamically. Only virtual methods end up in the virtual table (Line 106), because ordinary methods can be called directly; they are linked with the object class, which is implicit. The translation of polymorphism into hardware can be achieved by creating several instances of specialized classes in hardware and then controlling the classes of objects by deactivating some part of the module when base classes are needed. The translation of the vtable into hardware then becomes straightforward, as exemplified in Table 4.2, because it is similar to a LUT (Look-Up Table). The LUT
TABLE 4.1: Translation of mangled names of Listing 4.3.

Mangled identifier | Result returned by c++filt
_ZTV4Base          | vtable for Base
_ZTI4Base          | typeinfo for Base
_ZN4Base1fEv       | Base::f()
_ZTS7Derived       | typeinfo name for Derived
TABLE 4.2: Possible method (and polymorphism) encoding for the ComplexNumber module.

Class type | Method | Code assigned
Base       | f()    | 0
Base       | g()    | 1
Derived    | f()    | 2
Derived    | h()    | 3
becomes a decoder; the codes can be assigned sequentially to each newly discovered method upon parsing during code generation.
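As an illustration, a minimal C++ sketch of such a decoder follows; the enumerator names mirror Table 4.2, and the activated datapaths are placeholders, not part of the book's material:

// The LUT acts as a decoder from method codes to datapaths.
enum MethodCode { BASE_F = 0, BASE_G = 1, DERIVED_F = 2, DERIVED_H = 3 };

void dispatch(MethodCode code) {
    switch (code) {
        case BASE_F:    /* activate the Base::f() datapath */    break;
        case BASE_G:    /* activate the Base::g() datapath */    break;
        case DERIVED_F: /* activate the Derived::f() datapath */ break;
        case DERIVED_H: /* activate the Derived::h() datapath */ break;
    }
}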
4.2.6
Translation of Inheritance and Casting Operations
Inheritance and polymorphism help bring commonality and variation [61]. In software engineering, inheritance is achieved by adding methods and properties to the base inherited structure. We reuse this idea in hardware by generating a module for each specialized class at the end of the inheritance tree (all leaf classes of the inheritance tree). Each of these specialized modules contains its parent class, which would, in its turn, recursively contain all of its parents, as shown in Figure 4.2b. A special register, obj_inst in our example, would then be generated within each class module to hold the type of the module. When the module is not instantiated, this number is 0, as in Figure 4.2c, indicating that all the data in the local registers/memories are invalid. The proposed mechanism also enables typecasting an object into an inherited class easily, by changing the obj_inst number to the one indicating the type of the new class. If the class is downcast, as illustrated in Figure 4.2d, the invalidated registers can still hold some valid values (masked by the downcast), which could be restored when the object is cast back to its original class.
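A minimal C++ sketch of this mechanism follows; the encoding (0 = not instantiated, 1 = Number, 2 = ComplexNumber) is an assumption for illustration:

struct ComplexNumberModule {
    unsigned obj_inst = 0; // type register: 0 = not instantiated
    float value = 0.0f;    // Number part
    float imag  = 0.0f;    // ComplexNumber part

    void instantiate_as_complex() { obj_inst = 2; }
    // Downcast: imag is masked but keeps its value, so casting back
    // to ComplexNumber restores the full object.
    void downcast_to_number()     { obj_inst = 1; }
    bool is_instantiated() const  { return obj_inst != 0; }
};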
4.3
Constraint and Assumptions for Design Pattern Synthesis
The translation of design patterns from software to hardware is subject to one constraint and three assumptions that we present now.
4.3.1
Constraint: Dynamism of the Hardware
Hardware systems used to be static: sets of wires and components "hardwired" together to perform specific computations. For some time now, hardware systems have blended with their software systems to benefit from the dynamic nature of the software. End products are more customizable with flash memories, and configurable with small Web servers implemented in embedded systems as software running on a generic hardware core. Traditionally, objects are generated in memory, with a generic processor performing method calls and field accesses. Although such a solution is the usual way of reaching the dynamism found in software systems, it is an extreme case that relies on a "pure" software implementation and a generic processor, and it is therefore uninteresting in the context of this chapter. We are interested in implementing design patterns using a "pure" hardware system, which we could define as a hardware system with as little software as possible, and which would potentially bring faster execution at the cost of specializing the hardware. Such a solution is more desirable for embedded systems, where the computations or applications are unlikely to change. Naturally, this solution would also work for mixed software–hardware systems (e.g., FPGAs), which are consequently implicitly covered by this chapter: because such a solution does not directly implement (or emulate) an object-oriented system, we assume a more conservative hardware system, thus guaranteeing that the translation would work on more dynamic hardware systems.
4.3.2
Assumption: Compiled Once
If we were to target an FPGA for our compiled system, it would be possible to change the hardware on the fly, thus easing the task of implementing polymorphism. Such a solution complies with our constraint of obtaining a "pure" hardware solution, even though changing the nature of the hardware on the fly requires more logic gates (such as the control logic of the FPGA cell blocks). Indeed, changing the hardware only requires more knowledge at the start of the system. This knowledge is available if we limit our discussion to the case where the software system is known and is available as software code, ready to be transformed into hardware by a hypothetical "software to hardware" compiler. If we are to generate a solution for an ASIC (Application-Specific Integrated Circuit), we have to assume that the translation into hardware will occur once and that the nature of the hardware, unless designed for it, cannot be changed.
Hence, in this chapter we shall focus on the most restrictive platform type and restrict ourselves to a one-time compilation and synthesis, with no dynamic compilation or synthesis.
4.3.3
Assumption: Limited Number of Objects
We could also create several hardware modules to simulate an object-oriented software system, each module matching a class and its inheritance tree. We prefer, however, to reuse specialized modules and use them as base modules. In order for each class to be instantiated at least once in our system, we can then assume that the number of modules must be at least the number of leaf classes of our system. Let n be the number of classes in our system at compile time. Assuming that no new classes can be added after generation and that we have a complete binary common-rooted balanced inheritance tree, there are (n + 1)/2 leaf classes, the minimum number of modules that must be generated in the hardware. By generating more hardware modules for a class, we bring parallelism into the system by enabling the existence and computation capability of several objects at the same time. As mentioned earlier, unless the hardware is capable of mutating into another class once generated, the number of active objects of a class in the system at the same time is limited by the number of times the designer instantiates that class. This limitation means that a constant must be defined for each class, indicating the maximum number of active objects allowed in the system at the same time. Computing the maximum number of active objects is in general impossible, because objects can be created dynamically in software systems with no constraint other than the size of the memory. For example, a large and unknown number of objects could be created to compute a complex scene in a ray tracer. Yet, it is possible to run the software system in a set of reasonable scenarios and obtain an insight into the maximum number of objects of its classes. Such a practice is already used, for example, when developing software systems for cell-phones.
4.3.4
Assumption: Pattern Automatic Recognition Problem
Although design patterns are quite formally described by Gamma et al. [89], they were not meant to be defined in a computer language and parsed in an automated way. They express generic solutions to recurring design problems: they cannot be easily and automatically identified in a software system. Since a design pattern can have several variants, identifying occurrences of the pattern in a software system becomes a challenge of its own, which has been tackled by the software engineering community as early as 1998 [206]. Therefore, we assume that we know explicitly which design patterns have been used to implement a software system and which classes play some roles in their occurrences.
4.3.5
Translation Cost versus Performance
An optimization phase can occur to reuse part of the behavioral synthesis process. For example, the controller for the ComplexNumber class could use only one ALU, at the cost of a more complex controller module. In Figure 4.3c, we show a solution to our running example where its behavioral parts, the ALUs, are duplicated. Such a solution provides more parallelism, but at the expense of more hardware. In Figure 4.3d, we depict an alternate solution where the behavioral part is reused for several distinct methods. The need for buses then arises and complicates the logic circuits of the overall unit (not shown in the figure). Yet, this solution saves on hardware, at the cost of serializing the operations. This solution, in the worst case, corresponds to the execution on a classic mono-processor architecture. A threshold could be set by the designer generating the hardware system, indicating how many processing modules are to be generated to accelerate the overall architecture. Another form of reuse could be achieved by replacing local registers with a local memory that could hold an array of objects of the same inheritance branch. Let n be the number of distinct classes in a given branch. We can then pose {i | 1 ≤ i ≤ n} to be the numbers identifying each class. Let x_i be the number of living objects required for class type i. The minimal required memory size is then defined by Equation 4.1.
memory size = ∑_{i=1}^{n} x_i × memory usage of(i)    (4.1)
For example, in the running example, the architecture could be configured to hold, in the same ComplexNumber module, x_0 objects of the Number class and x_1 objects of the ComplexNumber class. Such a solution means larger amounts of memory to hold more objects in the same module, at the cost of creating an execution bottleneck if the number of execution units in the module is low. The local buses of the units will also be a bottleneck if the number of objects contained by a module is high. It is advisable to use a mixed approach, where one could generate several modules of the same kind, each with a small memory capable of handling several objects at the same time, to distribute the computations whenever possible while minimizing the hardware cost. The acceleration could thus be maximized, especially in massively parallel systems, where objects are relatively independent.
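A minimal C++ sketch of Equation 4.1 follows; the structure and function names are hypothetical:

#include <cstddef>
#include <vector>

struct ClassInfo {
    std::size_t live_objects; // x_i: living objects required for class i
    std::size_t object_size;  // memory usage of(i), in bytes
};

// Sum x_i * memory_usage_of(i) over all classes of the branch.
std::size_t minimal_memory_size(const std::vector<ClassInfo> &branch) {
    std::size_t total = 0;
    for (const ClassInfo &c : branch)
        total += c.live_objects * c.object_size;
    return total;
}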
4.4
Design Pattern Mappings
We now describe some interesting patterns using the mappings of object-oriented concepts into hardware concepts and the constraint and assumptions described before.
4.4.1
Creational Patterns
Creational patterns are the patterns related to the dynamic creations of objects. As explained in Section 4.3.1, we consider hardware as being more static than dynamic. Applying these kinds of dynamic patterns to “pure” hardware is challenging because of the static nature of the hardware. Therefore, we present the mapping of two characteristic creational patterns into hardware: the Prototype and Singleton design patterns. Prototype is a pattern that provides a “typical” instance that can be copied before being customized, with the help of the public method clone(). It allows a class to have a default instantiation and avoids having to call several (maybe complex) methods to initialize an object with its default values. The prototype pattern corresponds to a ROM (Read Only Memory) that can hold a block of data containing the prototype. Upon call of the clone() method, an object is allocated and the data are copied into the newly created instance, creating a new object based on the prototype. The new object will typically have to evolve in time and should be allocated in a RAM or in a register as described in Section 4.2.3 Singleton is a pattern that restricts the number of objects of a class to one. In a software system, where a class can usually be instantiated at will, the need quickly arises to ensure there is only one instance of a certain class that is shared by all other objects of the system, when a second object of the same class could cause miscomputations or crashes (e.g.: a multi-threading controller). Implementing the Singleton pattern is usually achieved by hiding the constructor from the outside world and providing a class method to obtain the unique object (e.g.: instance(), get new(), get instance(). . . ). Any other object is forced to use the class method supplied to get the unique instance of the Singleton class. The class members (and inherited) still have access to the constructor. In terms of hardware, this pattern is easily implemented by directing the synthesis to generate only one object of a class. A Boolean flag can also be included in the controller (for example the one in our inheritance example in Figure 4.2b) to check that the object has been instantiated, and if so, to raise an error with the calling entity to indicate it can not provide a new instance.
4.4.2
Structural Patterns
Structural patterns are useful to create the design of a software system. Beck [29] pointed out that design patterns generate architectures. We describe two typical structural design patterns that are useful to make objects interact seamlessly and to isolate a set of objects from the rest of the system.
(a) The problem: a complex compiler system.
(b) A Façade class "Compiler" which simplifies the use of the system.

FIGURE 4.4: Example of a Façade
Adapter is the software equivalent of a wrapper. The goal of the Adapter pattern is to enable the interconnection of otherwise incompatible objects by delegating method calls to the appropriate (incompatible) methods, either by using multiple inheritance or by object composition. In terms of hardware, it is not rare to see wrappers constructed around IP blocks (Intellectual Property blocks). As for the software pattern, wrappers may be used to enhance, break into several subcomponents, reroute, or even disable some behavior (or structure) of the component that they are "masquerading," by intercepting part of the communication with the external entity.

Façade is a class that hides the complexity of a whole sub-system behind a single object. The classical example of a Façade is a compiler, as shown in Figure 4.4a, where several compilation steps are implemented with several different objects of various classes, each having distinct responsibilities. A Façade helps to separate functionality and to segment the code into simpler parts that are easier to maintain. The Façade is the class "Compiler" in our example in Figure 4.4b; it is inserted between the system and the external world and acts as an interface to the system. The cost of using a Façade is an extra level of indirection and the extra burden of updating the Façade when a major change occurs in the system. Moreover, the Façade is sometimes blamed for quickly getting big, and it may lead to an entanglement when it needs to be tightly coupled with a lot of other classes. In terms of hardware, a Façade corresponds to an interface, where a protocol is defined to access a more complex system. Usually, pins are created that might correspond directly to some internal components, but the interface is usually simplified to reduce its complexity. A bus can be considered as a Façade, as it usually gives access to (while caching access to) a whole complex system. It also usually provides a simple interface (rather than having the external system communicate with every other sub-system).
(a) Generic layout of the Observer pattern.
(b) The Observer ⇒ ESL configuration (Memory, Screen, Bus, CPU, DMAs...)
FIGURE 4.5: Example of an Observer
4.4.3
Behavioral Patterns
Finally, the last category of patterns includes patterns related to the behavior of objects at runtime.

Observer is a pattern that allows a Subject to notify its Observers when some of its data change, thus ensuring consistency among the Observers, as shown in Figure 4.5a. In terms of hardware, a memory can be considered as a Subject that is observed, as illustrated in Figure 4.5b. The Observers are all the different components that need to access the memory data (CPU, DMA, peripherals...). The bus, along with its communication protocol, forms the "contract" that matches the software interface obtained using the inheritance and polymorphism mechanisms. We show a screen as the observer in our example.
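A minimal C++ sketch of the software-side contract described above; Memory plays the Subject, and the class names are illustrative:

#include <vector>

class Observer {
public:
    virtual void notify() = 0; // called when the Subject's data change
    virtual ~Observer() = default;
};

class Memory { // the Subject
public:
    void attach(Observer *o) { observers.push_back(o); }
    void write(int value) {
        data = value;
        for (Observer *o : observers)
            o->notify(); // broadcast the change, like a bus protocol
    }
private:
    std::vector<Observer *> observers;
    int data = 0;
};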
4.5
Operational Description of Design Patterns
To operationalize our mapping between software design patterns and hardware systems, we choose ESys.NET [138, 139]. ESys.NET is a system design environment (similar to SystemC) based on C# and the corresponding .NET framework (rather than C++). Design motifs are the "Solution" parts of design patterns; they are what developers actually implement in their systems when using design patterns. The generation
of design motifs for ESys.NET requires a means to describe design motifs in a form that can be manipulated by a computer to perform code synthesis into hardware. We use the Pattern and Abstract-level Description Language (PADL) as a formalism to describe design patterns. We first present PADL. Then, we introduce MIP, an extension to PADL to describe more precisely the behavior of the methods declared in a motif. Finally, we show the use of PADL and MIP to generate ESys.NET code for the Observer design pattern.
4.5.1
PADL in a Nutshell
PADL is a meta-model that can be used by developers to describe design motifs and object-oriented software systems. A meta-model is essentially a set of classes whose instances represent a model. The methods of the classes in the meta-model describe the semantics of the model. Consequently, PADL provides a set of classes representing constituents of design motifs and the methods required to instantiate and link the instances together in a meaningful way. Figure 4.6 shows a UML-like class diagram representing the architectural layers of the PADL meta-model, their main packages and classes, and the design patterns used in the design. The diagram decomposes into three horizontal parts representing three different layers of services: first, CPL (Common PADL Library); then, PADL; finally, PADL ClassFile Creator, PADL AOL Creator, POM, and PADL Analyses. The first layer, CPL, provides utility classes and libraries used across PADL. The second layer, PADL, provides the meta-model to describe models of systems and motifs. The meta-model defines the interfaces (and implementation classes) of the possible constituents of motifs, for example, IDesignMotif, whose instances are motifs, and IClass, whose instances describe the classes suggested by a motif. These instances are combined to describe motifs and subsets of their behaviors. The padl.kernel and padl.kernel.impl packages declare respectively the types of the constituents (as Java interfaces) and their implementations. The PADL meta-model is at the heart of the Ptidej project (Pattern Trace Identification, Detection, and Enhancement in Java), whose goal is to evaluate and enhance the quality of object-oriented software systems, promoting the use of patterns at the language, design, or architectural levels. In particular, it has been extensively used to identify occurrences of motifs in systems, for example in [103].
4.5.2
PADL in Detail
Figure 4.7 shows the classes and main methods of the constituents of the PADL meta-model. Essentially, the meta-model divides into four parts. The first part includes all the possible constituents (inheriting from Constituent) of the structure of a system or a motif. These constituents include different types of entities: Interface (interfaces à la Java) and Class (classes as found in C++ or Java); methods and fields; parameters. The second part adds constituents to refine a model of a system or of a motif with
FIGURE 4.6: The PADL meta-model layers
a comprehensive set of binary class relationships. These relationships are important because the interactions among classes and their objects in design motifs are often described in terms of such relationships. The relationships include, from the least constraining to the most constraining, the Use, Association, Aggregation, and Composition relationships [102]. The Creation relationship is also available, to describe that objects of a class instantiate objects of another class. The third part includes the constituents specific to the description of design motifs. A design motif DesignMotif is described in terms of its participating classes Participants, whose roles can be played by classes (ClassParticipant) or interfaces (InterfaceParticipant). Any participant can declare elements as defined in parts one and two of the meta-model. Finally, the fourth part includes the constituents specific to the description of a ProgramModel and its possible set of MicroArchitectures, which are the concrete manifestations of a DesignMotif. A micro-architecture knows which of its constituents plays which role in a DesignMotif. We use the Abstract Factory design pattern to manage the concrete instantiation of the constituents of PADL. The concrete factory, class Factory, implements the Singleton design pattern. We use the Builder design pattern to let the parsers choose the constituents to instantiate, through the Builder class. We use the Visitor design pattern to offer a standard means to iterate over a model or a subset of a model; the padl.visitor package provides default visitors. The padl.pattern and padl.pattern.repository packages define several prototypical models of well-known design motifs, which we can clone and parameterize using the Prototype design pattern. The third layer contains several separate projects:

• Parsers for Java class files and AOL files (PADL Java and AOL Creator). These parsers are independent of the meta-model, and new parsers for other programming languages can be added seamlessly using the Builder design pattern.

• A metric computation framework (POM), in which we use the Singleton design pattern. POM decomposes into a set of primitives defined in terms of the meta-model constituents. These primitives are combined using set operators to define metrics.

• A repository of analyses based on the meta-model, in which we use a simpler version of the Command design pattern. An analysis is invoked on a model of a software system or of a pattern and returns a (potentially modified) model when the analysis is done. Reflection is used by the repository to build the list of available analyses dynamically.

FIGURE 4.7: The PADL meta-model
4.5.3 PADL by Examples
PADL has been used to develop a library of design motifs from the 23 design patterns by Gamma et al. [89], including Chain of Responsibility, Composite, Observer, Visitor, etc.
public class Observer extends BehaviouralMotifModel
    implements PropertyChangeListener, Cloneable {

  private IClass subject, concreteSubject;
  private IInterface observer;
  private IDelegatingMethod notify;
  private IMethod update, getState;

  public Observer() throws CloneNotSupportedException,
      ModelDeclarationException {

    super("Observer");
    this.setFactory(Factory.getInstance());

    // Interface Observer
    this.observer =
        this.getFactory().createInterface("Observer");
    this.update = this.getFactory().createMethod("Update");
    this.observer.addConstituent(this.update);
    this.observer.setPurpose(MultilingualManager.getString(
        "Observer PURPOSE", Observer.class));
    this.addConstituent(this.observer);

Listing 4.5: The Observer design motif using the PADL meta-model: declaration of the Observer role.
For example, we show with the code of Listing 4.5 the Observer design motif using the PADL meta-model. The following PADL code systematically instantiates the constituents of the meta-model according to the motif as suggested by Gamma et al. (see Figure 4.8).
FIGURE 4.8: The Observer design motif (from [89])
We show in Listing 4.5 the declaration of the Observer design motif as a class Observer. The motif declares an interface Observer that plays the role of Observer in the motif. The interface is built using a Factory. In Listing 4.6 we show the declaration of the Subject role as a class Subject. This class is abstract and is associated, using an embedded aggregation ContainerAggregation, to the previously declared Observer interface. The Subject class also declares a Notify method that delegates its call, through the aggregation, to all the subject's observers. Listing 4.7 illustrates the declaration of the role of Concrete Subject as a class ConcreteSubject that declares a method getState. The Concrete Subject inherits from the Subject and assumes all its interfaces. Finally, Listing 4.8 shows the declaration of the role of Concrete Observer as a class ConcreteObserver. This class is associated to the concrete subject through another aggregation. It declares an update method that is called by the concrete subject's notify method when appropriate and that fetches the Concrete Subject's changes through a call to its getState method. An instance of the Observer class is an instance of the Observer design motif, which can then be parameterized to fit a given implementation. This parameterized instance can be used to identify occurrences of the motif in a system or to generate source code.
    // Association observers
    final IContainerAggregation anAssoc =
        this.getFactory().createContainerAggregationRelationship(
            "observers", this.observer, Constants.CARDINALITY_MANY);

    // Class Subject
    this.subject = this.getFactory().createClass("Subject");
    this.subject.setAbstract(true);
    this.subject.addConstituent(anAssoc);
    this.notify = this.getFactory().createDelegatingMethod(
        "Notify", anAssoc, this.update);
    this.subject.addConstituent(this.notify);
    this.subject.assumeAllInterfaces();
    this.subject.setPurpose(MultilingualManager.getString(
        "Subject PURPOSE", Observer.class));
    this.addConstituent(this.subject);

Listing 4.6: The Observer design motif using the PADL meta-model: declaration of the Subject role.
    // Class Concrete Subject
    this.getState = this.getFactory().createMethod("getState");
    this.concreteSubject =
        this.getFactory().createClass("ConcreteSubject");
    this.concreteSubject.addInheritedEntity(this.subject);
    this.concreteSubject.setPurpose(MultilingualManager.getString(
        "ConcreteSubject CLASS PURPOSE", Observer.class));
    this.concreteSubject.addConstituent(this.getState);
    this.concreteSubject.assumeAllInterfaces();
    this.addConstituent(this.concreteSubject);

Listing 4.7: The Observer design motif using the PADL meta-model: declaration of the Concrete Subject role.
    final IContainerAggregation a2Assoc =
        this.getFactory().createContainerAggregationRelationship(
            "subject", this.concreteSubject, Constants.CARDINALITY_ONE);

    // Class Concrete Observer
    this.notify = this.getFactory().createDelegatingMethod(
        "Update", a2Assoc, this.getState);
    this.notify.setComment(MultilingualManager.getString(
        "DELEG_METHOD_COMMENT", Observer.class));
    this.notify.attachTo(this.update);
    this.concreteSubject =
        this.getFactory().createClass("ConcreteObserver");
    this.concreteSubject.setPurpose(MultilingualManager.getString(
        "ConcreteObserver CLASS PURPOSE", Observer.class));
    this.concreteSubject.addImplementedEntity(this.observer);
    this.concreteSubject.addConstituent(a2Assoc);
    this.concreteSubject.addConstituent(this.notify);
    this.concreteSubject.assumeAllInterfaces();
    this.addConstituent(this.concreteSubject);
  }
}

Listing 4.8: The Observer design motif using the PADL meta-model: declaration of the Concrete Observer role.
4.5.4 MIP

The PADL meta-model has been extended with additional constituents to describe the inner workings of the methods of systems and motifs. This extension to the meta-model, called MIP, is necessary to describe the behavior of design motifs more precisely than with PADL alone.
IBlock block = StatementFactory.getStateInstance().createBlock();
this.notify.addConstituent(block);
IIterator iterative =
    StatementFactory.getStateInstance().createIteratorS(this.update);
block.addConstituent(iterative);
IMethodInvocation invocation =
    Factory.getInstance().createMethodInvocation(2, 1, 1, this.subject);
invocation.addCallingField(this.observer);
invocation.setCalledMethod(this.update);
iterative.addConstituent(invocation);
Listing 4.9: Extending the description of the Observer design motif using PADL extended with MIP.

MIP proposes new constituents implementing the interface IConstituentOfMethods to describe the various statements that can be used to define the behavior of methods. This set includes IMethodInvocation, IParameter, IConditional, IInstantiation, and IAssignment. Figure 4.9 shows the extension of the PADL meta-model with MIP. Essentially, the PADL meta-model was refactored to distinguish constituents of methods using the interface IConstituentOfMethods. The MIP extension provides a set of such constituents of methods. This set is sufficient to describe several behavioral and creational design motifs more precisely than with PADL alone. For example, using PADL extended with MIP, the description of the Observer design motif is extended with the code shown in Listing 4.9. This code describes in more detail the behavior of the notify method. Thus, with MIP, it is possible to describe completely the structure and the behavior of behavioral, creational, and structural design motifs.

FIGURE 4.9: The MIP extension to the PADL meta-model
4.5.5 ESys.NET Code Generation
The PADL meta-model provides an implementation of the Visitor design pattern that allows any client to write a visitor to traverse the constituents of a model. We implement such a visitor to generate ESys.NET code from the extended models of design motifs.
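To give an idea of what such a generator can look like, here is a much-simplified C# sketch of a code-generation visitor. It is illustrative only: the visitor interface is reduced to three callbacks over class-level constituents, and the emitted ESys.NET flavor (a BaseModule base class) is an assumption, not the actual generator output.

using System.Text;

// Illustrative sketch: a visitor that emits C# source for each class
// constituent of a motif. The real generator traverses full PADL/MIP
// models; this reduced interface only conveys the idea.
public interface IMotifVisitor
{
    void OpenClass(string name, bool isAbstract);
    void VisitMethod(string name);
    void CloseClass();
}

public class ESysNetGenerator : IMotifVisitor
{
    private readonly StringBuilder _code = new StringBuilder();

    public void OpenClass(string name, bool isAbstract)
    {
        // Assumed ESys.NET convention: generated modules derive from a base class.
        _code.AppendLine(string.Format("public {0}class {1} : BaseModule {{",
                                       isAbstract ? "abstract " : "", name));
    }

    public void VisitMethod(string name)
    {
        // Motif methods become empty virtual methods to be refined later.
        _code.AppendLine("    public virtual void " + name + "() { }");
    }

    public void CloseClass() { _code.AppendLine("}"); }

    public override string ToString() { return _code.ToString(); }
}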
4.6 Related Work & Background

4.6.1 Object Oriented Synthesis & Patterns in Hardware
The synthesis of complex C structures has been discussed in [181], whose authors claim at the end of the article that their methodology can be applied to more complex C++ structures. Some hardware designs for Object Oriented paradigms have been put forward, notably an Object Oriented processor in [124]. The authors discuss an interesting hardware
object allocation strategy, although their analysis was limited to a global shared memory. Some patterns have also been used for hardware modeling, as in [64].
4.6.2 Original Patterns
The original Design Patterns were introduced in [89]. Design Patterns express structured and elegant solutions (based on the experience of software engineers) to commonly encountered object-oriented problems. Design Patterns are sometimes criticized for a lack of coherency in their interrelations, and blamed for degrading performance by raising overall design complexity. Despite these disputed drawbacks, they bring other interesting benefits, such as:

• clarification of object responsibilities,
• reduced class coupling,
• enhanced code genericity,
• increased reusability of classes and algorithms.

Patterns are classified under three major groups:

Creational Patterns solve problems related to class instantiation. Usually, each given object knows how to instantiate itself. With these patterns, the instantiation responsibility is often delegated to other classes. The creation of complex objects becomes more structured and more flexible.

Structural Patterns solve problems related to class structures and interrelations. They help create more dynamic and flexible class constructions.

Behavioral Patterns solve problems related to class functionality. Usually, a class contains the implementation of the functionality of each of its instances. Behavioral patterns help isolate object behavior from the class definition, bringing a more flexible approach.
4.7 Conclusion
We discussed relations and matches between some of the Design Patterns in their software form and their various correspondences in hardware. With the help of examples such as the Pipeline pattern, we showed that Design Patterns are not only software specific, but are already present in the hardware domain and should be better outlined.

We presented a specialized object system which can be implemented in "pure" hardware in order to reproduce the behavior of a generic object system running on
a processor. We also discussed how every Object-Oriented aspect can be integrated into hardware, using our object system as an example. We introduced ESys.NET, a new system design platform based on C#, along with PADL, a Design Pattern framework in which patterns can be defined and used in order to generate code.

A future area of interest is to further develop the object-oriented system in order to implement a full-scale prototype on an FPGA. The system design community needs to gather the experience it collectively possesses into a hardware-focused pattern catalog in order to stop reinventing the wheel and to drive the reuse of well-known and proven solutions. This chapter is a first step in that direction, but only with the help of a thriving community will we succeed in building a strong collaborative tool based on Design Patterns.
Part II
Simulation and Validation
5
Using Transaction-Based Models for System Design and Simulation

Amine Anane
Université de Montréal, Montréal - Canada

El Mostapha Aboulhamid
Université de Montréal, Montréal - Canada

Julie Vachon
Université de Montréal, Montréal - Canada

Yvon Savaria
École Polytechnique de Montréal - Canada

5.1 Introduction
5.2 Motivations
5.3 Transaction Model
5.4 STM Implementation Using .NET
5.5 Experimental Results
5.6 Conclusion and Future Work

5.1 Introduction
One of the main areas that ITRS addresses is Design Technology (DT), which "enables the design, implementation, and validation of microelectronics-based systems. Elements of DT include tools, libraries, manufacturing process characterizations, and methodologies." They consider that "Cost (of design) is the greatest threat to continuation of the semiconductor roadmap." This leads to what is called a design productivity gap, where the number of available transistors grows faster than the actual ability to effectively design them. According to ITRS 2007, manufacturing cycle times are measured in weeks, while design and verification cycle times are measured in months or years. This design productivity gap is mainly due to the
present verification methods, which necessitate considerable time and numerous engineers in order to produce a system of reasonable quality. Consequently, ITRS has identified the key challenges for verification in the near and longer term. Here follows the text, quoted in full from [120]:

"In the near term, the primary issues are centered on making formal and semi-formal verification techniques more reliable and controllable. In particular, major advances in the capacity and robustness of formal verification tools are needed, as well as meaningful metrics for the quality of verification. In the longer term, issues focus mainly on raising the level of abstraction and broadening the scope of formal verification. These longer-term issues are actually relevant now, although they have not reached the same level of crisis as other near-term challenges. In general, all of the verification challenges apply to SOC."

To take up those challenges, we propose to use a novel semi-formal approach based on transaction models. The merit of this approach is that it provides an easy and reliable abstraction to describe parallel components while being expressive enough to describe systems at different levels of abstraction. It possesses a rigorous formal semantics amenable to semi-formal verification and incremental refinement, and can thus be used to produce systems proven correct-by-construction.

In this chapter, we will focus on a specific aspect of verification that is based on simulation. While formal verification has its virtues, simulation remains one of the most popular techniques to verify systems. This remains true despite the fact that simulation performance often constitutes a big hurdle. Consequently, we will show how we can take advantage of the transaction-based model to overcome some shortcomings in the semantics of SystemC (cited by ITRS as one of the most prominent system modeling languages) in order to allow a meaningful simulation acceleration using parallel architectures. Our modeling methodology proposes to use software engineering techniques, such as Design Patterns, Aspect-Oriented Programming, and Software Frameworks, to facilitate introspection and exploitation of metadata. Using these techniques, we propose to develop high-level reusable models to describe heterogeneous systems composed of both software and hardware components. We also plan to implement different simulation algorithms and techniques which can be adjusted to boost the simulation performance of the transaction-based model under verification.

Section 5.2 will further motivate this work by showing the inadequacies of SystemC for parallel simulation as well as for system-level untimed modeling. Section 5.3 introduces the transaction-based model, allowing us to get rid of the sequential nature of SystemC. Section 5.4 shows how we can implement a Software Transactional Memory (STM) using the .NET framework in order to simulate a transaction-based model. Experimental results and a comparison with SystemC are given in Section 5.5, while Section 5.6 concludes this work.
5.2 Motivations
SystemC is designed to provide a modeling framework for hybrid systems composed of both hardware and software components described at a level of abstraction above the Register Transfer Level (RTL). SystemC uses a generic execution model based on an event-driven simulation kernel. This kernel allows the execution and scheduling of a set of concurrent processes that react in response to event notifications. SystemC provides three types of event notifications: immediate notification, delta notification, and timed notification. Figure 5.1 shows the main phases executed by the SystemC scheduler.
FIGURE 5.1: SystemC scheduler execution phases (Initialization, then a loop of Evaluation, Update, and Advance Time; processes are activated by immediate, delta, and timed notifications, respectively)

During the Initialization phase, each process is executed once: each one thus runs until it returns or until it suspends itself by calling the wait function. Then, the scheduler starts the Evaluation phase by sequentially executing the processes made ready by notifications from the initialization phase. An executing process may generate immediate notifications, resulting in the addition of processes sensitive to those events to the list of processes ready to run. Once all the processes have executed, the SystemC scheduler executes the Update phase. The update phase is used to implement the request-update semantics, according to which the modification of a system state is delayed until the end of the evaluation phase. The combination of the evaluation phase and the update phase forms a delta cycle (or delta delay).
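To make the request-update idea concrete, the following minimal C# sketch (our own illustration, not the SystemC API) shows a signal whose writes only take effect when the kernel calls Update() during the update phase:

// Sketch of request-update semantics: a write is only a request; it
// becomes visible when the scheduler calls Update() after evaluation.
public class Signal<T>
{
    private T _current;     // value seen by readers during evaluation
    private T _next;        // value requested by the last Write
    private bool _pending;

    public T Read() { return _current; }

    public void Write(T value) { _next = value; _pending = true; }

    // Called by the scheduler in the update phase; a return value of
    // true means the signal changed and a delta notification is due.
    public bool Update()
    {
        if (!_pending) return false;
        _pending = false;
        bool changed = !object.Equals(_current, _next);
        _current = _next;
        return changed;
    }
}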
At the end of the current delta delay, the scheduler checks whether there are pending delta notifications. If so, all the processes waiting for those delta notifications are put in the list of processes ready to run, and the scheduler starts a new delta cycle by executing the evaluation phase again. When there are no more delta notifications, the scheduler advances time to the earliest pending timed notification and determines which processes are to be scheduled at the current time. It then repeats the execution of the three phases with the new simulation time.

In the following, we present what we consider to be shortcomings of SystemC for untimed modeling.

Implementation-dependent semantics and deadlocks: First, consider the nondeterminism permitted by the SystemC specification, which prevents a designer from relying on the order of execution of the processes during the evaluation phase, since that order is unspecified. However, for a given implementation and a given stimulus, we always get the same order of execution of the processes. Consequently, some specifications may execute correctly using one particular implementation of SystemC but fail when another implementation is used [109]. Figure 5.2 illustrates the problem: (a) if the Writer process executes first, the notification of the write event is lost because no process is waiting for the notification, and this results in a deadlock; (b) if the Reader process executes first, it waits for a write event, and then when the Writer process executes, the notification awakes the Reader process and the read-write loop is launched. Given an implementation of SystemC, the designer can decide in which order to put line 1 and line 2 (Figure 5.2) to avoid the deadlock. This situation is error prone and implementation-dependent, and it could mask errors that do not appear in the simulation environment but that may happen when the description is refined or mapped to hardware at a later stage of the design process.
Main {
  Writer;   // line 1
  Reader;   // line 2
}

void writer(){
  while(true){
    compute_write_data;
    write_event.notify();
    wait(read_event);
  }
}

void reader(){
  while(true){
    wait(write_event);
    read_process_data;
    read_event.notify();
  }
}
FIGURE 5.2: Untimed SystemC model using immediate notification

To eliminate the undesired behavior caused by the non-determinism introduced by using immediate notifications, most standard SystemC channels, including sc_fifo, use the request-update semantics and the concept of delta cycle described above [15]. For instance, if we replace the two immediate notifications in the
example of Figure 5.2 by delta notifications, i.e., "notify(SC_ZERO_TIME)," we get a deterministic behavior independent of the order of execution of the processes. However, such implicit synchronizations imposed in each delta cycle are more restrictive than an effective synchronization of concurrent processes in a multi-threaded software application or an asynchronous circuit [131].

Co-routine semantics and bounded buffers: The SystemC execution model behaves as described in [15]: "Since process instances execute without interruption, only a single process instance can be running at any one time, and no other process instance can execute until the currently executing process instance has yielded control to the kernel. A process shall not pre-empt or interrupt the execution of another process. This is known as co-routine semantics or co-operative multitasking." However, this is far from representing software concurrent processes or even the 1987 VHDL standard, where processes have no such semantics but are true concurrent processes. In Figure 5.3, the producer will execute until it encounters a wait statement. This happens only when the FIFO is full, despite the fact that at each write to the FIFO a notification is sent. This is due to the fact that when the write completes, it returns to the process, which then continues its execution. Once the producer is suspended, the consumer resumes execution until the FIFO becomes empty. Besides this behavior's departure from a real execution of concurrent processes, the co-routine semantics prevents us from modeling true Kahn Process Networks (KPNs), where the sizes of the FIFO channels are unbounded and resized depending on the workload.
void producer(){
  while (1){
    Compute_Value();
    fifo.write(value);
  }
}

void consumer(){
  while (1){
    int value = fifo.read();
    Process_Value();
  }
}
FIGURE 5.3: A producer and a consumer communicating using an sc_fifo
Starvation and unfairness: Suppose we use a FIFO channel with multiple producers and multiple consumers (Figure 5.3). Only the first producer and the last consumer, executed in an order determined by the SystemC implementation, are effectively writing to and reading from the FIFO. All the other consumers and producers behave as if they were always suspended.

False confidence: The co-routine semantics enables a designer to write a SystemC model at a high level of abstraction without caring about the synchronization and coordination between the concurrent processes in accessing common resources.
In Figure 5.4, we notice that the variables num_elements and first are not protected; however, this protection will be needed during refinement or parallel simulation on a multicore machine. This also leads to many difficulties in modeling software processes or creating execution platform models representing RTOS architectures. This inconvenience leads to the use of a different simulation engine for the software processes, running in parallel with the simulation kernel of SystemC [69].
// SystemC implementation using notification
void write(char c) {
  while (num_elements == max)
    wait(read_event);
  data[(first + num_elements) % max] = c;
  ++num_elements;
  write_event.notify();
}

FIGURE 5.4: A blocking FIFO write using SystemC
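For contrast, here is a minimal C# sketch (our own, with field names mirroring the figure) of the same blocking write made safe under preemptive concurrency; this is exactly the protection that the co-routine semantics lets the modeler omit:

using System.Threading;

// Sketch only: the blocking write of Figure 5.4 rewritten with explicit
// synchronization, as required under true (preemptive) concurrency.
// A matching Read would advance _first and pulse the monitor similarly.
public class BlockingFifo
{
    private readonly object _sync = new object();
    private readonly char[] _data;
    private int _first;
    private int _numElements;

    public BlockingFifo(int max) { _data = new char[max]; }

    public void Write(char c)
    {
        lock (_sync)
        {
            while (_numElements == _data.Length)   // FIFO full: block the writer
                Monitor.Wait(_sync);
            _data[(_first + _numElements) % _data.Length] = c;
            ++_numElements;
            Monitor.PulseAll(_sync);               // wake any blocked reader
        }
    }
}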
Finally, the co-routine semantics imposes a sequential execution of the processes. It therefore prevents us from taking advantage of a multi-processor machine.
5.3 Transaction Model
The transaction concept is a powerful model used to describe, in a simple way, the coherent execution of simultaneous process activities in concurrent systems. Indeed, a transaction encapsulates a set of operations which therefore behaves as a single atomic operation. In this way, a microelectronic system's functionality can be modeled as a set of transactions, where each transaction can be dealt with in isolation (from other transactions). Hence, the designer no longer needs to worry about shared resource consistency and the coordination of the concurrent modules accessing them, since all these matters are automatically taken care of by the transaction manager implemented by the underlying simulator. In order to simulate a transaction-based model, we need to implement a Software Transactional Memory (STM). An STM model consists of a set of processes which communicate through a shared memory by executing transactions. In this section, we begin by presenting the correctness criteria used to validate the parallel execution of transactions. Then, we give an overview of the different techniques and policies used in the implementation of an STM. Finally, we show some implementation examples of an STM.
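As a simple illustration, consider a bank transfer modeled as one transaction. The sketch below is purely illustrative: Account is a toy shared object and Stm.Atomic is a placeholder for the transaction manager's entry point (the concrete APIs we actually use are presented in Section 5.4).

using System;

// Hypothetical names for illustration only. A real STM would instrument
// the delegate's reads and writes, detect conflicts, and retry on abort.
public class Account { public int Balance; }

public static class Stm
{
    public static void Atomic(Action body) { body(); }  // placeholder
}

public class Bank
{
    public static void Transfer(Account from, Account to, int amount)
    {
        Stm.Atomic(delegate
        {
            from.Balance -= amount;   // no intermediate state is ever
            to.Balance += amount;     // observable by other transactions
        });
    }
}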
5.3.1 STM Concurrent Execution
The simplest execution model we can think of to preserve the atomicity of transactions is to run them sequentially: each transaction is executed separately until termination before another one starts. Each transaction execution can thus be seen as instantaneous, with no observable intermediate states, thereby preserving the atomicity of each transaction. However, implementing an STM by executing the transactions sequentially may give poor performance and fails to take advantage of available multicore machines. Consequently, we need to consider a concurrent execution where the operations inside the transactions can be interleaved. To execute the transactions concurrently, though, we need to disallow some potentially dangerous situations threatening atomicity. For instance, Figure 5.5 presents an example [198] illustrating the lost-update problem due to a concurrent execution of two transactions t1 and t2.
Time   t1                          t2
                 /* x = 100 */
 1     r(x)
 2                                 r(x)
 3     /* update x := x + 30 */
 4                                 /* update x := x + 20 */
 5     w(x)  /* x = 130 */
 6                                 w(x)  /* x = 120 */
                                   ↑ update "lost"

FIGURE 5.5: Lost-update problem example [198]

Initially, the shared variable x has the value 100. The two transactions read the same value 100 of x. t1 updates this value and thus writes the value 130 into x at time 5. At time 6, t2 updates x with the value 120. At this point, the update executed by t1 is lost and consequently the atomicity of the execution is violated. Indeed, a serial execution of the two transactions would have terminated with x having the value 150.

Another problem which can arise during a concurrent execution is the inconsistent-read problem. Figure 5.6 shows an example [198] illustrating this situation. Suppose that two shared variables represent two bank accounts, each containing 50$. t1 intends to calculate the sum of the two accounts while t2 does a transfer between the two accounts. At time 3, t2 updates x by withdrawing 10$ from it. In the meantime, t1 reads x and y, computes the sum (40$ + 50$), and outputs the value 90. We are then in the situation of an inconsistent read: the sum of the two accounts should have been 100$ (not 90$), since no money has been globally withdrawn from the two accounts.
Time   t1                          t2
             /* x = 50 and y = 50 */
 1                                 r(x)
 2                                 /* update x := x − 10 */
 3                                 w(x)
 4     /* sum := 0 */
 5     r(x)
 6     r(y)
 7     /* sum := sum + x */
 8     /* sum := sum + y */
       /* sum = 90 */
       ↑ inconsistent-read
 9                                 r(y)
10                                 /* y := y + 10 */
11                                 w(y)

FIGURE 5.6: Inconsistent-read problem example [198]

To avoid the lost-update and inconsistent-read problems, and hence preserve the atomicity of the transactions, a common alternative is to require concurrent transaction executions to satisfy the conflict serializability property (or simply serializability). Serializability is defined as follows:

DEFINITION 5.1 Conflicting Operations
Two operations are conflicting if:
1. they belong to different transactions,
2. at least one of them is a write operation,
3. and they access the same shared variable.

DEFINITION 5.2 Serializability
A concurrent execution is serializable, or more precisely conflict serializable, if the serializability graph, built as follows, is acyclic.
1. Each transaction Ti corresponds to a vertex Ti in the graph.
2. An arc exists from a vertex Ti to a vertex Tj if there exist at least two conflicting operations O1 and O2 such that
   (a) O1 belongs to Ti and O2 belongs to Tj,
   (b) and O1 precedes O2.
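The serializability check itself is mechanical. Below is a small self-contained C# sketch (our own illustration, not taken from any of the STM implementations discussed later) that records one arc per ordered pair of conflicting operations and tests the resulting graph for a cycle by depth-first search:

using System.Collections.Generic;

// Builds the serializability graph of Definition 5.2 and decides
// conflict serializability by checking the graph for a cycle.
public class SerializabilityGraph
{
    private readonly Dictionary<string, List<string>> _arcs =
        new Dictionary<string, List<string>>();

    // Add an arc Ti -> Tj for a pair of conflicting operations where
    // the operation of Ti precedes the operation of Tj.
    public void AddConflict(string ti, string tj)
    {
        if (!_arcs.ContainsKey(ti)) _arcs[ti] = new List<string>();
        if (!_arcs.ContainsKey(tj)) _arcs[tj] = new List<string>();
        _arcs[ti].Add(tj);
    }

    public bool IsSerializable()   // true iff the graph is acyclic
    {
        var state = new Dictionary<string, int>();  // 0 new, 1 open, 2 done
        foreach (string v in _arcs.Keys)
            if (!state.ContainsKey(v) && HasCycle(v, state))
                return false;
        return true;
    }

    private bool HasCycle(string v, Dictionary<string, int> state)
    {
        state[v] = 1;
        foreach (string w in _arcs[v])
        {
            int s; state.TryGetValue(w, out s);
            if (s == 1) return true;                    // back arc: cycle
            if (s == 0 && HasCycle(w, state)) return true;
        }
        state[v] = 2;
        return false;
    }
}
// For the example developed below (Table 5.2), the arcs T1->T2,
// T1->T3, and T2->T3 yield IsSerializable() == true.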
Given the partial order defined by the serializability graph over the set of transactions, we can generate total orderings of those transactions corresponding to equivalent serial executions. According to the serializability principle, a concurrent execution of the transactions should have the same effect as some serial execution [198]. If a cycle is found in the graph, no equivalent serial execution of the transactions can be found, and thus the concurrent execution of the transactions is said not to be serializable. For example, Table 5.1 shows three transactions with their read and write operations.
TABLE 5.1: Example of three transactions with their read and write operations

A1             A2             A3
X1R: Read X    Y2R: Read Y    Y3R: Read Y
Y1R: Read Y    X2W: Write X   Y3W: Write Y
The pairs of conflicting operations are:

• (Y1R, Y3W)
• (Y2R, Y3W)
• (X1R, X2W)

Table 5.2 describes a possible interleaving of the operations of the three transactions obtained from a concurrent execution. From this interleaving we obtain the serializability graph of Figure 5.7.
TABLE 5.2: A serializable interleaving of the operations

time   T1     T2     T3
 1     X1R
 2            Y2R
 3            X2W
 4                   Y3R
 5     Y1R
 6                   Y3W
The serializability graph is acyclic. So, the concurrent execution of those transactions is serializable. This execution is equivalent to the serial execution T1 < T2 < T3 .
FIGURE 5.7: Acyclic serializability graph (arcs: T1 → T2 labeled X1R < X2W; T1 → T3 labeled Y1R < Y3W; T2 → T3 labeled Y2R < Y3W)
We notice that the above concurrent execution of transactions (Table 5.2) only requires 6 units of time. If we consider that each transaction needs 6 units of time to complete, a serial execution is therefore three times less efficient than a concurrent one. Let us now switch the operations Y1R and Y3W between times 5 and 6, as shown in Table 5.3. In this case the execution is no longer serializable, since the corresponding serializability graph (Figure 5.8) is cyclic. Several STM implementations and scheduling strategies have been studied in order to produce serializable executions for concurrently executing transactions. The following section gives an overview of the different techniques proposed.
TABLE 5.3: A nonserializable interleaving of the operations

time   T1     T2     T3
 1     X1R
 2            Y2R
 3            X2W
 4                   Y3R
 5                   Y3W
 6     Y1R
FIGURE 5.8: Cyclic serializability graph (arcs: T1 → T2 labeled X1R < X2W; T2 → T3 labeled Y2R < Y3W; T3 → T1 labeled Y3W < Y1R)
5.3.2 STM Implementation Techniques
The main approaches used by the different STM implementations rely on two types of concurrency control: optimistic concurrency control and pessimistic concurrency control [141]. Under the optimistic approach, transactions execute freely while the STM manager assumes the global execution to be serializable. It is only at transaction commit time that the manager examines the generated scheduling and can detect that it is not serializable. In this case, it must choose one or several transactions which do not satisfy the serializability property, abort their execution by rolling back all the changes already done, and resume their execution later. Contrary to the optimistic approach, the pessimistic approach does not allow unserializable executions to occur: it requires each transaction to obtain exclusive access to the shared memory before it can access it. The two concurrency control approaches have been implemented using several different techniques. [141] and [25] outline the key techniques employed, as well as the advantages and disadvantages of each of them. [142] gives an overall picture of the various STM implementations suggested in the literature.
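As a rough illustration of the optimistic approach, the following C# sketch (ours, not one of the implementations surveyed) records the version of every object read and re-validates the read set at commit time; the write set, rollback, and re-execution machinery are deliberately omitted:

using System.Collections.Generic;

// Partial sketch of optimistic concurrency control: reads proceed
// freely; commit re-checks the recorded versions and aborts on change.
public class VersionedObject
{
    public long Version;   // incremented by every committing writer
    public object Value;
}

public class OptimisticTx
{
    private readonly Dictionary<VersionedObject, long> _readSet =
        new Dictionary<VersionedObject, long>();

    public object Read(VersionedObject obj)
    {
        if (!_readSet.ContainsKey(obj))
            _readSet[obj] = obj.Version;   // remember version at first read
        return obj.Value;
    }

    public bool TryCommit()
    {
        foreach (KeyValuePair<VersionedObject, long> entry in _readSet)
            if (entry.Key.Version != entry.Value)
                return false;   // a writer committed in between: abort, retry
        return true;            // read set still consistent: commit succeeds
    }
}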
Some STM implementations provide simple instructions like "atomic" so that the programmer can isolate a set of instructions which must be carried out in a single transaction. However, transactions alone provide little assistance in synchronizing several concurrent tasks. Indeed, let us take the example of a producer writing into a FIFO and a consumer reading from it. We can isolate the reading part and the writing part in two different transactions so that accesses to the FIFO do not interfere. However, when the FIFO is full, the producer cannot continue and is forced to execute the transaction later. The consumer must behave in a similar manner when the FIFO is empty. Without synchronization mechanisms, the producer must wait actively by continuously consulting the state of the FIFO until it is no longer full. The consumer is also obliged to carry out a similar polling strategy. To avoid this costly busy wait, it is necessary to suspend the producer and the consumer until the condition on the state of the FIFO is satisfied. A possible solution is therefore to assign a guard to each transaction in order to determine the moment at which to wake it up to continue its execution. The STM manager has the responsibility to resume a suspended transaction once its guard becomes true. The idea of synchronizing a parallel program by using guards is not new. For example, [145] suggests the instruction "await (condition)" in order to block a transaction until the condition is true. In [116] the author introduces the concept of a Conditional Critical Region, where a region is guarded by a Boolean condition. Nevertheless, the implementation of an STM [106, 107, 67] including synchronizing guards remains difficult, as it is not enough to guarantee execution serializability: the STM must also manage the various synchronizations between concurrent transactions and determine when to resume the execution of a suspended transaction. The authors of [106] present a Java solution to this problem by extending the language with an instruction "atomic(condition)." For example, a method that reads a value from a FIFO can be programmed as follows:
Item get(){
  atomic (n_items > 0){
    ...remove item...
  }
}
It is also possible to compose two transactions. For example, the following code fragment retrieves two consecutive elements from the FIFO:
atomic(){
  item1 = get();
  item2 = get();
}
Another STM implementation, in Haskell, was proposed by [107]. Two instructions, retry and orElse, are provided to wait on a condition and to choose between two transactions to execute. The retry instruction suspends and aborts the transaction in progress; the transaction will be started again if one of the variables read between the beginning of its last execution and the call to retry is modified by another transaction. The same technique is used in [106], except that the programmer must formulate explicit suspension requests using a retry instruction, instead of associating guards with transactions. The orElse instruction makes it possible to compose two transactions by choosing a second transaction if the first one is aborted. The following section presents three different implementations of STM. Although they all rely on a pessimistic concurrency control mechanism, these implementations differ with regard to the algorithm and the metadata structures they use.
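To give a feel for these combinators outside Haskell, the toy C# sketch below models retry as an exception and orElse as a fallback. The names are hypothetical, and the sketch deliberately ignores the rollback and wake-up bookkeeping a real STM must perform when a transaction retries:

using System;

// Toy model of the retry/orElse combinators (hypothetical API).
public class RetryException : Exception { }

public static class TxCombinators
{
    // Retry: abandon the current transaction attempt.
    public static void Retry() { throw new RetryException(); }

    // OrElse: run 'first'; if it retries, run 'second' instead.
    public static T OrElse<T>(Func<T> first, Func<T> second)
    {
        try { return first(); }
        catch (RetryException) { return second(); }
    }
}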
5.3.3 STM Implementation Examples
In this section we present three implementations of a Software Transactional Memory and discuss the key differences between them. The first two implementations, named TMem and OFree, are from SXM [104] and thus run on the .NET Framework. The third STM manager, which we called TwoPL, uses a strong two-phase locking protocol to manage access to shared objects. Each STM manager has to process at least five kinds of events when executing a transaction-based program:

• Begin transaction: The program enters a new block of instructions to be executed inside a transaction. The STM manager does the necessary operations to prepare the transaction. Being intended to handle nested transactions, SXM must not only create a new transaction but also record its parent transaction if it exists.

• Read access: The program requests to read a value from a transactional object.

• Write access: The program requests to write a value into a transactional object.

• End transaction: The program has finished the current transaction and requests its effects to be made permanent by committing the transaction.

• Retry: The program cannot progress anymore and needs to be aborted and executed again. For instance, when a write transaction tries to put an element into a FIFO when it is full, it has no choice but to wait until a slot becomes free.

Figure 5.9 shows the flowchart of the algorithm used by SXM when dealing respectively with read/write access, end transaction, and retry events. Although relying on a common conflict resolution algorithm (cf. [104] for details), OFree and TMem differ considerably with regard to the data structures they use and how their updates are performed. The OFree implementation uses a deferred update policy: it creates a clone of the object to be modified, and all the modifications are done on this clone. It is only when the transaction commits that the updates' effects become visible to the other transactions. On the contrary, the TMem implementation uses a direct update policy: each modification is directly applied to the transactional object. This means that the old values of modified objects must be saved if one wants to later be able to recover those values when an abort occurs. Since it requires managing different clones, the deferred update policy induces a more significant runtime overhead than its direct update counterpart. However, OFree implements an obstruction-free synchronization mechanism which ensures that no thread is blocked due to delays or failures of other threads [112]. This is not the case for TMem, since it locks the object during the whole access duration. So, if the locking thread gets suddenly blocked during this access, the other threads that need to access the locked object cannot progress.
FIGURE 5.9: SXM algorithm flowchart (read/write accesses check for an already-aborted transaction, resolve conflicts with concurrent readers or writers, then perform the access; end transaction commits unless already aborted; retry aborts and sleeps until a timeout)
The third transaction manager implementation, TwoPL, uses a strong two-phase locking protocol [198] to manage access to shared resources. Figure 5.10 shows the flowchart of the algorithm used by this implementation. The two-phase locking algorithm associates each shared object with a lock that can either be a write-exclusive lock or a shared-read lock.
FIGURE 5.10: TwoPL algorithm flowchart (read/write accesses lock the object, then perform the access; end transaction unlocks all objects, resumes each transaction waiting on an object modified by the current transaction, and commits; retry logs the objects already read and sleeps until resumed by another transaction)
Each transaction executes the two following phases:

1. During the first phase, the transaction can acquire locks but cannot release them.
2. During the second phase, the transaction releases its locks but cannot acquire new ones.

The two-phase locking rules restrict lock acquisition and release in order to guarantee the serializability property. However, the protocol can reach undesirable situations leading to cascading aborts. For instance, suppose a transaction t1 writes a value into a shared variable which is then read by some transaction t2. If t1 aborts, then t2 must also be aborted. Different variants of the two-phase locking algorithm exist. The most widely known variants are described below; a minimal sketch of the locking discipline follows the list.

• Strict Two-Phase Locking: Write locks can only be released after the end of the transaction (i.e., after committing or aborting). This constraint prevents the occurrence of write/read conflicts between uncommitted transactions. Cascading aborts can thus be avoided.

• Strong Two-Phase Locking: Both write and read locks can only be released after the end of the transaction (i.e., after committing or aborting).

Without prior knowledge of how and when a transaction acquires a lock, an STM manager cannot tell whether a transaction will acquire another lock during its lifetime. In this case, the end of the first phase is naturally determined by the end of the transaction. From there, the transaction has no choice but to release all its locks, thus executing a somewhat "degenerated" second phase.
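The announced sketch is our own minimal C# rendering of this strong discipline. It uses one exclusive monitor per shared object, which matches TwoPL's policy of permitting no concurrent readers, and it ignores deadlock handling:

using System.Collections.Generic;
using System.Threading;

// Strong two-phase locking: every lock acquired during the transaction
// (first phase) is held until the end, where all locks are released at
// once (the "degenerated" second phase).
public class TwoPhaseLockingTx
{
    private readonly HashSet<object> _held = new HashSet<object>();

    // First phase: acquire the object's lock on first access, keep it.
    public void Access(object objectLock)
    {
        if (_held.Add(objectLock))
            Monitor.Enter(objectLock);   // blocks if another tx holds it
    }

    // Second phase, at commit or abort: release everything.
    public void End()
    {
        foreach (object l in _held)
            Monitor.Exit(l);
        _held.Clear();
    }
}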
The strong two-phase locking variant can perfectly deal with this kind of scenario. This motivated our choice of this variant for our implementation of the two-phase locking protocol. In contrast to the TMem and OFree implementations, the TwoPL algorithm does not permit concurrent readers to access the same shared object. On the other hand, during a retry operation, we resume a transaction waiting on some object states to change only when one of those objects is updated by another transaction. This is more efficient than using a timeout, since we do not need to wake up the waiting transaction each time the timeout expires to check whether any object state has changed. The SXM algorithm differs from the one implemented by TwoPL with regard to the way conflicts are resolved. The SXM algorithm calls a contention manager to resolve conflicts, while the locking protocol of TwoPL blocks a transaction that tries to acquire an already acquired lock until this lock is released. The contention manager of SXM has three choices to resolve a conflict:

• aborting the calling transaction,
• aborting the transaction conflicting with the calling transaction,
• delaying the calling transaction.

SXM comes with different implementations of contention managers [104]. For our experiments, we selected the aggressive contention manager, which always aborts the conflicting transaction. To measure the runtime overhead induced by each implementation, we tested the three implementations on a FIFO benchmark using a single thread. Since only one process can access a shared object at a time, no synchronization is required; therefore, we did our reference tests without any synchronization. The additional time taken by the respective implementations to execute the FIFO benchmark thus corresponds to their overhead. To simulate a fictive computation for the producer and the consumer of the FIFO, we coded two loops before each read and write operation. We did three different tests for each implementation by varying the number of execution iterations. Table 5.4 shows the results of these tests.
TABLE 5.4: Overheads per computation load

Computation load                 unsynchronized     TMem      OFree     TwoPL
0 iterations       operation/s        6843646       212279    156588    331685
                   overhead                0%         3100%     4200%     1900%
1000 iterations    operation/s         200335       103800     89498    121016
                   overhead                0%           93%      102%       65%
10000 iterations   operation/s          21048        18996     18592     19675
                   overhead                0%           10%       13%        7%
First, we notice that TwoPL has a smaller overhead than TMem, which itself has a
smaller overhead than the OFree implementation. Indeed, as explained before, the TMem implementation uses a direct update policy, which is less costly than the deferred update policy used by OFree. The TwoPL implementation has the smallest overhead, since its overhead is simply due to the locking and unlocking operations required for shared object accesses. When the computation load is insignificant (i.e., 0 iterations), the only computation the producer and consumer do is to read and write once from and to the FIFO. In this case the overhead is huge compared to a simple execution without the instrumentation code required for managing synchronizations. Indeed, this observation is one of the important reasons why Software Transactional Memory is still the subject of intense academic research and has not yet moved to industrial use [49]. To take advantage of multicore machines, the performance of a parallel execution has to surpass the overhead induced by the STM implementation. Obviously, this is not the case for the 0-iteration test case. For example, to reach the performance of an unsynchronized execution, and without counting the overhead due to conflicting or aborted transactions (i.e., considering an ideal parallel execution), we would need to execute the TMem implementation on at least a 32-processor machine (the unsynchronized rate is about 32 times the TMem rate: 6843646 / 212279 ≈ 32). However, when the computation becomes important, two concurrent transactions spend more time executing their computation than executing the required synchronization code. The relative overhead being reduced in this case, we can observe a significant performance gain with respect to sequential executions. To illustrate this observation, we performed two tests based on the TwoPL implementation. The first test uses an effective computation of 1000 iterations per operation, while the second test uses 10000 iterations. For each test we varied the number of producers and consumers executing in parallel. The tests were run on a VISTA machine comprising two Intel Xeon processors with 4 cores each, running at 1.87 GHz. Figure 5.11 shows the result for 1000 iterations. The overhead being very high, the parallel execution on the 8-core machine did not succeed in surpassing the performance of the single-threaded execution. However, when the overhead is reduced to about 7%, as is the case when considering 10000 iterations, we observe a speedup factor of at least 4 (see Figure 5.12). The last test was intended to compare the three implementations. For this purpose, we fixed the number of iterations to 10000 and calculated for each implementation the speedup obtained for different configurations of producers and consumers. Figure 5.13 shows the results of this experiment. The TwoPL implementation presents the best speedup for the FIFO benchmark. This performance motivated our decision to select the TwoPL implementation for our main experiments, whose results are presented in Section 5.5. The following section presents how we can take advantage of .NET Framework features, such as attribute programming and dynamic code creation, to implement different kinds of STMs based on the C# language.
TwoPL with 1000 iterations Unsynchronized Single Threaded
240000
Operations/s
220000
200000
180000
160000
140000 ST
1P1C
2P2C Threads
3P3C
4P4C
FIGURE 5.11: Variation of the number of operations as a function of the number of producers and consumers (Computation Load: 1000 iterations)
5.4 STM Implementation Using .NET
This section presents two existing C# implementations of STM. Details of the algorithms used in these STMs are intentionally omitted; the emphasis is rather put on the key features provided by the .NET architecture, such as introspection and attribute programming. We will specifically focus on how attribute programming and introspection mechanisms can be used to extend the language without the need to introduce new keywords, therefore avoiding modifications to the programming language. First, we give an overview of the SXM transactional memory and explain how a programmer can define shared objects and transactions using delegation and design patterns. Subsequently, we introduce another STM implementation (NSTM), which offers a more convenient interface to end users thanks to the attribute and aspect-oriented programming techniques implemented by the PostSharp framework. Finally, we show how we can combine the advantages of SXM and NSTM to develop a generic STM framework allowing the integration of different kinds of implementations using different concurrency control policies. This STM framework will allow us to conduct different experiments on the same platform using a common benchmark.
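Before going into the details, a small self-contained example shows the two .NET mechanisms at play: defining a custom attribute and querying it by reflection is all that is needed to mark and recognize transactional classes without any new language keyword. The attribute and class names below are illustrative:

using System;

// A custom attribute marks transactional classes; reflection
// recognizes them at runtime, with no extension to C# itself.
[AttributeUsage(AttributeTargets.Class)]
public class AtomicAttribute : Attribute { }

[Atomic]
public class Counter { public int Value; }

public static class StmRuntime
{
    public static bool IsTransactional(Type t)
    {
        // Introspection: look for the [Atomic] marker at runtime.
        return t.GetCustomAttributes(typeof(AtomicAttribute), false)
                .Length > 0;
    }
}

// StmRuntime.IsTransactional(typeof(Counter)) evaluates to true.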
TwoPL with 10000 iterations Unsynchronized Single Threaded
80000 Operations/s
70000 60000 50000 40000 30000 20000 10000 ST
1P1C
2P2C Threads
3P3C
4P4C
FIGURE 5.12: Variation of the number of operations as a function of the number of producers and consumers (Computation Load: 10000 iterations)
FIGURE 5.13: Speedup for a computation load of 10000 iterations

5.4.1 SXM Transactional Memory
SXM allows the creation of both shared and unshared objects. Shared objects are those which can be accessed by many concurrent transactions. They must be instances of classes to which a special Atomic attribute has been assigned. For example, the following code shows a class Node, having this attribute, whose instances denote
list elements allowing access from concurrent transactions.

[Atomic]
public class Node{
    protected int value;
    protected Node next;
    public Node(int value){
        this.value = value;
    }
    public virtual int Value{
        get{return value;}
        set{this.value = value;}
    }
    public virtual Node Next{
        get{return next;}
        set{this.next = value;}
    }
}
Usually, to create an object of type Node we have to use the new operator as follows:
Node root = new Node(0);
However, with the above instruction the compiler will generate an object without the instrumentation code that manages the necessary synchronization when two concurrent transactions access the object. In fact, the creation of objects of some class A should be delegated to a special creator class factoryA that knows precisely how to create and equip instances of type A with the appropriate instrumentation code. SXM realizes this using the well-known Abstract Factory design pattern. For each class implementing shared objects (i.e., tagged with the Atomic attribute), SXM creates a factory that is used to create and equip instances of this class with the appropriate concurrency control code. This code includes the read/write barriers required to manage accesses to shared objects following the policy used by the underlying STM. The creation of the factory itself therefore depends on two things: the type of instances being created and the kind of concurrency control policy being applied. SXM currently supports two concurrency control policies, TMem and OFree, and thus provides the two following kinds of factories:

• TMemFactory simulates a hardware transactional memory using very short critical sections.

• OFreeFactory provides obstruction-free synchronization using a combination of copying and compare-and-swap calls.

The type of factory that shall be used is passed as an argument to the program and stored in the global variable factoryType. To create a factory building new shared objects of type Node, the user must call the MakeFactory method as follows:
IFactory factory = XAction.MakeFactory(typeof(Node));
As shown below, the MakeFactory method allows one to create a new factory whose precise type is known only at runtime. In fact, MakeFactory takes advantage of the reflection mechanism to get the constructor required to create the appropriate kind of factory (as indicated by factoryType). The Invoke method is then used to create and instantiate the new factory that will produce new shared objects of the type given in parameter.
public static IFactory MakeFactory(Type type)
{
    ConstructorInfo constructor = factoryType.GetConstructor(
        new Type[] { typeof(Type) });
    IFactory factory =
        (IFactory)constructor.Invoke(new object[1] { type });
    if (factory == null) {
        throw new PanicException(
            "Cannot find object factory: {0}",
            factoryType.Name);
    }
    return factory;
}
For instance, if factoryType points to class TMemFactory and the argument type is Node, then the execution of the MakeFactory method is equivalent to the execution of this line of code:
IFactory factory = new TMemFactory(typeof(Node));
From there on, new shared objects of type Node can be created using this factory:
Node node = (Node)factory.Create(value);
Of course, a single factory instance is required for each transactional class (i.e., each class defining shared objects). Once this factory is created, individual objects are created as indicated above.

SXM dynamically produces the required read/write barriers thanks to the capability of the .NET framework to generate code at runtime. Indeed, for each transactional class, SXM generates a specialized subclass taking care of concurrent access matters. Besides the barriers required to intercept accesses to shared objects, this subclass also implements all the necessary metadata needed to manage those accesses. For instance, in the above example, the creation of a TMemFactory for the Node class generates a new Node$ subclass. New shared object instances created by this factory are thus of type Node$. As mentioned in Section 5.3.3, the TMem implementation uses a direct update policy which requires saving the current value of a class field before updating it. This is necessary since this value must be recovered if ever the transaction is aborted. For this purpose, the derived class Node$ implements the following IRecoverable interface:
public interface IRecoverable {
    void Backup();
    void Restore();
}
A fragment of the Node$ class, as dynamically generated by the TMemFactory, is given below. Of course, to backup and restore the old values associated with an object, new fields must be added to save each field of the base class. The value$ field was included in the Node$ class for this purpose.
public class Node$ : List.Node, TMemFactory.IRecoverable
{
    // Fields
    private IContentionManager manager$;
    private XState me$;
    private List.Node next$;
    private TMemFactory.SynchState synch$;
    private int value$;

    // Methods
    public Node$(int num1) : base(num1) {
        this.synch$ = new TMemFactory.SynchState(this);
        this.manager$ = XAction.Manager;
    }

    public override void Backup() {
        this.value$ = base.value;
        this.next$ = base.next;
    }

    public override List.Node get_Next() {
        XState current = XAction.Current;
        XState writer = null;
        while (true) {
            lock (this) {
                writer = this.synch$.OpenRead(current);
                if (writer == null) {
                    return base.Next;
                }
            }
            this.manager$.ResolveConflict(current, writer);
        }
    }
    ...
}
It is important to recall that the code of dynamically created classes, such as Node$, is originally generated in the low-level MSIL language, unlike statically defined classes which are written in high-level languages such as C# or VB.NET. The code of the Node$ class above has been disassembled from MSIL to C# using Reflector, a tool that permits decompiling and analyzing .NET assemblies [200]. In fact, the code fragment below is the one used to generate at runtime the MSIL code implementing the Restore method declared in the IRecoverable interface:
ILGenerator methodIL;
typeBuilder.AddInterfaceImplementation(
    typeof(IRecoverable));
restore = typeBuilder.DefineMethod("Restore",
    MethodAttributes.Public | MethodAttributes.Virtual,
    typeof(void), new Type[] {});
methodIL = restore.GetILGenerator();
for (int i = 0; i < shadow.Length; i++) {
    methodIL.Emit(OpCodes.Ldarg_0);
    methodIL.Emit(OpCodes.Ldarg_0);
    methodIL.Emit(OpCodes.Ldfld, shadow[i]);
    methodIL.Emit(OpCodes.Stfld, fields[i]);
}
methodIL.Emit(OpCodes.Ret); ...
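For readability, the IL emitted by this loop corresponds to the following C#, a sketch for the Node example using the shadow-field naming shown earlier (each loop iteration emits one assignment of the form fields[i] = shadow[i]):

public override void Restore() {
    // Copy each shadow field back into the corresponding base-class field,
    // undoing the direct updates of an aborted transaction.
    base.value = this.value$;
    base.next = this.next$;
}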
Since SXM derives a subclass from a base transactional class, the following rules should be followed when implementing a transactional class. SXM checks those rules during the definition of the extending class.

• Get and set operations have to be virtual; otherwise calls to those methods are resolved by the base class instead of the derived class.

• Fields of the base class have to be protected, to at least allow subclass methods (e.g., the Backup method of the IRecoverable interface) to access them.

To be able to execute a block of instructions as a whole transaction, we need to instantiate a delegate of type XStart, defined as follows:
public delegate object XStart(params object[] args);
When using a delegate, all the functions to be executed within a given transaction must have the same signature as the delegate. For instance, the code to insert an element in a list is defined as follows:
public override object Insert(params object[] _v) {
    int v = (int)_v[0];
    Node newNode = (Node)factory.Create(v);
    ...
}
To execute the Insert function within a transaction, we instantiate an XStart delegate as follows:
insertXStart = new XStart(this.Insert);
Then, we execute the insertXStart delegate as follows:
((bool)XAction.Run(insertXStart, value))
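Putting the pieces together, the complete SXM usage pattern described in this section is therefore the following (a sketch assembled from the fragments above; the value 5 is an arbitrary element to insert):

// One factory per transactional class; its kind (TMem or OFree) is given by factoryType.
IFactory factory = XAction.MakeFactory(typeof(Node));
// Shared objects are created through the factory, never directly with new.
Node root = (Node)factory.Create(0);
// The transaction body is wrapped in an XStart delegate and executed atomically.
XStart insertXStart = new XStart(this.Insert);
bool inserted = (bool)XAction.Run(insertXStart, 5);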
To get rid of the need for factories and delegates, the following section introduces another implementation of a software transactional memory which takes advantage of the attribute programming features of the .NET framework, so as to provide more convenient interfaces for the definition of transactional objects and transactions.
5.4.2 NSTM Transactional Memory
Similarly to SXM, transactional classes are declared in NSTM [186] by assigning them an attribute which is here named NstmTransactional. But this time, a user can
create transactional objects with the usual new construct without passing through a factory. In a similar way, attributes are used to identify methods which are transactions. More precisely, transaction methods are labeled with an attribute called NstmAtomic. Table 5.5 shows the programming differences between SXM and NSTM.
TABLE 5.5: Differences between SXM and NSTM

Declare transactional class

  SXM:
    [Atomic]
    public class Node {
        protected int value;
        protected Node next;
        ...
    }

  NSTM:
    [NstmTransactional]
    public class Node {
        protected int value;
        protected Node next;
        ...
    }

Instantiate transactional object

  SXM:
    public override object Insert(params object[] _v) {
        int v = (int)_v[0];
        Node newNode = (Node)factory.Create(v);
        ...
    }

  NSTM:
    [NstmAtomic]
    public bool Insert(int v) {
        Node newNode = new Node(v);
        ...
    }

Call transaction

  SXM:
    insertXStart = new XStart(this.Insert);
    ((bool)XAction.Run(insertXStart, value))

  NSTM:
    this.Insert(v)
Table 5.5 reveals the simplicity of NSTM's declarative style compared to the operational style of SXM. Indeed, NSTM simply requires the use of two attributes, NstmTransactional and NstmAtomic, to respectively declare transactional objects and transactions. This implementation thus avoids the explicit factory-based creation of transactional objects and explicit transaction calls through a delegate. In fact, all the code required to implement the STM is automatically generated (according to the attribute annotations) and remains hidden from the end user. This is possible thanks to the use of PostSharp, an open platform for the analysis and transformation of .NET assemblies. The following gives an overview of PostSharp and shows how it is used
by NSTM.
5.4.3 PostSharp
FIGURE 5.14: PostSharp build process
PostSharp [17] is a platform for analyzing compiled assembly code and transforming it by extending it with new behaviors. Such a style of programming is known
as Aspect-Oriented Programming (AOP) or policy injection. As shown by Figure 5.14, PostSharp can be used during the compile time phase or the load time phase. During the compilation phase, PostSharp integrates itself into the build process. Once the source file is compiled into a binary assembly, a PostSharp post-compilation procedure is automatically activated to transform the binary assembly into a new enhanced assembly. The same procedure can be applied at load time. Indeed, before an assembly is loaded into memory and executed by the .NET platform, PostSharp is activated to add new behavior to the original assembly. It then loads and executes the modified assembly instead of the original one. The paragraphs below summarize the main features of the PostSharp platform (cf. [17] for details).

• PostSharp Core: This module is the heart of the PostSharp platform. It provides a low level object model API to read, search, manipulate and modify managed .NET assemblies. In addition to the object model, it supplies the necessary infrastructure to manage the transformation process. This transformation consists in defining supplementary tasks and grouping them inside plug-ins so they can be added dynamically to the main program execution and thus extend it with the desired new functionalities. For instance, this feature was of great benefit in the development of the STM framework based on the STM PostSharp plug-in discussed in Section 5.4.4.

• PostSharp Laos: This module is a built-in PostSharp plug-in which allows the enhancement of .NET assemblies with different types of behaviors (aspects). Table 5.6 gives an overview of the main aspects supported by PostSharp Laos.
TABLE 5.6: Main aspects supported by PostSharp Laos

Method Boundary: Permits adding behaviors at different method locations: On Entry, On Exit, On Success, On Exception.

Method Invocation: Intercepts a method call by implementing the OnInvocation method handler. The original method is passed as an argument to the handler.

Method Implementation: Permits deferring the implementation of an abstract or external method.

Field Access: Intercepts the setter and getter methods of a field.

Interface Composition: Permits adding an interface to a class. The implementation is delegated to another object declared in the enhanced class.

Compound Aspect: Permits adding a set of new behaviors based on individual aspects.
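As a minimal illustration of the Laos programming model, a "Method Boundary" aspect could be written as follows. This TraceAttribute is a hypothetical example of ours, not part of NSTM, and it assumes that MethodExecutionEventArgs exposes the intercepted method through its Method property:

[Serializable]
public class TraceAttribute : OnMethodBoundaryAspect {
    // Called before the body of any method carrying this attribute.
    public override void OnEntry(MethodExecutionEventArgs eventArgs) {
        Console.WriteLine("Entering " + eventArgs.Method.Name);
    }
    // Called when the method terminates, normally or not.
    public override void OnExit(MethodExecutionEventArgs eventArgs) {
        Console.WriteLine("Leaving " + eventArgs.Method.Name);
    }
}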
Now let us look at how PostSharp is used inside NSTM. The NstmAtomic attribute is an indicator that some special behavior, related to atomicity, must be added to the attribute's target. This additional behavior is here provided by an aspect of type "Method Boundary," as shown by the following code fragment:
[Serializable]
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property |
                AttributeTargets.Constructor)]
public class NstmAtomicAttribute : OnMethodBoundaryAspect {
    ...
    public override void OnEntry(MethodExecutionEventArgs eventArgs) {
        bool createdNewTx;
        NstmMemory.BeginTransaction(this.transactionScope,
            this.isolationLevel, this.cloneMode, out createdNewTx);
        eventArgs.InstanceTag = createdNewTx;
    }
    public override void OnExit(MethodExecutionEventArgs eventArgs) {
        bool createdNewTx = (bool)eventArgs.InstanceTag;
        if (createdNewTx)
            if (eventArgs.Exception == null)
                NstmMemory.Current.Commit();
            else
                NstmMemory.Current.Rollback();
    }
}
The NstmAtomic attribute derives from the OnMethodBoundaryAspect class, which in turn derives from the Attribute class. NstmAtomic therefore behaves as a usual .NET attribute. The AttributeUsage construct specifies where this attribute can be applied. According to the specification above, this attribute can only be used to annotate a method, a property or a constructor. This is verified at compile time. Although NstmAtomic is a normal .NET attribute, it inherits the behavior of the OnMethodBoundaryAspect aspect. Thus, when a method is annotated by an attribute derived from OnMethodBoundaryAspect, PostSharp Laos transforms this method into one having the following additional behaviors:
int MyMethod(object arg0, int arg1) {
    try {
        OnEntry();
        // Original method body.
        OnSuccess();
        return returnValue;
    }
    catch (Exception e) {
        OnException();
    }
    finally {
        OnExit();
    }
}
As we can see, PostSharp Laos weaves some "before" and "after" behaviors into the base method by introducing calls to the special methods OnEntry(), OnSuccess(), OnException() and OnExit(). These methods, declared in OnMethodBoundaryAspect, are overridden by the NstmAtomic class so as to provide the appropriate behaviors required by transaction executions. Hence, before the execution of the base
method, the OnEntry() method is run, which allows NSTM to open a new transaction by calling NstmMemory.BeginTransaction(). Similarly, before the base method returns, the method OnSuccess() or OnException() is run, either to commit the transaction if it succeeded or to roll it back if an exception was raised. The NstmTransactional attribute is defined in NSTM as follows:
[Serializable]
[AttributeUsage(AttributeTargets.Class)]
[MulticastAttributeUsage(MulticastTargets.Class)]
public class NstmTransactionalAttribute : CompoundAspect {
    public override void ProvideAspects(object element,
            LaosReflectionAspectCollection collection) {
        Type targettype = (Type)element;
        collection.AddAspect(targettype, new NstmVersionableAspect());
        NstmTransactionalAspect txa = new NstmTransactionalAspect();
        foreach (System.Reflection.FieldInfo fi in targettype.GetFields(
                System.Reflection.BindingFlags.NonPublic |
                System.Reflection.BindingFlags.Public |
                System.Reflection.BindingFlags.Instance)) {
            if (!fi.IsStatic)
                collection.AddAspect(fi, txa);
        }
    }
}
This attribute is defined as a "Compound Aspect" since it is used to provide a class with a set of new behaviors described by a collection of individual aspects. In the present case, the aspect collection defined by NstmTransactionalAttribute is intended to extend classes with transactional behaviors. This compound aspect is made of two sub-aspects: (1) NstmVersionableAspect, which applies to a whole class, and (2) NstmTransactionalAspect, to be applied to each nonstatic field of a class. The NstmVersionableAspect is an "Interface Composition" aspect which permits associating version numbers with objects of a transactional class. This version number is necessary for the conflict management algorithm implemented by NSTM. The NstmTransactionalAspect is a "Field Access" aspect which permits intercepting the field setter and getter methods of a class. These intercepted methods can thus be equipped with the appropriate read/write barriers required to synchronize all the accesses to shared objects.

The advantage of the PostSharp Laos plug-in is that it can manage many aspects that add behaviors to existing classes, methods and fields, at different locations in the code. All this is done easily, without the need to manually modify the MSIL code. However, in some cases we still have to manipulate MSIL code directly, when optimizations such as overhead reductions are required, or when some behavior needs to be added but is not supported by PostSharp Laos. For instance, we cannot use PostSharp Laos to add new field declarations to a class. It is thus not possible to ask PostSharp Laos to modify the Node class to make it implement the IRecoverable interface, for it would require the declaration of new shadow fields to store old values. To overcome these limitations, we developed a new PostSharp plug-in dedicated to STM's specific needs. The following section presents our plug-in and shows how it
can be used to group many Software Transactional Memory implementations inside the same STM framework.
5.4.4 STM Framework
In this section we show how we can take advantage of the .NET architecture and PostSharp to develop a Software Transactional Memory framework, in order to be able to implement and test different types of STMs in the same framework. The idea is to combine the declarative approach of the NSTM implementation with the possibility of defining several STM implementations. For this purpose we developed an STM plug-in based on the PostSharp core. It defines two PostSharp tasks: TransactionTask and SharedTask. These tasks are executed during the PostSharp build process to add the necessary behaviors to a given assembly.

5.4.4.1 TransactionTask
The TransactionTask detects all the methods annotated with a Transaction attribute. The following code shows how this attribute is defined:
[AttributeUsage(
    AttributeTargets.Method,
    Inherited = true,
    AllowMultiple = false)]
[RequirePostSharp("STM.STMPlugin", "STM.TransactionTask")]
public class TransactionAttribute : Attribute, IGetTransactionManager {
    public virtual Type GetTransactionManager() {
        return PluginParameters.TransactionManagerType;
    }
}
During the transformation process, the TransactionTask needs the type of the transaction manager which deals with the execution of a method inside a transaction. This type should implement a given static function ExecuteTransaction that takes as parameters an STMGenericDelegate delegate and an array of arguments. These parameters correspond to a Transaction-annotated method (the delegate) and its arguments. The following code shows the ExecuteTransaction function of a two-phase locking manager, TwoPL:
public abstract class TwoPLManager : TransManager {
    ...
    public static object ExecuteTransaction(
            STMGenericDelegate start, object[] args) {
        object result = null;
        while (!TransManager.stop) {
            try {
                myTotal++;
                BeginTransaction();
                result = start(args);
                EndTransaction();
                myCommits++;
                return result;
            }
            catch (RetryException) {
                ...
            }
        }
    }
}
The execution of the original method is done through the start delegate of type STMGenericDelegate by executing the following code:
result = start(args);
To show how this transformation is done, let us consider the Insert method given above. However, in contrast with SXM, the user does not define the parameter as an array but gives its real integer type. The return type is also bool instead of object:
[Transaction]
public override bool Insert(int v) {
    Node newNode = new Node(v);
    ...
}
First, the TransactionTask defines a new method ˜Insert which has the same signature as the STMGenericDelegate delegate:
public delegate object STMGenericDelegate(object[] args);
Then it takes the original body of the Insert method and replaces each original argument with the corresponding argument defined in the array parameter. For instance, the argument v is replaced by args[0]. The following code fragment shows the new ˜Insert method disassembled by Reflector:
public object ˜Insert(object[] args) {
    Node node = new Node((int)args[0]);
    ...
}
Next, a new STMGenericDelegate delegate field is defined and instantiated using the new ˜Insert method:
public class List : IntSetBenchmark {
    private STMGenericDelegate del_Insert;
    ...
    public List() {
        del_Insert = new STMGenericDelegate(˜Insert);
        ...
    }
}
Finally, the body of the original Insert method is replaced by a call to the ExecuteTransaction function of the transaction manager given as a parameter to the plug-in:
public override bool Insert(int v) {
    object[] args = new object[] { v };
    return (bool)TwoPLManager.ExecuteTransaction(this.del_Insert, args);
}
5.4.4.2 SharedTask
The goal of the SharedTask is to detect all the classes annotated with a Shared attribute. The following code shows how this attribute is defined:
[AttributeUsage(
    AttributeTargets.Class,
    Inherited = true,
    AllowMultiple = false)]
[RequirePostSharp("STM.STMPlugin", "STM.SharedTask")]
public class SharedAttribute : Attribute, IAddTransformationAspect {
    public virtual IAddBehavior GetTransformationObject(
            ModuleDeclaration module) {
        return PluginParameters.GetTransformationObject(module);
    }
}
This task takes as its single parameter a class which implements the IAddBehavior interface. Then, for each type annotated with the Shared attribute, SharedTask calls the addBehavior function of the interface. Thus, all the transformations have to be done inside the addBehavior method of the type given as parameter to the task. To reduce the amount of work the user has to provide for defining transformations and manipulating MSIL code, a generic SharedAdvice class is made available. This class calls a getObjectAccess method which returns an object implementing the following IObjectAccess interface:
public interface IObjectAccess {
    object openRead(STMGenericDelegate method, object[] args);
    object openWrite(STMGenericDelegate method, object[] args);
}
Then, each get property is replaced by a call to openRead and each set property is replaced by a call to openWrite. Therefore, the user only needs to specialize the SharedAdvice class and override the getObjectAccess method as follows:
public class TwoPLAdvice : SharedAdvice {
    public TwoPLAdvice(ModuleDeclaration module) : base(module) {}
    public override Type getObjectAccess() {
        return typeof(TwoPLMem);
    }
}
The necessary synchronizations are thus handled by the openRead and openWrite methods of the TwoPLMem class:
public class TwoPLMem : IObjectAccess {
    private object obj;
    public TwoPLMem(object obj) {
        this.obj = obj;
    }
    public object openRead(
            STMGenericDelegate method, object[] args) {
        Transaction trans = TwoPLManager.currentTransaction;
        if (trans != null) trans.lockObject(obj);
        return method(args);
    }
    public object openWrite(
            STMGenericDelegate method, object[] args) {
        return method(args);
    }
}
It is sometimes necessary to add fields to transactional objects or make them implement new interfaces. For this reason the user can override the addFieldInterface and InitiateConstructors methods. In this case, however, the new code must be specified in MSIL. For instance, let us consider the application of the TwoPLAdvice transformation object to the following generic transactional class:
[Shared]
class Shared<T> {
    private T value;

    public Shared(T value) {
        this.value = value;
    }
    public T Value {
        get { return value; }
        set { this.value = value; }
    }
    public static implicit operator T(Shared<T> instance) {
        return instance.Value;
    }
}
The following transformed class is obtained:
class Shared<T> {
    private STMGenericDelegate del_get_Value;
    private TwoPLMem objectAccess;
    private T value;

    public Shared(T value) {
        this.objectAccess = new TwoPLMem(this);
        this.del_get_Value = new STMGenericDelegate(this.˜get_Value);
        this.value = value;
    }

    private object ˜get_Value(object[] args) {
        return this.value;
    }

    public static implicit operator T(Shared<T> instance) {
        return instance.Value;
    }

    public T Value {
        get {
            return (T)this.objectAccess.
                openRead(this.del_get_Value, null);
        }
        set {
            this.value = value;
        }
    }
}
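As a usage sketch, client code can then manipulate a Shared<T> almost as if it were a plain value; the implicit conversion operator routes reads through openRead (and, in the general case, writes would be routed through openWrite):

Shared<int> counter = new Shared<int>(0); // rewritten by the SharedTask at build time
int v = counter;                          // implicit operator: read goes through openRead
counter.Value = v + 1;                    // setter: write barrier, a no-op for TwoPL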
Following the transaction framework presented in this section, we have indeed produced an implementation of the SXM transactional memory. The following recapitulates the advantages of this implementation over SXM:

• The creation of a transactional object uses the traditional new construct. Hence, there is no longer a need for the cumbersome use of factories.

• Get and set properties of the transactional class no longer need to be declared as virtual, nor do the fields need to be protected.

• Methods simply need to be annotated with the Transaction attribute to be executed inside a transaction. The explicit use of a delegate is thus avoided.

• Parameters of a transactional method can have any signature.

Thanks to the framework we use, end users can benefit from much more convenient interfaces to implement transactional objects and methods. However, we must make sure these advantages preserve the globally expected performance of the system. For this, we compared the performance of the existing implementation of SXM with the new one. We computed the runtime overhead induced by the new implementation compared to the one required by the existing SXM implementation, measuring the number of operations per second each implementation could execute during a single-thread execution. The following table shows the results of this test.
            TMem     OFree
SXM         60000    14400
STM Plugin  108000   32800
Speedup     180%     227%
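The speedup row is simply the ratio of the two throughputs: 108000 / 60000 = 1.8 (180%) for TMem, and 32800 / 14400 ≈ 2.27 (227%) for OFree.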
So, the new implementation is about two times faster than the original one. This can be explained by the fact that our STM Plugin implementation avoids the extra computation required by SXM when creating a new shared object. Indeed, SXM needs to derive a new class from the transactional class and then create a new object at runtime using the CreateInstance method of the factory:
Activator.CreateInstance(proxyType, args);
The overhead (class derivation, etc.) induced by the use of this factory results in poorer performance than what is obtained when using the statically coded “new” operator. Indeed, thanks to the STM Plugin, we can directly transform the transactional object without the need to derive a new class. This transformation is always done statically either at compile or load time.
With the new framework, the user is free to favor either good performance or ease of use. For example, a user can first choose the generic SharedAdvice mentioned above to test a new implementation and thus avoid the tedious manipulation of MSIL code. Once satisfied with the functionality, he can refine the MSIL code to tune the performance of the system. For instance, when the OFree implementation uses the generic SharedAdvice, a call to the dynamic Invoke method is required, which might slow down the execution:
object clone = sync.openRead();
return method.Method.Invoke(clone, new object[] { args });
This can be corrected by making the call static, but this is only possible if the user is willing to manipulate the low level MSIL code. Finally, since we have access to the low level MSIL code, it is now possible to inline the body of the ExecuteTransaction function inside the transactional method. This way, a double call, to the delegate and to the original method body, can be avoided. Although lengthening the code, this solution should provide better performance.

Figure 5.15 shows the C# implementation of the FIFO model we used for experimenting with our transaction model. The following section presents our experimental results and compares them with the ones obtained with SystemC when using the sc_fifo and removing the register_port function to allow the declaration of many producers and consumers. For performance reasons, we use two shared variables to indicate the respective positions of the FIFO's head and tail. A shared boolean variable is also used to test whether the FIFO is full or not. These shared variables allow concurrent accesses to the FIFO by producers and consumers as long as they do not try to simultaneously operate on the same position, i.e., when head = tail.
5.5 Experimental Results
The objectives of our experiments, with regard to the transaction-based simulation model we propose, are the following:

1. Show that we can profit from a multicore machine to increase the simulation performance.

2. Demonstrate that our transaction-based simulation model allows more nondeterministic interleaving between the processes than SystemC does.

3. Provide estimations of the real system performance from simulation results. Let us remark that such estimations cannot easily be obtained with SystemC, since its simulation model does not allow for the actual parallel execution of the real system.
public class Fifo {
    private int _size;
    private int[] frame;

    private Shared<int> head;
    private Shared<int> tail;
    private Shared<bool>[] full;

    public Fifo() {
        _size = 10;
        frame = new int[_size];
        full = new Shared<bool>[_size];
        head = new Shared<int>(0);
        tail = new Shared<int>(0);
        for (int i = 0; i < _size; i++) {
            full[i] = new Shared<bool>(false);
        }
    }

    [Transaction]
    public override bool Enqueue(int v) {
        if (full[tail].Value) TransManager.Retry();
        frame[tail] = v;
        full[tail].Value = true;
        tail.Value = (tail.Value + 1) % _size;
        return true;
    }

    [Transaction]
    public int Dequeue() {
        int loc;
        if (!full[head].Value) TransManager.Retry();
        loc = frame[head];
        full[head].Value = false;
        head.Value = (head.Value + 1) % _size;
        return loc;
    }
}
FIGURE 5.15: FIFO implementation using Transactions
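As a usage sketch, the experiments described below drive this FIFO with concurrent producer and consumer threads; a minimal, hypothetical harness (using System.Threading) could look like this:

Fifo fifo = new Fifo();
Thread producer = new Thread(delegate() {
    for (int i = 0; i < 10000; i++) fifo.Enqueue(i);  // retried on a full slot
});
Thread consumer = new Thread(delegate() {
    for (int i = 0; i < 10000; i++) fifo.Dequeue();   // retried on an empty slot
});
producer.Start(); consumer.Start();
producer.Join(); consumer.Join();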
The transaction model simulator used in our experiments implements the strong two-phase locking protocol described in Section 5.3.3. Table 5.7 shows the execution time (in seconds) of four producers and four consumers accessing the FIFO 10000 times. The computation load of individual producers and consumers is simulated by a loop containing up to 10^6 iterations. This test was run on a Windows Vista machine composed of two Intel Xeon processors with 4 cores, each running at 1.87 GHz. As the computation load of the processes increases, the concurrent accesses allowed by the parallel host machine become all the more profitable. As a consequence, our transaction-based simulation model succeeds in outperforming the sequential simulation of SystemC.
TABLE 5.7: Co-routine vs Parallel Execution Time (4 Consumers, 4 Producers, FIFO size: 100, Time Unit: sec.)
Computation load    0       1000    10000   100000   1000000
Transaction Model   9.48    9.70    11.38   12.49    61
SystemC             0.031   0.343   3.058   30.171   301.20
We conducted a second experiment using 3 producers and 5 consumers. Table 5.8 clearly shows the inadequacy of SystemC in this context: we see that only Consumers 1 and 5 are active, and that interactions between processes are very limited. On the contrary, with the transaction-based model, interactions appear to be uniformly distributed between producers and consumers.
TABLE 5.8: Distribution of producer/consumer interactions (3 Producers, 5 Consumers; SC: SystemC, AA: transaction-based model)

             Producer 1      Producer 2      Producer 3
             SC     AA       SC     AA       SC     AA
Consumer 1   0%     21.58%   50%    21.40%   100%   19.06%
Consumer 2   0%     19.10%   0%     19.18%   0%     18.37%
Consumer 3   0%     19.61%   0%     20.18%   0%     21.16%
Consumer 4   0%     20.07%   0%     22.06%   0%     21.87%
Consumer 5   100%   19.64%   50%    17.18%   0%     19.54%
Figure 5.16 shows the frequency of accesses to the FIFO and how its size changes over time, for the transaction-based simulation. Given the way SystemC executes, these results could not be obtained easily. Since each process executes as a coroutine, the producer executes until the FIFO is full before giving control to another process; the consumer then takes control and empties the FIFO, thus leading the other consumers to starvation. With the transaction-based model, this situation is avoided, and we can therefore measure the maximum FIFO size required for a given computation load (Figure 5.16a) and observe how it is used over time (Figure 5.16b).

Transactions contribute to increasing the expressivity of the modeling language by enabling the design and simulation of microelectronic systems consisting of both hardware and software components. This can be most beneficial in the case of hardware/software partitioning and co-simulation [69]. As SystemC lacks the necessary mechanisms to simulate software components, such as preemption and priority scheduling modules, the designer is required to do hardware/software partitioning before launching the co-simulation.
FIGURE 5.16: FIFO size estimation: (a) FIFO size distribution; (b) evolution of FIFO size over time
The co-simulation requires manual refinement of software components on the target simulation platform. If the partitioning was inappropriate, the designer has no other choice than going over all the partitioning and co-simulation phases again. By taking advantage of the preemptive nature of the operating system under which the model is simulated, the designer can now explore different hardware/software configurations and decide which partitioning is the most appropriate before going further in the refinement process.

Concerning verification, transactions obviously simplify the description of concurrent programs. The designer can focus on each transaction in isolation from the others, without having to manually control all the synchronizations required for locking as well as for deadlock and livelock avoidance. As systems become more complex, this kind of task is very difficult and error-prone. It is therefore greatly appreciated that all synchronization issues can be automatically managed by the simulation environment. The result is that one can now produce executable models which are more reliable, since they are less prone to errors due to manually managed concurrency.
5.6 Conclusion and Future Work
The approach based on transactions offers microsystem designers a powerful design technique. The concept of transactions as a concurrency control mechanism has been widely studied in the domain of distributed programming and database transaction management. In this chapter, we showed how this concept can be leveraged to answer the methodological needs of microsystems design. As stated above, we envisage conducting many phases of the development cycle within the same unique modeling environment: the environment shall adapt to the needs of individual phases such as modeling, co-simulation, verification and synthesis. We plan to give
formal semantics of the execution model coordinating hardware and software components at different levels of abstraction. We also intend to develop a user interface to ease the configuration of hardware and software components, and thereafter facilitate the phases concerned with architecture exploration and hardware/software partitioning. Logically, the next step should address the refinement of the hardware subsystem with the Bluespec [166] compiler, in order to obtain a direct translation from the transaction level to the register transfer level.

At the moment, designers are able to verify models at the RTL level using linear temporal logic (LTL). We are currently working on the implementation of SystemVerilog Concurrent Assertions (SVA). We are interested in extending this language to match the semantics of transactions, in order to allow verification at a higher level of abstraction than RTL.

Implementations of transactions exploit multiprocessor machine capabilities. However, they may introduce a significant runtime overhead. To overcome this overhead, some practical implementations have been proposed and will be explored.
6 Simulation at Cycle Accurate and Transaction Accurate Levels

Frédéric Pétrot and Patrice Gerin
Laboratoire TIMA UJF/INPG/CNRS - France
6.1 Introduction ... 155
6.2 Short Presentation of the Cycle Accurate and Transaction Accurate Abstraction Levels ... 156
6.3 Cycle Accurate Simulation ... 157
6.4 Transaction Accurate Simulation ... 167
6.5 Summary and Conclusions ... 175

6.1 Introduction
Simulation is a primary technology in integrated circuit and integrated system design. It has been an intensive field of work since the start of the microelectronic age, because it is the only way of validating circuits that either cannot be analytically described or whose size cannot be handled analytically by a designer, i.e., practically speaking, all circuits. As such, designing a circuit or a complete electronic system means writing lines of code in a language suited to model a specific type of behavior. The simulation aims at making sure that the circuit indeed behaves as expected, by applying patterns and having the user check the results. Simulation strategies are very dependent on the level of abstraction used: solving differential equations representing transistor networks by fixed point computation is very different from simulating a whole multiprocessor system that includes everything up to the application that runs on it.

The focus of this chapter is the simulation of multiprocessor SoCs, including hardware and software parts, and more precisely at first the Cycle Accurate Bit Accurate (CA) level,¹ in which the exact behavior of the communication protocol in terms of signaling is detailed, and the Transaction Accurate (TA) level, in which the communication is based on transactions that hide the details of the protocol behind function calls.
¹ Cycle Accurate models are accurate on the interfaces, since they are the only way to communicate at the signal level, but approximate concerning the timings of the computations, as reaching total accuracy in the models implies a huge effort that is usually not useful.
These two modeling levels are well accepted in the integrated system design community because they fulfill a need for the validation of different software layers on top of a more or less abstract view of the hardware. The part of the system where the hardware and the software are intimately intermixed is called the hardware/software interface, as it is indeed similar to the processor instruction set at the register transfer level [111].
6.2 Short Presentation of the Cycle Accurate and Transaction Accurate Abstraction Levels
Domain specific models of computation, such as synchronous data-flow, Kahn processes, synchronous languages, and finite-state machines, are useful for expressing in a well formalized way the behavior of a specification at the system level [76]. An implementation of a system described by these means can be obtained by refinement, using a galaxy of ad hoc languages to write executable or synthesizable specifications and merging the result of their compilation in a well defined way, using very precisely specified steps forming what is called a design flow. Unfortunately, there is no, and there probably never will be, a formal model able to describe the path from system level specification to RTL implementation of hardware/software systems in general. Even worse, the implementation requires a very high level of detail and such a precise definition of the hardware and software behavior that abstracting from it automatically to make the problem tractable (i.e., extracting the semantics) is hopeless. Therefore, the simulation of the entire system as abstracted by the engineer is the solution that is currently in use. Each abstraction level must then be able to respond to a specific design question.

The Cycle Accurate Bit Accurate model (Figure 6.1.a) allows for the validation of RT level issues, such as the hardware behavior of finite state machines, and the implementation of the Hardware Abstraction Layer for a given platform. This detailed validation has a cost, speed of execution, that makes it unacceptable for OS or application level validation. However, it is sometimes necessary to revert to it, as it allows for example to bring to light the very high sensitivity of some systems to specific load or traffic conditions.

The Transaction Accurate level (Figure 6.1.b) hides the hardware details, but leaves, for example, the system memory map visible. It is useful to validate OS or high level communication primitives, even applications linked with them, on top of functional but not necessarily timed hardware. The speed of simulation is usually high enough, specifically when timing is ignored. Timing information can be added and made acceptably accurate for the hardware [195]. Timing annotation of the abstracted HW/SW interface is more complex and is the subject of ongoing research [126, 40].
FIGURE 6.1: a) Cycle Accurate and b) Transaction Accurate levels
Ignoring timing is indeed a way to define a nondeterministic behavior that, if the simulation environment supports it, allows for different interleavings of the processes. Such nondeterminism is a way to ensure that the system design does not rely, for proper functioning, on specific low level issues such as arbitration and so on.
6.3 Cycle Accurate Simulation

6.3.1 General Description
As instruction level simulators are key tools for processor design [110, 58, 180], modeling entire multiprocessor systems with cycle accuracy is a key technology for the design of core based systems that include both hardware and software. The cycle accurate simulation of such systems is very time consuming, because embedded systems contain several processors running software, dedicated coprocessors, and I/O controllers connected together. To be able to check a system for functional correctness, or evaluate the quality of a hardware/software partition, a huge number of cycles has to be simulated. The simulation speed is then the key point, even though CA simulators are required mainly for the validation of the implementation of the hardware abstraction layer. Other uses include precise estimation of application execution time (possibly enhanced with power metrics) or detailed debugging of synchronization in case of deadlocks or livelocks appearing only when introducing precise timings.
As related in [164], most of the simulation time in classical event driven simulation is spent in the scheduling of simulation events. Since our primary objective is speed, we perform static scheduling at compilation time. In this we follow the idea of [122], but focus on higher levels of abstraction. We outline in Section 6.3.2 the type of communication protocols we want to be able to cope with, taking into account that no event propagation should occur. From this point, in Section 6.3.3, we formalize the scheduling of events as a graph problem, and show that a simple topological sort achieves the proper order of execution of the components that build up the system. This leads to a compile-time schedule that produces a simulator without the need to resort to event propagation.
6.3.2 System Properties
In order to write efficient CA system simulators, one must study their distinctive features. Core based high performance systems are usually built around a centralized interconnect, bus or NoC, for easy addition of components. The on chip interconnect must run at a high frequency and have a low latency. These two requirements are hard to fulfill, because high frequency means short combinational paths, whereas low latency can only be achieved by a small number of cycles for a given transaction. The way designers have solved the problem is to allow only a few (if any) signals to be combinational, and to allow only a few components to generate them. A signal is combinational when it is emitted by a component as an answer to another signal received in the same cycle, such as the miss signal from a cache responding to the fetch request of a processor. Furthermore, even fewer architectures need combinational loops. Unfortunately, distributed bus arbitration schemes or daisy-chained interrupts may be implemented this way. Therefore, the CA simulation strategy must be able to cope with that.
6.3.3 Formal Model
Let us now introduce more formally system modeling for CA simulation. A system at the cycle accurate level of abstraction is best described as a collection of communicating sequential processes [115, 87]. For the hardware designer, these processes are further modeled as a set of finite state machines.

Definition: An FSM is defined using a quintuple $\{\mathcal{I}, \mathcal{O}, \mathcal{S}, \mathcal{T}, \mathcal{G}\}$, where $\mathcal{I}$ is the input set, $\mathcal{O}$ the output set, $\mathcal{S}$ the state set, $\mathcal{T}$ the set of transition functions such that $\mathcal{T} : \mathcal{I} \times \mathcal{S} \to \mathcal{S}$, and $\mathcal{G}$ the set of generation functions such that either $\mathcal{G} : \mathcal{I} \times \mathcal{S} \to \mathcal{O}$, for Mealy FSMs, or $\mathcal{G} : \mathcal{S} \to \mathcal{O}$, for Moore machines. We denote $I$, $O$ and $S$ the bit vectors corresponding to the encoding of respectively $\mathcal{I}$, $\mathcal{O}$ and $\mathcal{S}$, as given by the models that implement the IP's behaviors. Similarly, we denote $T$ and $G$ the vectors of boolean functions that implement respectively the $\mathcal{T}$ and $\mathcal{G}$ functions over the encodings.

The distinction between Moore and Mealy machines is of great importance here: Moore machine outputs depend solely on the current value of the machine state register, whereas Mealy machine outputs also depend on their inputs. Since a Mealy
machine input change may have an impact on its outputs, we have a combinational path.

We introduce the notion of time within the above sets to model the sequential or combinational nature of the signals. We note $X_k^t$ the subset $k$ of set $X$ at time $t$. Time changes on a special event: a global clock signal rising (or falling) edge. The state of a machine is modified when this event occurs: the symbol $\Leftarrow_{ck}$ is used to indicate memorization on a, let us say rising, clock edge.

For the Mealy FSMs:
$$\mathrm{FSM}_i \begin{cases} I_i^{t+1} \subseteq I_0^t \cup \bigcup_{k=1}^{n} O_k^t \\ S_i^{t+1} \Leftarrow_{ck} T_i(I_i^t, S_i^t) \\ O_i^{t+1} = G_i(I_i^{t+1}, S_i^{t+1}) \end{cases}$$

For the Moore FSMs:
$$\mathrm{FSM}_j \begin{cases} I_j^{t+1} \subseteq I_0^t \cup \bigcup_{k=1}^{n} O_k^t \\ S_j^{t+1} \Leftarrow_{ck} T_j(I_j^t, S_j^t) \\ O_j^{t+1} = G_j(S_j^{t+1}) \end{cases}$$

Definitions: Let $I_0^t$ be the primary inputs to the system at time $t$. Let $G(V, A)$ be a directed graph whose vertices $V = \{v_1, v_2, \ldots, v_n\}$ are the FSMs, and whose arcs $A$ represent the Mealy signals interconnecting the FSMs. We call this graph a combinational interprocess communication (CIC for short) graph. We claim that if there is no cycle in the CIC, then the system can be statically scheduled. We will prove this claim and provide an algorithm to order the system components.

Proposition 1: A predefined order of FSM evaluation ensuring a correct propagation of events can be found if and only if no cycle exists in the CIC graph.

Proof: Let us first introduce time in the FSM behaviors. A system is made of $n$ Mealy and/or Moore FSMs. To simplify the proof, and because a time step corresponds to a clock tick in our model, we make the hypothesis that the primary inputs $I_0^t$ are themselves held in registers. We now build the directed graph $G(V, A)$. There are no arcs leaving the vertices representing the Moore FSMs. Only vertices representing the Mealy FSMs have leaving arcs, since an input changing at time $t$ can change the outputs at time $t$. Both Moore and Mealy FSMs can have incoming arcs. Figure 6.2 illustrates a combinational path in a system of communicating FSMs.

The if part: Let us assume that a predefined evaluation order exists. We can then label each FSM with its order number using an ordering function $f$ such that $f : \{\mathrm{FSM}_1, \mathrm{FSM}_2, \ldots, \mathrm{FSM}_n\} \to \{1, 2, \ldots, n\}$. This order is clearly data independent, and depends only on the relation between the combinational signals of the system, since the other signals are held in registers and do not propagate through the FSMs. This implies that the result of a combinational computation depends only on prior combinational computations, i.e., the evaluation of $\mathrm{FSM}_i$ depends on the combinational signals computed by $\mathrm{FSM}_k$, for $1 \le k < i$.
FIGURE 6.2: A combinatorial path in FSMs. (1) is evaluated independently of (2) and (3). (2) must be evaluated before (3)
We restate $f$ as a relation $\varphi : V \to \{1, 2, \ldots, n\}$ on the CIC graph, such that if $(v_p, v_q) \in A$, then $\varphi(v_p) < \varphi(v_q)$. The relation $\varphi$ is trivially derived from $f$. If $G$ is not acyclic, it contains at least a closed walk $v_1 \to v_2 \to \cdots \to v_p \to v_1$. By definition of $\varphi$, $\varphi(v_1) < \varphi(v_2) < \cdots < \varphi(v_p) < \varphi(v_1)$. Thus $\varphi(v_1) < \varphi(v_1)$, which contradicts the definition of $\varphi$. So $G$ is acyclic.

The only if part: if the graph $G$ is an acyclic directed graph, DAG, the relation drawn by this graph has the usual, for a DAG, yet interesting property of being a partial order [178]. We can then number the vertices using a function $\lambda : V \to \{1, 2, \ldots, n\}$ so that the predecessors of a vertex have lower indexes than the vertex itself, and the successors of the vertex have higher indexes. Once the vertices are labeled, the graph has the following property: if $(v_p, v_q) \in A$, then $\lambda(v_p) < \lambda(v_q)$. By induction, if there exists a path $v_a \to v_b \to \cdots \to v_p \to v_q$, we have $\lambda(v_a) < \lambda(v_b) < \cdots < \lambda(v_p) < \lambda(v_q)$. Considering now the vertices as FSMs to be simulated, this property translates to:
$$\forall i \mid 0 \le i < n, \quad I_{i+1}^t \subseteq I_0^t \cup \bigcup_{k=1}^{i} O_k^t$$
This implies that traversing the graph in increasing values of $\lambda$ propagates the events in the required order for proper simulation.

Let us assume there is a cycle in the graph. Any vertex $v_i \in V$ in this cycle is a predecessor and successor of itself. This means that examining the cycle $\{v_i, v_i\}$ generalizes to all cycles. We have $I_i^t \subseteq I_0^t \cup \bigcup_{k=0}^{i} O_k^t$ and $O_i^t = G_i(I_i^t, S_i^t)$. A substitution gives $I_i^t \subseteq I_0^t \cup G_i(I_i^t, S_i^t) \cup \bigcup_{k=0}^{i-1} O_k^t$. The relation is such that $I_i^t$ depends on itself.
The solution to this equation, of the form $x = f(x)$, can only be found using iterative methods over the values actually computed.

This proposition has a direct corollary of great interest for simulation.

Corollary: If a CIC graph is a DAG, each component has to be evaluated only once.

Proof: In the above proposition's proof, we have noted the following property on the propagation of events: $\forall i \mid 0 \le i < n,\; I_{i+1}^t \subseteq I_0^t \cup \bigcup_{k=1}^{i} O_k^t$. If we write down the sequence of evaluations implied by a CIC graph traversal, we have:
$$\begin{aligned}
I_1^t &\subseteq I_0^t \\
I_2^t &\subseteq I_0^t \cup O_1^t \\
I_3^t &\subseteq I_0^t \cup O_1^t \cup O_2^t \\
I_i^t &\subseteq I_0^t \cup O_1^t \cup O_2^t \cup \cdots \cup O_{i-1}^t
\end{aligned}$$
This clearly shows that all the $O_k^t$, $k = 1, 2, \ldots, i-1$, have to be evaluated only once and in sequence to ensure a proper propagation from $O_k^t$ to $O_{k+1}^t$. Practically speaking, ordering a DAG can be done in $O(|V| + |A|)$ time using a topological sort, as described in [62]. Once the graph is labeled, a traversal in increasing vertex number, with evaluation of each FSM once, provides a correct simulation. Let us now handle the case where cycles exist, for which we have to resort to an iterative method.

Proposition 2: A cycle exists in the CIC graph if and only if there is a combinational loop in the system.

Proof: Assume a combinational loop, defined by $I_i$ being a function of itself, $I_i^t = G_1^c(G_2^c(\ldots G_i^c(I_i^t, S_i^t) \ldots, S_2^t), S_1^t)$, exists in the system. We build the digraph (directed graph) part that represents this equation as described above: $v_i \to v_j \to \cdots \to v_2 \to v_1 \to v_i$. Clearly, this walk is a cycle. Conversely, we assume that there is a cycle in the CIC graph $v_1 \to v_2 \to \cdots \to v_n \to v_1$. We now write down the evaluation path in terms of FSM generation functions: $I_1^t = G_n^c(\ldots G_2^c(G_1^c(I_1^t, S_1^t), S_2^t) \ldots, S_n^t)$. This describes precisely a combinational loop.

Proposition 3: No static schedule can be found if the CIC graph contains at least one cycle.

Proof: The vertices, representing the FSMs, are arbitrarily numbered from 1 to $n$. Let us assume there is a cycle $C \subseteq V$ in the graph. Any vertex $v_i \in C$ in this cycle $\{v_{c_1}, v_{c_2}, \ldots, v_{c_n}\}$ is a predecessor and successor of itself. This means that we can reduce the problem to the simple cycle $\{v_i, v_i\}$ without loss of generality. Thus, using the FSM combinational equations, we have $I_i^t \subseteq I_0^t \cup \bigcup_{k=1}^{n} O_k^{c\,t}$ with $O_i^{c\,t} = G_i^c(I_i^t, S_i^t)$. To make the dependence on $I_i^t$ stand out, we rewrite the equations as $I_i^t \subseteq I_0^t \cup G_i^c(I_i^t, S_i^t) \cup \bigcup_{k=1}^{i-1} O_k^{c\,t} \cup \bigcup_{k=i+1}^{n} O_k^{c\,t}$. The relation is such that $I_i^t$ depends on itself, so no schedule can be found at compile time.
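The topological labeling used throughout these proofs is straightforward to implement. A minimal C# sketch, over a hypothetical adjacency-list representation of the CIC graph, is given below; it returns an evaluation order, or null when the graph contains a cycle (Proposition 3):

static List<int> TopologicalOrder(List<int>[] successors) {
    int n = successors.Length;
    int[] inDegree = new int[n];
    foreach (List<int> succ in successors)
        foreach (int v in succ) inDegree[v]++;
    // Vertices without predecessors can be evaluated first.
    Queue<int> ready = new Queue<int>();
    for (int v = 0; v < n; v++)
        if (inDegree[v] == 0) ready.Enqueue(v);
    List<int> order = new List<int>();
    while (ready.Count > 0) {
        int u = ready.Dequeue();
        order.Add(u);
        foreach (int v in successors[u])
            if (--inDegree[v] == 0) ready.Enqueue(v);
    }
    // If some vertex was never released, the graph contains a cycle.
    return order.Count == n ? order : null;
}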
Proposition 4: Given a set of $n$ FSMs, $\mathrm{FSM}_i$, $i \in \{1, 2, \ldots, n\}$, a correct concurrent process evaluation step, i.e., from time $t$ to time $t+1$, is obtained in two stages. The first stage evaluates once the $T_i$ and $G_i$ functions using a static schedule. The second stage repeatedly evaluates the $G_i^c$ functions until the output signals of all FSMs are stable.

Proof sketch: We have proved previously that the first evaluation stage correctly computes the $S_i^{t+1}$, the $O_i^{t+1}$ of the Moore FSMs, and the $O_i^{s\,t+1}$ of the Mealy FSMs. This algorithm finds a correct evaluation order using a topological sort on the CIC graph. Since $T_i$ is never called again during a simulation cycle, the state is steady. What remains to prove is that iterative evaluations of the $G_i^c$, where $v_i \in C$, correctly compute the outputs. This is indeed a result of well known mathematical fixed-point methods, if the computation converges.

The practical interest of this fourth proposition is that only the combinational parts of the relevant FSMs have to be evaluated several times. These parts are easily identified when writing a model, and are usually small.
6.3.4 Simulator Implementation

6.3.4.1 Simulation Strategies
Three general types of algorithms for simulating combinational circuits exist in the literature:²

Relaxation: All the models are evaluated in any order until complete stabilization of inputs and outputs. This algorithm can be optimized by ordering the models in a "clever" way when the dependency graph is acyclic. Unfortunately, in the current case, the graph contains cycles. This algorithm is inspired from the mathematical solution to equations of the form $x = f(x)$, and dates back to the early days of computer simulation.

Event-driven [167]: Only the instances whose inputs have changed are evaluated, until stabilization. This algorithm has to manage events appearing on the instances' inputs. This requires being able to say exactly which signals have changed, at what time, and which instances have inputs driven by these signals. In practice, this algorithm limits the number of changes, called events, that are applied to each instance.

Demand-driven [184]: The only models evaluated are the ones whose output values are actually needed. This requires a reverse propagation of events from the outputs to the primary inputs, taking into account the support of the evaluated functions. This strategy is optimal in the sense that only the required instances shall be evaluated. However, it needs complicated information, such as which output depends on which input, for each instance. This is unfortunately particularly badly adapted to IP simulation, where most models are considered as black boxes.
2
These strategies are presented in order of complexity regarding the management of events.
6.3.4.2 Executable Models
Each executable FSM model has to be specified as three sequential functions in the implementation language of choice. These are the functions that implement the transition, the generation of the Moore signals that depend solely on the state, and the generation of the Mealy signals, which we call $T$, $G^s$ and $G^c$. Denoting by $i$ the component instance under consideration, $T_i$ and $G_i^s$ are to be called once and only once per cycle according to Proposition 4. As the $G_i^c$ function computes the outputs that depend on inputs, as a combinational function of the possibly changing inputs and of the current steady state, it may be necessary to call it several times to reach a fixed point if it belongs to a cycle of the CIC graph. This means that some kind of event management (even very primitive) is then required, but it also ensures that there is no performance degradation for statically schedulable systems.
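As an illustration, an executable model can be captured as three callbacks plus some opaque state. This is a hedged sketch with assumed names ($T$, $G^s$, $G^c$ mapped to function pointers), not the interface of a specific simulator.

// Executable FSM model as three sequential functions, mirroring the
// T / Gs / Gc decomposition described above.
struct FsmModel {
    void (*T)(void* state);    // transition: computes the next state
    void (*Gs)(void* state);   // Moore outputs: depend on the state only
    void (*Gc)(void* state);   // Mealy outputs: may be re-run several times
    void* state;               // opaque per-instance state
    bool  in_cycle;            // true if the FSM belongs to a CIC cycle
};

// One simulation cycle for a single FSM under Proposition 4's discipline:
// T and Gs exactly once; Gc is re-run elsewhere until outputs are stable.
inline void evaluate_once(FsmModel& m) {
    m.T(m.state);
    m.Gs(m.state);
    m.Gc(m.state);   // first call; repeated only for FSMs with in_cycle set
}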
6.3.4.3 Choice of a Simulation Algorithm
Since we have few models to evaluate, and these models are quickly evaluated, it seems unreasonable to manage a signal-precise event list coupled to the sensitivity list of each instance to check if an evaluation must take place. As a matter of fact, each model is evaluated in a few hundred host instructions, and is thus usually less complex than the handling of the scheduler and the sensitivity lists necessary for event-driven simulation. Also, experience has shown that the systems to be simulated with high level tools contain few combinational signals and even fewer combinational loops. Thus the simplest solution, relaxation, is quite acceptable. However, devising a clever compile-time ordering of the components may help in avoiding as many unneeded evaluations as possible. Two points have to be clarified when implementing a relaxation algorithm: how to indicate a change, and in what order the components shall be evaluated.

6.3.4.3.1 Stable State Detection

Relaxation has to check whether the system, or the part of the system under concern, is in a steady state. This can be implemented in a straightforward manner: each time a signal value is computed by a $G_i^c$ function, we check if the driven value differs from the previous one. In that case a flag is set. As long as this flag is set, the system is re-evaluated (see the sketch below). This incurs a small computational overhead, as there is only one comparison and one possible flag setting per signal written during an FSM evaluation.

6.3.4.3.2 Scheduling

The idea is to determine at compilation time an order of evaluation such that the number of re-evaluations at execution time is minimized. For this, we define a heuristic that guarantees, first, that if $G_i^c$ and $G_j^c$ are such that $G_j^c$ depends on a result computed, at some stage, by $G_i^c$, and the converse is not true, then $G_i^c$ will be evaluated before $G_j^c$; and, second, that if any two functions are independent they will not be evaluated together (i.e., in the same loop).
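To make the stable state detection of Section 6.3.4.3.1 concrete, here is a minimal sketch, with assumed names, of a write helper raising a change flag and of the relaxation of one strong component.

#include <functional>
#include <vector>

static bool g_changed = false;

// One comparison and one possible flag setting per written signal.
template <typename T>
inline void drive(T& signal, const T& value) {
    if (signal != value) {
        signal = value;
        g_changed = true;
    }
}

// Re-evaluate the Gc functions of one strong component until stable.
void relax(const std::vector<std::function<void()>>& gc_of_component) {
    do {
        g_changed = false;
        for (const auto& gc : gc_of_component) gc();
    } while (g_changed);   // converges for well-formed models
}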
1. We take the CIC digraph $G(V, A)$ as the starting point. Usually a system level graph is disconnected, so each independent sub-digraph can be evaluated independently.

2. We now have to treat each sub-digraph $D(V', A')$ with $V' \subseteq V$ and $A' \subseteq A$. If the sub-digraph has no cycle, it is sufficient to call all the $T_i$ functions in any order, then all the $G_i^s$ functions in any order, and then the $G_i^c$ in the order defined by a topological sort,³ as was proved in [171]. If the sub-digraph contains at least one cycle, we identify its strong components, classically defined as follows [31]: if for all $v_i, v_j \in C$, where $C$ is a sub-graph of $D$, there exists a path from $v_i$ to $v_j$ with $i \ne j$, then $C$ is a strong component of $D$. This phase is more general than simple cycle identification in the sense that it is built on the largest sets of vertices containing cycles. All vertices that belong to $C$ follow the equivalence relation "is reachable from," and thus the set of all strong components of a digraph is a partition.

3. We build the graph $D^*(V^*, A^*)$ such that $V^* = \{C_1, C_2, \dots, C_n\}$ and $A^* = \{(C_i, C_j)\}$, $i \ne j$, with $v \in C_i$ and $u \in C_j$ such that $(v, u) \in A$. This operation, called condensation [179], produces an acyclic digraph; thus a partial order on the vertices of $D^*$ exists. We use the algorithm of [130] to generate the topologically sorted strong components of the sub-digraph.

4. Until now, we have postponed the handling of cycles. We state the scheduling problem as an ordering of the vertices that belong to a strong component, as it seems more optimal than treating the problem on a cycle-per-cycle basis. This guarantees that there will be no evaluation loop within an evaluation loop. The proposed approach is illustrated in Figure 6.3, where the shaded region represents the strong component of the sub-digraph.

³ There may be many acceptable such orders; any will do.
FIGURE 6.3: An example sub-digraph containing a strong component
FIGURE 6.4: A strong component that has no Hamiltonian path
It is quite clear that reaching stability for the nodes $v_3$ and $v_6$ alone is useless, because the path $v_3 \to v_4 \to v_5 \to v_6$ also exists and, from an FSM evaluation point of view, it can induce a change in the $v_6$ outputs that drive $v_3$. Thus it is more desirable to simulate the path $v_3 \to v_4 \to v_5 \to v_6$ until stabilization. We use $v_3$ to start the evaluation because it is the unique vertex of the strong component, $C$, that can be accessed. Should several vertices belong to the in-reach set of $C(V'', A'')$, then one would be taken at random. The same holds for the out-reach set. So the characteristics of the order we want to determine are that it is a path, i.e., $v_i$ and $v_j$ appear in sequence if and only if $(i, j) \in A''$, and that each $v \in V''$ appears once and only once. Such a path is termed a Hamiltonian path. Since all $v \in C$ are seen, a single path is built, giving the guarantee that no inner loop has to be evaluated within a loop. Unfortunately, not all strong components have the property of having a Hamiltonian path. Figure 6.4 is an example where we cannot find a Hamiltonian path in the digraph. In that case, we artificially add an arc, $(v_4, v_5)$, to avoid evaluating $v_3$ twice. The constraint during the construction of a Hamiltonian path on a not yet Hamiltonian graph is to minimize the number of arcs added. Finding Hamiltonian paths is computationally intensive, but well known. We use the method proposed in [54], which works well even for reasonably large numbers of nodes. Figure 6.5 summarizes the construction of a schedule from a CIC digraph. The parts labeled A, B and C are the disconnected components of the digraph: they can be evaluated independently. The parts labeled 1, 2, 3 are identified using steps 2 and 3 of the approach: they must be evaluated in sequence. Finally, vertices labeled 1.1, 1.2, etc., belong to cycles: they must be iteratively evaluated until stabilization.
FIGURE 6.5: Scheduling algorithm illustration
6.3.4.3.3 Handling Multiple Clock Domains

Even though the efficiency of Cycle Accurate simulation relies on the clear identification of clocks, static⁴ multi-synchronous systems can be handled by a static schedule. Assuming all pairs of frequencies have rational ratios, the way to build a correct static schedule is as follows (a sketch is given after this list):

1. Compute the least common multiple $l$ of the $t = 1/f$ periods of the system,

2. Build an evaluation loop of $l$ cycles instead of 1, and number the cycles from 1 to $l$,

3. At each cycle $c \in \{1, \dots, l\}$, execute the modules whose clock period $p$ satisfies $c \bmod p = 0$,

4. At each cycle, also execute all Mealy generation functions $G^c$, as by nature these functions are independent of the clocks.

⁴ As opposed to dynamic, where the frequency can be adjusted at run time.
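A minimal sketch of this multi-clock evaluation loop follows; the module representation and the integral periods are assumptions made for illustration (std::lcm requires C++17).

#include <numeric>
#include <vector>

struct ClockedModule {
    int period;              // in base time units, assumed integral
    void (*evaluate)();      // T and Gs of this module
};

void run_multirate(const std::vector<ClockedModule>& mods,
                   const std::vector<void (*)()>& mealy_gc) {
    int l = 1;
    for (const auto& m : mods) l = std::lcm(l, m.period);
    for (int c = 1; c <= l; ++c) {                 // one macro-period
        for (const auto& m : mods)
            if (c % m.period == 0) m.evaluate();   // clock edge for m
        for (auto gc : mealy_gc) gc();             // clock-independent
    }
}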
6.3.4.4 Implementation
A system whose combinational interprocess communication graph satisfies the property of being a DAG can be scheduled at compile time and simulated correctly without a dynamic, i.e., data dependent, scheduling of events. Furthermore, this model ensures that each process is evaluated once and only once per cycle. The implementation of a simulator based upon these principles is easy. Informally, the simulator kernel is a loop in which each process is called once. The calling order is computed as follows. The interprocess communication links are described as a netlist of processes. The only supplementary information needed on this netlist is a tag on the FSM outputs that are combinationally dependent on their inputs. A topological sort on this netlist, using only the Mealy signals (signals connected to a combinational output) as possible arcs, produces a correct evaluation order (ties are broken at random). If a cycle is detected, the generation fails, and the user must resort to an event-driven simulator. Practically speaking, the CIC graph is very sparse and contains many disconnected sub-graphs. Let us now sketch the implementation. The sequential simulator needs two tables: the first table contains the values of the signals as computed during the previous cycle, and the second table contains the values of the signals computed during the current cycle. At the end of the cycle, the tables are swapped, to make the current values of the signals the inputs of the next cycle. The second table is only written into, and a correct simulation model implementation ensures that no signal is written by several drivers. The role of the first table is, however, more complex. Since combinational signals are read and written within the same simulation cycle, the combinational signals of the Mealy FSMs are both read and written in this table. The correctness of the simulation is ensured by the absence of cycles in the CIC graph. Note that the two tables are necessary, as the inputs must be stable and alike for all Moore FSMs during the simulation cycle.
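A hedged sketch of this two-table mechanism, with assumed types and signal indices, is given below.

#include <cstdint>
#include <utility>
#include <vector>

struct SignalTables {
    std::vector<uint32_t> prev;   // values computed during the last cycle
    std::vector<uint32_t> cur;    // values computed during this cycle

    // Moore signals: read from the previous cycle, write into the current.
    uint32_t read(std::size_t s) const       { return prev[s]; }
    void     write(std::size_t s, uint32_t v) { cur[s] = v; }

    // Mealy signals live within the same cycle: both accesses use the
    // first table, which is safe because the CIC graph is acyclic.
    uint32_t read_mealy(std::size_t s) const        { return prev[s]; }
    void     write_mealy(std::size_t s, uint32_t v) { prev[s] = v; }

    void end_of_cycle() { std::swap(prev, cur); }   // swap the two tables
};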
Two tools, one based on pure C [170, 24] and one on SystemC [44], have been developed and demonstrated based on these principles. Even though this strategy is quite efficient, as it allows reaching approximately $10^6$ cycles per second for a uniprocessor system with caches, memories and a few peripherals, it is not efficient enough to simulate very complex architectures with a high number of processors, memory hierarchies, networked interconnects and so on. This is the reason why higher abstraction levels have been introduced, like the transactional level that is the subject of the next section.
6.4 Transaction Accurate Simulation

6.4.1 General Description
TA simulation aims at hiding the details of the hardware, by providing purely functional views of the hardware components. However, most existing TA approaches abstract the behavior and communication of the peripherals, memory and coprocessors, but still use Instruction Set Simulators (ISSs) for the execution of software. This leads to the need for a complete and coherent set of software layers, including the lowest ones, i.e., the HAL, and to potentially long simulation times, as the interpretation of instructions is usually quite slow. Dynamic recompilation techniques [30], which have been introduced recently due to the computing requirements of virtualization technologies, tend to have very good timing figures; however, they are quite complex to use for modeling a new processor and still suffer from the need for the whole cross-compiled software layers. So, unlike the Cycle Accurate and usual Transaction Accurate levels, we propose to use a "native" software execution approach. This means that the software is not cross-compiled, but is simply compiled on the host machine (the one that runs the simulator) and attached to the simulator to indeed realize what the embedded software is expected to do. The MPSoC architectures considered at the Transaction Accurate level in this approach are partitioned into so-called hardware and software nodes, connected through a communication network as illustrated in Figure 6.6. A hardware node is a component that does not provide any programming capabilities. A communication network connects all the nodes together and provides the communication between all the nodes. Software nodes, on which the Transaction Accurate level focuses, provide the execution environment for the software tasks of the MPSoC applications. The software is organized in layers, which are the (usually multi-threaded) application layer, the Operating System (OS) and libraries layer (that provides software support for the application execution) and the Hardware Abstraction Layer (HAL) that hides the hardware specificities. The hardware part of a software node, called CPU sub-system in this chapter, contains one or more processors used in a symmetric multiprocessor manner.
FIGURE 6.6: MPSoC and SW Node Architecture
The important fact is that a single operating system or kernel runs on top of it. As most of the design time is spent in software validation, fast and, if possible, accurate simulation platforms are required. Several approaches based on native execution of the software at the TA level have been devised in the past few years; [88], [126] and [191] are some of them. Most of the recently proposed approaches are based on native simulation to achieve high simulation performance and flexibility. However, the current native approaches are suitable only for high level simulation and show their limits if the underlying hardware has to be considered in more detail. A typical example is the memory accesses from native software, over which the simulator running on the host machine has no control. This problem is addressed in [126], where it is solved by an instrumentation of the source code that permits catching and remapping memory accesses. In the approach detailed here, annotations can be used for performance estimation, but it is also possible to run non-annotated code in the simulation environment, keeping the interaction with the underlying hardware. Considering the lack of an Operating System model for System Level Design, raised in [152] and [92], several propositions have been presented in [41, 92, 108, 152, 173]. In these approaches, native simulation of software tasks takes place over a very abstract model of the Operating System. Compared to these proposals, which introduce the final Operating System only on ISS (CA or TA) based platforms, we advocate the use of a hardware executable model able to handle a real Operating System very early in the design flow. Thus, a large code base of the Operating System can be validated natively. The native simulation of a real Operating System has also been proposed in [39]. However, in this work the hardware resources like memories are considered local to the processors. This restriction is not suitable for multiprocessor architectures, where shared resources (memories, synchronization resources, ...) are of utmost importance.
Note also that in most current approaches based on abstract hardware models, the interaction between the native software and the hardware model is generally kept implicit, even though the host hardware simulation platform implementation (e.g., SystemC [7]) can provide a detailed hardware view for the native software execution.
6.4.2 Basic Concepts
Abstraction is possible only if a well-defined layered structure is defined, through an application programming interface (API). This ensures that the foundation over which one layer is built is completely defined, and itself forms the new foundation of the layer above it. Therefore, it is possible to implement exactly the behavior of the functions required by the API, using what seems the most appropriate for the considered level of abstraction. At the CA level, this will end up with loads and stores, but at higher levels, the host system will be able to substitute whole primitives, such as building a task context or doing memory copies, for the low level implementations.
6.4.2.1 Hardware Abstraction Layer API
A Hardware Abstraction Layer API is a set of functions that allows the software to interact with the hardware devices at an abstract rather than at a detailed hardware level. This abstraction layer hides the details of the physical hardware from the software that runs on it. The HAL is especially important for early design of a portable Operating System across different hardware platforms. This portability remains valid for the Transaction Accurate model, since there is no way to access the hardware but through the HAL. A well-designed HAL can be found in the eCos Operating System [75], as it provides a strict separation from the upper software layers. Each element of the HAL API is defined as a set of C macros that can be implemented by C functions or target processor assembly code as needed. Two classes of APIs provide a suitable HAL interface; they are respectively processor and platform dependent.
6.4.2.2 Processor HAL API
The processor HAL API abstracts the specificities of the processor. Typical APIs for context management are HAL_CONTEXT_[INIT|LOAD|SWITCH|SAVE]; for low level interrupt management, HAL_IT_[MASK|UNMASK]; and HAL_CPU_TO_PLATFORM and HAL_PLATFORM_TO_CPU for endianness translation. HAL_SPIN_[LOCK|UNLOCK] provides the low-level synchronization API which is essential for multi-processor support. These calls are implemented in case of SW lock support only; a platform-wide API is provided for HW semaphore engines. Other types of required calls are related to external accesses of different sizes, interrupt and exception handling, and processor identification.
FIGURE 6.7: Transaction Accurate software API
6.4.2.3 Platform HAL API
In a heterogeneous architecture, processors can have different endianness. A reference endianness must be defined to allow communication between these processors. The reference endianness is attached to the platform and can be known with PLATFORM_IS_[BIG|LITTLE]_ENDIAN. In case of hardware lock support, the HAL_SPIN_[LOCK|UNLOCK] synchronization calls are implemented in the platform dependent part. For multiprocessor purposes, the number of processors in the platform is also handled here. The platform HAL provides an API for system clock management, e.g., PLATFORM_CLOCK_[INITIALIZE|CLOCK|ACK].
6.4.2.4 Transaction Accurate Level
The executable model of the Transaction Accurate level is written following an approach that ensures a unified system view of the hardware and software components [91]. Using this approach, it is possible to define a model which provides a software API view on one side of the component and a hardware interface on the other side. At the Transaction Accurate level, the HAL API is provided through software ports, as depicted in Figure 6.7 (1). The implementation of this API is made by modules which are interconnected through their ports. The HAL API supports the execution of the upper software layers. The C source code of these layers (composed of the Operating System, the libraries and the application) is compiled for the host machine and encapsulated in a dynamic library, as depicted in Figure 6.7 (2). As different languages are usually used for the simulator and the application, a wrapper (Figure 6.7 (3)) is needed to adapt the HAL API view of the software dynamic library.
At initialization time, this wrapper acts as a program loader by linking the dynamic library to the simulation executable. Functions provided by the embedded software in the dynamic library can be called through the wrapper; this is typically the case for the Operating System boot process. From the application point of view, the functions provided by the API can be called as usual. The fact that they are not really implemented (in C or assembly for the target platform) but emulated through simulated components is completely transparent. The wrapper converts these C API calls to port accesses (underlying simulation language method calls) and transfers execution to the connected module. Thus, the implementation of the C HAL macros at the Transaction Accurate level consists only of function calls which are handled by the wrapper and redirected to the simulation model. The following source code shows an implementation of the HAL_SPIN_LOCK API for a SPARC processor and for native simulation in a C++ simulator environment.

/* SPARC V8 implementation */
#define HAL_SPIN_LOCK(spin)                                      \
{                                                                \
    register uint32_t res;                                       \
    do {                                                         \
        __asm__ ("lda [%1]0x20,%0 " : "=r"(res) : "r"(spin));    \
    } while (res != 0);                                          \
}

/* NATIVE implementation */
#define HAL_SPIN_LOCK(spin) __wrapper_hal_spin_lock(spin)
6.4.3 Native Simulation for MPSoC

6.4.3.1 Key Ideas
The first key idea in this Transaction Accurate approach is to abstract communication while still keeping the details visible within native simulation, such as shared resources between processors, so as to be able to identify contention (e.g., variables in memory or synchronization mechanisms). The following draws up a list of essential issues for modeling an efficient and accurate native software simulation at the Transaction Accurate level:

Memory representation: As already mentioned, memories are usually considered private to each processor. This simplification is not acceptable in multiprocessor architectures, and more generally in architectures where multiple masters can access the same memory space.

Software execution: The Transaction Accurate model of the platform should be able to model multiple executions of the same software binary code, which is the basis of Symmetric Multi-Processor-like architectures.

Synchronization: The support for a spinlock mechanism in a multi-processor architecture is necessary to guarantee atomic access to shared resources.
FIGURE 6.8: Simplified SW Node Architecture
6.4.3.2 Memory Representation
Figure 6.8 depicts a Transaction Accurate executable model. In this figure, three memory mappings are represented:

1. The memory mapping of the processor sub-system (a crossbar in this example). At the Cycle Accurate level, this memory mapping is specified by the architecture designers.

2. The memory mapping obtained by the software compilation for the host machine (Application + OS + Libs). This mapping is host machine dependent and defined at dynamic library creation time.

3. The memory mapping seen by the simulator executable process which handles the simulation of the designed architecture. This mapping is also host machine dependent.

Host machine memory representations (2) and (3) are classical mappings of executable software. The mapping can be summarized as three memory segments: .text containing the binary code, .data for initialized global data⁵ and .bss for the global data to be zeroed.

⁵ Or static data, as static data is global data with a reduced scope of visibility.
The main difficulty is to manage two memory mapping points of view, namely the host machine dependent mappings and the user defined platform mapping (the dynamic library and the executable can be considered as one mapping). The solution which consists in using the target memory mapping in the sub-system model (1) and using remapping techniques in the memory component is not suitable, because of total overlapping between these two memory spaces: the value of an address cannot be used to determine which memory space is concerned. The solution is to perform the memory mapping (1) over (2) and (3) at execution time. The mapping addresses given to the CROSSBAR component then correspond to valid addresses in the host simulator process memory space. Undefined external symbols in the application dynamic library are automatically initialized at execution time by a runtime linker which ensures the homogeneity of the memory mapping between the native software and the modeled hardware. This implies that all needed symbols must be defined and present in the hardware model somehow, as explained hereafter. Hardware devices must implement their internal registers in contiguous memory (e.g., as an array of integers). The memory mapping is built during the initialization phase of the simulation by requesting the start address and the size of the register array of each component; the memory mapping consistency is ensured by the host machine. Memory components that provide a memory address space are implemented using the same approach. In Figure 6.8, the heap memory of the platform is allocated in the simulation executable address space. The .bss and .data sections of the software application have to be accessible by hardware components (e.g., DMA devices). During native compilation of the software code, symbols identifying the beginning and the end of the .bss and .data sections are automatically created by the compiler (e.g., __bss_start and _end for the .bss section). These symbols are used by the MEMORY component and provided to the platform mapping to make these sections accessible through the CROSSBAR.
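As an illustration of the register-array convention just described, here is a minimal sketch of how a platform mapping could be collected at initialization time. The Device interface and names are assumptions, not the actual tool API.

#include <cstdint>
#include <vector>

struct Mapping { void* base; std::size_t size; };

class Device {
public:
    virtual ~Device() = default;
    virtual Mapping mapping() = 0;   // host address range of the registers
};

class Timer : public Device {
    uint32_t regs_[4] = {0};         // internal registers, contiguous
public:
    Mapping mapping() override { return { regs_, sizeof regs_ }; }
};

// Built during the initialization phase of the simulation; the resulting
// table would be handed to the CROSSBAR model.
std::vector<Mapping> build_platform_mapping(std::vector<Device*>& devs) {
    std::vector<Mapping> map;
    for (Device* d : devs) map.push_back(d->mapping());
    return map;
}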
6.4.3.3 Software Execution
Software execution is supported by a module called Execution Unit (EU) in Figure 6.9. This EU, which acts as an abstract processor, is connected to a boot port of the HAL API from where the entry point function provided in the dynamic library can be called (1). All the software is then executed sequentially (2) within the caller EU module context. Since the software code is executed without time consideration, the application dynamic library has to be annotated in order to "consume" simulation time. Several strategies can be used for that. The annotation approach used in [126], consisting of inserting C lines in an intermediate source code, as well as the approach based on "assembly level" C code [134], are adapted to our simulation model. Another promising approach, which ensures similar target and host execution paths through the graph of basic blocks resulting from the compilation, is proposed in [40]. Although software annotation is a key component of the native TA approach when performance estimation is a goal, it is not mandatory, and non-annotated software can still be simulated on the proposed simulation platform (as if the target processor had infinite frequency).
FIGURE 6.9: Software Execution on TA Model
Special care should be taken to avoid simulation deadlocks (for example in the case of busy-waiting using an infinite loop in the software code). Several EUs can be connected to the same boot port of the HAL API, which gives our simulation platform the Symmetric Multi-Processor capability. As the software application has no notion of the module on which it is executed, the wrapper must redirect the C HAL API calls to the correct port of the HAL API implementation (3). To do this, it is necessary to make use of a function that returns the currently active module, which can then be used to identify the corresponding port.⁶

⁶ This is typically done using the sc_get_curr_simcontext() function when using SystemC.
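A minimal sketch of this redirection is given below; current_execution_unit() stands in for a kernel query such as sc_get_curr_simcontext(), and all names are assumptions.

#include <cstdint>

class ExecutionUnit {
public:
    // Models the processor-specific spinlock: poll through the interconnect.
    void hal_spin_lock(volatile uint32_t* spin) {
        while (*spin != 0) { /* bus read modeled here */ }
    }
};

// Assumed helper: returns the EU whose software thread is currently running.
ExecutionUnit* current_execution_unit();

// One C symbol serves every processor: the dispatch is done per EU.
extern "C" void __wrapper_hal_spin_lock(volatile uint32_t* spin) {
    current_execution_unit()->hal_spin_lock(spin);
}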
6.4.3.4 Synchronization
As the platform provides a real environment that exhibits true parallelism, low level synchronization must be available to protect shared resource accesses, as for Virtual Prototype platforms. Since the HAL_SPIN_[LOCK|UNLOCK] calls are processor specific, they will be implemented by the Execution Unit. The implementation will depend on the processor and communication we want to model. If the processor and the communication components do not support the test-and-set mechanism, additional dedicated components must be provided in the hardware platform model (e.g., the SEMRAM hardware semaphore engine in Figure 6.9). As explained in Section 6.4.2.4, the HAL_SPIN_LOCK(spin) HAL macro is defined as a __wrapper_hal_spin_lock(spin) C function call. When caught by the C/C++ wrapper, this call is redirected to the correct Execution Unit through the corresponding HAL API port. As an example, the EU implementation of the spinlock is given below.
void EU::__hal_spin_lock(spin)
{
    register uint32_t res;
    do {
        crossbar.read(UINT32, spin, res);
    } while (res != 0);
}
If test-and-set is supported by the hardware, the EUs will use the protected access provided by the communication module to test-and-set the semaphore directly in standard memory.
6.5 Summary and Conclusions
Modeling at different levels of abstraction is necessary for the new generation of highly programmable integrated circuits. The complexity of these circuits calls for methods that allow validation of the software prior to the availability of the hardware. To ensure this, a layered software approach must be followed. Two very important abstraction layers have been introduced in this chapter: the Cycle Accurate level, which is used to validate the Hardware Abstraction Layer implementation and to check that the system timing properties can indeed be enforced, and the Transaction Accurate level, which allows validating the OS and communication implementations. To be efficient in this task, the TA level must reflect the real architecture and propose mechanisms that handle the memory references at a low implementation cost. Of course, abstraction has a cost, and currently timing estimates at high levels are rough. However, for annotated functional validation, the TA level is at least two orders of magnitude faster than the CA level,⁷ and is a level of choice also for the simulation of whole embedded applications.

⁷ This rises to three orders of magnitude for purely functional modeling.
7
An Introduction to Cosimulation and Compilation Methods

Mathieu Dubois
Université de Montréal - Canada

Frédéric Rousseau
Laboratoire TIMA UJF/INPG/CNRS - France

El Mostapha Aboulhamid
DIRO Université de Montréal - Canada

7.1 Introduction
7.2 Cosimulation
7.3 Compiler Framework
7.4 Conclusion

7.1 Introduction
Embedded system design requires tools and methodologies to create products as soon as possible with minimal cost and resources. For instance, a telecom application such as a WiMAX platform is a system that has several parts, such as an antenna, an encoder and an FFT. There are two main areas: the first one is analog and the second one is digital. The tools for analog and digital applications are different. In an analog design, the goal can be to get a good amplification for an antenna. For the digital part, a numerical treatment can be done by using formulas such as an FFT. In this chapter, we will focus on the digital world. Today, there are a lot of tools available on the market. No digital tool is efficient at every abstraction level [147]. These levels are the link between the main ideas of a product and a final implementation. When an application is to be implemented, there are several steps: a top level is a model of a system like WiMAX without detailed information, and a bottom level is a physical implementation in a system on chip with all information such as latency, power, speed and area. A designer is able to mix these abstraction levels by using interoperability between tools.
The aim is to be able to simulate a low abstraction level with a high level one. The advantage is to accelerate the development time. The complexity of the application can be split into parts or functionalities among several design teams. Each team can be responsible for its subsystem, and the subsystems can be regrouped at the end to get the final product. A team can stay connected at a high level model until it finishes its refinements down to the bottom level, without waiting for another team. One of the purposes of this chapter is to present different mechanisms for cosimulation, and also compilation methods that can be used for simulating or compiling a model. A simulation engine executes a model. An engine and a model can be described within the same language, like SystemC and ESys.NET [13, 159]. However, the simulation engine can also be described in a different language than the model. For instance, SystemC is a C++ library and ESys.NET is a C# library, and their models have to be described in their respective languages. In embedded system design, it is possible to define an application with several modules. These modules can have some functionality to do a specific task. For example, a VHDL module can be described as follows:
entity Module is
    port (
        Output1 : out std_logic;
        Output2 : out std_logic;
        Input1  : in  std_logic;
        Input2  : in  std_logic
    );
end entity Module;
In this case the module is hardware, but a system can have modules in both software and hardware. The link between them can be a communication channel, as illustrated in Figure 7.1: two modules containing threads exchange data through interfaces by using a channel. The channel uses specific functions and acts as a link between the modules. Users can specify their own communication mechanisms without changing the module interfaces. For instance, the communication between two modules can be done through a First In First Out (FIFO) channel for one user and through a semaphore for another.

FIGURE 7.1: Channel model
FIGURE 7.2: Framework for simulation of models
A framework as illustrated in Figure 7.2 is needed to build a simulation engine. A simulation framework is based on four concepts: languages, models, simulation engines and abstraction levels. Languages are used to write a simulator and models by using a grammar and a syntax. The models are composed of threads, modules, communications, channels, signals and events, as previously illustrated in Figure 7.1. The key issue in building a simulation engine is to define some computation models. The computation models are one of the three primary domain types according to Rosetta [19]; there are also units-of-semantics and engineering domains. The units-of-semantics are the extension of the computation models to define general computational models. The engineering domains are the extension of the units-of-semantics to provide specification capabilities for specific engineering domains. The computational models can be the finite state, the continuous time, the discrete time and the frequency models. A simulation engine can support one or more of them. In a system level design environment, many applications may need more than one computational model. For instance, a telecom device may have different models: a signal can be received by using an antenna and amplified to be converted from analog to digital. It needs to be modeled in at least three domains: the frequency, discrete time and power domains. The interaction between abstraction levels (domains) is done by using translator and functor blocks in Rosetta. The translators are used to move data between domains by using interfaces.
FIGURE 7.3: Framework for cosimulation of models
Functors define mechanisms for moving models between specification domains. Functor definitions play two important roles in Rosetta modeling. First, they are used to move information between domains to perform analysis. Second, functors are used to move information between domains to switch modeling abstractions. Interaction between domains needs an intermediate format with the same types. Types can be events, signals and variables. When the types are not compatible, one needs to create a translator between them. Each model in a specific domain must be described to be executed by a specific computation model. Figure 7.3 illustrates the cosimulation between two frameworks and their interactions. ESys.NET and SystemC could be such frameworks. Both of them can support different computation models. The data exchange and the communication between frameworks must be done by using an interconnection that both of them can access, such as shared memory, TCP/IP, etc. The way to connect the frameworks together is very important. For example, a SystemC simulation engine cannot be directly mapped to ESys.NET. The link between them must be compatible and correctly synchronized. A framework can be based on managed or unmanaged code, or both, as will be described in the following.
7.2 Cosimulation
Cosimulation executes concurrently two or more simulation engines using some synchronization mechanisms. The aim is to bind two models in different environments by using a cosimulation bus (Figure 7.4). There are three main methods that are widely used: same binary file, shared memory and TCP/IP. This section also presents newer mechanisms such as COM, static functions, pinvoke and a managed wrapper.
FIGURE 7.4: Standard cosimulation with several kernels
These new mechanisms are well adapted to the .NET environment. This chapter focuses on discrete time domain simulation. The synchronization between different engines is done on discrete time frontiers. These frontiers can be a common clock tick for synchronous systems, a common event, or jumps from one discrete time to the next by a specified amount of time. More refined synchronizations, such as delta time, are abstracted. Unless specified otherwise, we assume that a specific simulation engine acts as a server and the other as a client. The server has some entry points and the client uses them to control the server. Before delving into the different implementations, we first describe what is meant by managed and unmanaged code.
7.2.1 Preliminaries: Managed and Unmanaged Code
Managed code is computer program code that executes under the management of a virtual machine, unlike unmanaged code, which is executed directly by the computer's CPU [201]. Managed code has the benefits of programmer convenience and enhanced security guarantees. Microsoft provides languages for creating managed code, such as C#, Visual Basic .NET and C++/CLI. Programs in any programming language can be compiled into either managed or unmanaged code; in practice, however, each programming language is typically compiled into one type. For instance, the Java programming language is almost always compiled into managed code, although there are Java compilers that can generate unmanaged code, such as the GNU compiler. A virtual machine is a software implementation of a machine (computer) that executes programs like a real machine [203]; indeed, there is no direct correspondence to any real hardware. For instance, a program in C# receives services from the Common Language Runtime (CLR), which is a component of the .NET framework. By providing these services to the program, the .NET software is acting as a virtual machine, taking the place of the operating system or hardware for which the program would ordinarily be tailored. Virtual machines are separated into two major categories, based on their use and degree of correspondence to any real machine.
The first one is a system virtual machine, which provides a complete system platform supporting the execution of a complete operating system (OS). The second one is a process virtual machine, which is designed to run a single program, meaning that it supports a single process. Another important characteristic of a virtual machine is that the software running inside it is limited to the resources and abstractions provided by the virtual machine and cannot go outside.
7.2.2 Same Binary File
The same binary file is used by two engines to cosimulate complementary parts of a model. Dynamic and static approaches are used for cosimulation. Seamless [175] uses a dynamic approach: it can call a library at run time without prior knowledge of the called library. SystemC uses a static approach: it knows all libraries before its execution. Static libraries are compiled with the main source code before execution. Dynamic libraries are precompiled and linked at program execution. The link between tools can be done by declaring prototypes of the functions that are used in common.
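For the dynamic variant, a plausible sketch on a POSIX host uses dlopen/dlsym to bind the partner simulator at run time; the library name, symbol and signature are illustrative assumptions.

#include <cstdio>
#include <dlfcn.h>

typedef void (*step_fn)(unsigned long cycles);

int main() {
    // Load the complementary model without compile-time knowledge of it.
    void* lib = dlopen("./libpartner_model.so", RTLD_NOW);
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    step_fn step = reinterpret_cast<step_fn>(dlsym(lib, "partner_step"));
    if (!step) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    for (int c = 0; c < 100; ++c)
        step(1);                      // advance the partner simulator

    dlclose(lib);
    return 0;
}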
7.2.3 Shared Memory
The same memory space is shared by both engines. This space is initialized before executing the model. The shared memory is divided in two main parts: the exchanged data and the controls necessary for synchronizing the simulators. This can be helpful for a sort of asynchronous cosimulation at the transaction level. For instance, Figure 7.5 illustrates three consumers and a producer. There are three FIFOs for the model and three other ones for the communication with the shared memory. Each thread in both languages can be executed concurrently without reducing the simulation speed, because the FIFOs are used to buffer data and regulate the throughput.

FIGURE 7.5: Shared memory example
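A hedged sketch of such a shared segment, split into control words and per-channel FIFOs, is given below; the layout and field names are assumptions, not the actual format used by the tools.

#include <cstdint>

constexpr int kFifoDepth = 64;

struct SharedFifo {                    // one producer/consumer channel
    volatile uint32_t head;            // written by the consumer only
    volatile uint32_t tail;            // written by the producer only
    char data[kFifoDepth];
};

struct SharedSegment {
    // Control: synchronization between the two simulation engines.
    volatile uint32_t producer_time;   // current time of engine A
    volatile uint32_t consumer_time;   // current time of engine B
    volatile uint32_t stop;            // set to end the cosimulation
    // Data: one FIFO per modeled channel (three consumers here).
    SharedFifo channel[3];
};
// Both engines map this structure at the same shared-memory address
// (e.g., via shm_open/mmap on POSIX) before the models start running.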
7.2.4 TCP/IP
This is a standard TCP/IP communication. At initialization, the server listens on the communication port and the client asks for a connection. At every abstraction level, the data exchanged between simulators must be encapsulated in the TCP/IP format. This requires a communication structure that both server and client can understand. At run time, an event should be mapped to a synchronous or asynchronous link, and the synchronization should also be transmitted by TCP/IP. At a specific time, the client can read data from the channel and resume its execution by advancing its time and, if needed, it sends an answer to the server. A client can also wait for a TCP/IP packet to resume its execution. Another possibility is that the server and the client both advance their simulation time by a specific amount of time (a possible packet format is sketched below). The next sections will present some cosimulation methods specific to .NET use.
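The sketch below shows a possible packet format for such an exchange; the layout is an assumption and both sides must agree on it.

#include <cstdint>

enum MsgKind : uint32_t { STEP = 0, DATA = 1, END = 2 };

struct CosimPacket {
    uint32_t kind;        // STEP, DATA or END
    uint32_t time_units;  // amount of simulated time to advance
    uint32_t payload[4];  // signal values, fixed-size for simplicity
};

// Typical client step (socket setup and error handling elided): send the
// inputs and the time to advance, then block until the server answers
// with the outputs.
//   send(sock, &pkt, sizeof pkt, 0);
//   recv(sock, &ans, sizeof ans, 0);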
7.2.5 COM
Component Object Model (COM) is an interface standard introduced by Microsoft in 1993 [42, 199]. It is used to enable interprocess communication and dynamic object creation in a large range of programming languages. It is a language-neutral way of implementing objects that can be used in environments different from the one they were created in, even across machine boundaries. A user does not need to know the internal implementation of an object. An object must be created before use and destroyed after using it. The main advantage is that the interface is separated from its implementation. The interface has some functions that can be used by a user. A COM cosimulation between two simulators is also possible, by creating functions to control a targeted simulator. For instance, we create a COM object with one or two modules and all the functions of the targeted simulator needed to control it. Two main functions initialize and close the simulator. Moreover, functions are needed to transmit the input and output data of the models included in the COM object. A last element is needed for correct synchronization: a function that advances the time of the simulator inside the COM object. Figure 7.6 illustrates this client/server relationship. In this case, ESys.NET is compiled with its models into a COM object. The following code illustrates the SystemC interaction with the COM. Lines 1 to 12 initialize the COM, complete a simulation and release the COM. Lines 13 to 18 describe an interface to communicate with ESys.NET.
FIGURE 7.6: Cosimulation with COM

 1  CoInitialize(NULL);                     // Initialization of SystemC
 2  hr = spmatesys2.CreateInstance(__uuidof(matesys));
 3  if (SUCCEEDED(hr)) { spmatesys2->init(); }
 4  else { printf("FAIL\n"); }
 5  for (i = 0; i < N; i++) {               // N: loop bound lost in extraction
 6      spmatesys2->matInput1 = data_out1;
 7      spmatesys2->Step();                 // Advance the simulation step
 8      clk = true;  sc_cycle(50);
 9      clk = false; sc_cycle(50);
10      data_in = spmatesys2->matOutput1; } // Set the output signals
11  spmatesys2->EndEsys();                  // Close the simulator
12  spmatesys2->Release(); CoUninitialize();  // Release the simulator and the COM object

Interface accessible (in the ESys.NET COM):

13  public interface Imatesys2 {
14      double matInput1 { set; }
15      double matInput2 { set; }
16      double matOutput1 { get; }
17      void Step(); void init(); void EndEsys();
18  }
The ESys.NET initialization creates the called modules and makes the necessary connections to the COM by using a class. Functions D and E write to and read from the modules in the COM. Function N represents other user-defined functions (Figure 7.6).
7.2.6 Static Function
FIGURE 7.7: Static function

This approach allows binding managed and unmanaged simulators in the same binary file. A static function is useful when unmanaged code has to call managed code. Figure 7.7 illustrates the interaction of an unmanaged simulator (SystemC, the client) with a managed simulator (ESys.NET, the server). In this approach, ESys.NET is wrapped in managed C++ (a C++ with managed extensions), and static intermediate functions are created to control and exchange data between the simulators. A client can call a static function with a specific choice of actions (a sketch is given below). Choices 1 to 3 control the ESys.NET simulation progression in synchronization with SystemC, and choices 4 to N are meant for exchanging data between simulators. Data can also be exchanged by adding global variables that bind SystemC with ESys.NET.
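A hedged sketch of such a static intermediate function follows; the action numbering and names are illustrative assumptions, not the tools' actual API.

// Single exported entry point with an action code, callable from the
// unmanaged SystemC side; the managed side implements the actions.
extern "C" int EsysStaticCall(int choice, double* value) {
    switch (choice) {
    case 1: /* build and initialize the ESys.NET model   */  return 0;
    case 2: /* advance the ESys.NET simulation by a step */  return 0;
    case 3: /* terminate the ESys.NET simulation         */  return 0;
    case 4: /* read a model output into *value           */  return 0;
    case 5: /* write *value into a model input           */  return 0;
    default: return -1;   // unknown action
    }
}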
7.2.7 Pinvoke
In opposition to the previous method, we explore a scheme where SystemC acts as the server and ESys.NET as the client. To achieve this, the SystemC library is encapsulated in a dynamic link library (DLL) with entry points that can control its kernel. A DLL can contain code or resources which are available to other applications [155]. We use a platform invocation service, commonly referred to as P/Invoke [202]. It enables managed code to call unmanaged code. The native code is referenced via the metadata which describes the exported functions of a DLL. Figure 7.8 illustrates this cosimulation. In C#, we call the SystemC DLL in the following way:
[DllImport("MyCustomDll.dll")]
public static extern void Init();
[DllImport("MyCustomDll.dll")]
public static extern void Run(invector invect,
                              outvector outvect, int simtime);
For the Run function, we pass the input and output vectors as well as the time by which we want to advance the SystemC simulator. There are three sub-functions: PutStim, Stimulate and GetOutput.
FIGURE 7.8: Example of pinvoke methods

extern "C" void Run(invector *invector_int,
                    outvector *outvector_int, int simtime)
{
    PutStim(invector_int);       // Put the entries
    Stimulate(simtime);          // Advance the simulator
    GetOutput(outvector_int);    // Give the answer of the exit
    flushall();
}
First, the entries of the SystemC module are fed with the outputs of the client module (ESys.NET). Second, the SystemC simulator advances the time. Finally, the last function recovers information from the SystemC module. The following code shows the functions in detail.
extern "C" void PutStim(void *Invector)
{
    INVECTOR *pInvector = (INVECTOR *)Invector;
    clk = pInvector->data[0] ? true : false;   // index assumed; lost in extraction
    data_in = (sc_uint<8>)pInvector->data[1];  // width assumed; lost in extraction
}
PutStim receives the entry vectors, which are mapped to the SystemC signals.
extern "C" void Stimulate(int simtime)
{
    sc_cycle((unsigned long)simtime);
}
Stimulate uses the sc_cycle function of SystemC to advance the SystemC simulator.
extern "C" void GetOutput(void *Outvector)
{
    OUTVECTOR *pOutvector = (OUTVECTOR *)Outvector;
    pOutvector->data[0] = data_out1.read();
    pOutvector->data[1] = data_out2.read();
}
Once the simulator has advanced the time, we read the signals into a defined structure. Finally, the following ESys.NET code allows cosimulating an ESys.NET model with a SystemC model.
MyDLLWrapper DLLWrapper = new MyDLLWrapper();
sim.AssembleModel();
MyDLLWrapper.Init();
for (int i = 0; i < ...) {
    // The body of this listing was lost in extraction; following the Run
    // signature above, the loop fills the input vector, calls
    // MyDLLWrapper.Run(invect, outvect, simtime) and reads back outvect.
}

7.2.8 Managed Wrapper

FIGURE 7.9: Managed wrapper

The managed wrapper approach encapsulates the SystemC simulator in a managed C++ class that the ESys.NET side can use directly, as illustrated in Figure 7.9. The class header and constructor below are reconstructed assumptions around the surviving methods:

__gc class SystemCWrapper {
    SystemCClass *ac;                  // unmanaged SystemC access class
public:
    SystemCWrapper() { ac = new SystemCClass; }   // reconstructed
    void Init() { ac->Init(); }
    void Stimulate(unsigned long simtime) { ac->Stimulate(simtime); }
    ~SystemCWrapper() { delete ac; }
};
The following class gives access to SystemC; the keyword __nogc defines an unmanaged class.
__nogc class SystemCClass
{
public:
    void Init(void);
    void Stimulate(unsigned long simtime);
};

void SystemCClass::Init()
{
    the_module.clk(clk);
    the_module.data_out1(data_out1);
    the_module.data_out2(data_out2);
    the_module.data_in(data_in);
    sc_initialize();
}

void SystemCClass::Stimulate(unsigned long simtime)
{
    sc_cycle(simtime);
}
The control and data of SystemC are encapsulated in a class. 1 2 3 4 5 6
#pragma unmanaged
// Signal template parameters were lost in extraction; bool and
// sc_uint<8> are assumed here.
sc_signal<bool>       clk;
sc_signal<sc_uint<8>> data_out1;
sc_signal<sc_uint<8>> data_out2;
sc_signal<sc_uint<8>> data_in;
module the_module("the_module");
Thus, the following line presents the way of binding the signals of two modules in different simulators. 1
sym->adder->porta->Value = data_out1;
The left-hand side is a signal of an ESys.NET module and data_out1 is a SystemC signal. The last step to cosimulate two models is shown by the following code.
#pragma managed
int _tmain(int argc, _TCHAR* argv[])
{
    int i = 0;
    int count = 0;
    Simulator *sim;
    MySystem *sym;
    SystemCWrapper *wsystemc;
    sim = new Simulator();
    sym = new MySystem(sim, "top");
    wsystemc = new SystemCWrapper;
    sim->AssembleModel();              // Build Esys model
    wsystemc->Init();                  // SystemC initialization
    for (i = 0; i < count; i++) {      // loop bound partially lost in extraction
        sym->adder->porta->Value = data_out1;
        sym->adder->portb->Value = data_out2;
        sim->Step(1);
        clk = 1;
        wsystemc->SC_DUT_Stimulate(50);
        clk = 0;
        wsystemc->SC_DUT_Stimulate(50);
        data_in = sym->adder->portc->Value;
    };
}
We again find the initialization functions and the simulation calls for time advancement. No dedicated function is needed to carry the inputs and outputs from one simulator to the other.
7.2.9 Comparison of Cosimulation Implementations
For our experimentation, we used a clocked version of a simple bus [114], as illustrated in Figure 7.10, with master and slave modules. Each master can exchange data using a channel. The masters are modeled in ESys.NET, except one which is modeled in SystemC. Master 1 is a blocking master that can lock the bus. Master 2 has a non-blocking access to the bus, while master 3 has a direct access. There are two memory slaves: the first one has no wait state and the second one is programmable with a wait state. An arbiter controls the granting of the bus to the different masters. An ESys.NET wrapper translates a channel communication into a set of clock cycle transactions. The wrapper retrieves data from the bus by using a bus_port.direct_read function and delivers it to the master modeled in SystemC. We used the different mechanisms presented earlier (COM, Pinvoke, TCP/IP, managed wrapper (MW), static function (FCT) and shared memory) to cosimulate these modules of the simple bus. Execution times are given in Table 7.1.
FIGURE 7.10: Simple bus for a cosimulation between SystemC and ESys.NET
The experimentation was performed on an IBM ThinkPad at 1.5 GHz with 512 MB of RAM. The best approaches are COM, Pinvoke and TCP/IP, and the worst are the managed wrapper (MW), the static function (FCT) and shared memory. The fastest way is COM because it does not need to encapsulate data into a structure. TCP/IP has more headers to transmit than COM. Simulations in the same file, like MW and FCT, do not seem to be the fastest way to simulate a heterogeneous model. It is clear that for the .NET environment, COM and Pinvoke are the privileged mechanisms; however, if we need asynchronicity and a mechanism not bound to .NET, TCP/IP is a strong contender for efficient cosimulation.
Table 7.1: SystemC and ESys.NET cosimulation [72]

Methods                       Time (ms)
COM                               1 903
Pinvoke                           2 800
TCP/IP                            3 470
Managed wrapper                  25 200
Improved FCT                     54 188
FCT                              64 062
Shared memory (estimated)     3 288 800
7.3 Compiler Framework
As we just saw, cosimulation is an important aspect of the design flow of heterogeneous systems. Systems are heterogeneous in terms of the nature of components (hardware or software), their type (DSP, general-purpose processor...), capabilities and performance, as well as the abstraction levels and modeling languages used during the design process. This heterogeneity leads to different models and semantics, using different languages such as ESys.NET [135] and/or SystemC [159]. Therefore, each model may be executable by at least one simulation kernel, and a communication bus is required in the cosimulation model to support communication between simulation kernels, mainly for synchronization or for the data exchange between components. This is far from being efficient in terms of performance, as the communication cost is significant, as we saw in the previous sections of this chapter. In the remainder of this chapter, we will show the acceleration that can be obtained by using an internal representation instead of a cosimulation mechanism. Figure 7.11 summarizes our methodology. As mentioned earlier, the initial description may contain different models written in different languages and using different simulation kernels. In Phase 1, any model is transformed into a common intermediate format, thereby removing the syntax differences between models. In Phase 2, we create a set of objects or data structures that fit our needs. Finally, these data structures are used, in Phase 3, by different algorithms to generate efficient self-contained code that can be used for simulation purposes. We will illustrate this approach on two important levels of abstraction: the Register Transfer Level (RTL) and the transaction level (TLM).
7.3.1 Common Intermediate Format
XML [105] is chosen as the Common Intermediate Format because it is a widely used standard for information exchange, and commercial and open source tools working around this format are available. In phase 1, the framework starts from the initial heterogeneous models and the grammars of the various languages to build an XML file. There are two main ways to bring a language into the framework. The first one is to build an abstract syntax tree (AST) by using ANTLR [163]. An AST is a tree representation of the abstract structure of source code written in a given programming language. ANTLR, developed since 1989, builds recognizers, compilers and translators of models in Java, C#, C, Verilog, etc. ASTs can be adapted to customer requirements by an engine according to the user's needs; this task can be hard. The second way consists in finding an XML dumper for the given language. Extensible Stylesheet Language Transformations (XSLT) can be used to transform the XML file according to the framework requirements [196].
FIGURE 7.11: Framework for simulation of models
FIGURE 7.12: Producer Consumer using a FIFO channel
7.3.2 Internal Data Structures
In Phase 2, starting from the XML common representations, internal graphs are produced by using QuickGraph. QuickGraph provides generic directed/undirected graph data structures and algorithms for .NET 2.0 and up [16]. It comes with algorithms such as depth-first search, breadth-first search, etc. These internal graphs are: the control dependence graph (CDG), the flow dependence graph (FDG), the program dependence graph (PDG) and the system dependence graph (SDG). These graphs are useful for many applications: for example, they can be used to slice VHDL code [125] and to analyze code. Slice operations consist in extracting a part of some code so that it can be analyzed and transformed if necessary. Graphs can also be split or merged.

In the following, we will use a Producer Consumer application (Figure 7.12) to illustrate the steps of this phase. Two modules B and C exchange data by using interfaces through a FIFO channel: B is a producer and C is a consumer. This is a typical example of transaction level modeling. In Figure 7.13 to Figure 7.16, the graphs contain nodes representing statements of the code. These statements are: if (diamond), while (double circle), function (3D box), process (parallelogram), return (pentagon), class fields (box), call (triple octagon), other (ellipse).

    public class fifo : BaseChannel, MyInOut {
        private int max = 10;
        private char[] data = new char[10];
        private int num_elements;
        private int first;
        private Event write_event = new Event();
        private Event read_event = new Event();

        public void write(char c) {
            if (num_elements == max)
                Wait(read_event);
            data[(first + num_elements) % max] = c;
            ++num_elements;
            write_event.Notify(0);
        }

        public void read(ref char c) {
            if (num_elements == 0)          // remainder of the listing assumed,
                Wait(write_event);          // mirroring write() as in a standard FIFO
            c = data[first];
            first = (first + 1) % max;
            --num_elements;
            read_event.Notify(0);
        }
    }
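To give a feel for how phase 2 can be set up, here is a minimal sketch that builds a small directed graph of statement nodes with QuickGraph and walks it depth-first. It assumes the QuickGraph 3.x API (AdjacencyGraph, DepthFirstSearchAlgorithm); the vertex labels are illustrative only.

    using System;
    using QuickGraph;
    using QuickGraph.Algorithms.Search;

    class GraphDemo
    {
        static void Main()
        {
            // A fragment of a control dependence graph for the write() method
            var g = new AdjacencyGraph<string, Edge<string>>();
            g.AddVerticesAndEdge(new Edge<string>("write", "if (num_elements == max)"));
            g.AddVerticesAndEdge(new Edge<string>("if (num_elements == max)", "Wait(read_event)"));
            g.AddVerticesAndEdge(new Edge<string>("write", "write_event.Notify(0)"));

            // Depth-first traversal, printing each statement node as it is discovered
            var dfs = new DepthFirstSearchAlgorithm<string, Edge<string>>(g);
            dfs.DiscoverVertex += v => Console.WriteLine(v);
            dfs.Compute();
        }
    }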
8.3.2 Max Constraint Systems
For the timing specification of systems with max-only constraints, max_i(t(ei) + lij) ≤ t(ej) ≤ max_i(t(ei) + uij), we have to find the greatest lower bound and the greatest upper bound. An algorithm for finding the maximum separation time between any two events in a graph with only max constraints is based on the ideas presented in [194]. For each pair of nodes we have to find a greatest lower bound and a greatest upper bound. A greatest lower bound can be found by minimizing the total delay over all l-bound paths of the nodes, and a greatest upper bound for the events ei and ej can be obtained by maximizing the separations ski for each predecessor ek of ej. Then, in order to compute the maximum separation time between two events ei and ej, we have to consider two cases:

1. There is a path from ej to ei ending in the source node. In this case all delays have to be set to their lower bounds.

2. There is no path from ej to ei ending in the source node. For this case we have to maximize t(ej) − t(ei) by maximizing t(ek) − t(ei) for each predecessor ek of ej.
ALGORITHM 8.2 Maximum separation in max-only constraint systems

Step 1:
if i = j then
   return 0
end if
if the max separation is already calculated then
   return the previously calculated value
end if
for each v ∈ V \ {s} do
   d(v) ← ∞
end for

Step 2:
if a path between ei and ej exists then
   Max ← −∞
   for each vertex k, predecessor of vertex j do
      newMax ← MaxSep(ei, ek) − lkj   {lkj: lower bound of the arc k → j}
      if Max ≤ newMax then
         Max ← newMax
      end if
   end for
else
   Max ← −∞
   for each vertex k, predecessor of vertex i do
      newMax ← MaxSep(ek, ej) + uki   {uki: upper bound of the arc k → i}
      if Max ≤ newMax then
         Max ← newMax
      end if
   end for
end if
return Max

The complexity of the algorithm computing the separations sij between all pairs of events for a graph with n events (vertices) and e edges is O(ne) ≤ O(n³). We applied the algorithm presented above to the graph shown in Figure 8.6 [194]. The results are given in Table 8.2 and correspond to those obtained in [194].
FIGURE 8.6: Max only constraint timing model
TABLE 8.2: Maximum separation time for Figure 8.6

        A+     B+     A-     C+     B-
A+       0      ∞     80      ∞      ∞
B+    -135      0    -55     40     25
A-     -55      ∞      0      ∞      ∞
C+    -165    -30    -85      0     -5
B-    -135      0    -55     40      0
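A compact C# rendering of the recursion in Algorithm 8.2 is sketched below. It assumes the graph is supplied as predecessor lists carrying lower and upper bounds, that reachability between events has been precomputed, and that long.MinValue/long.MaxValue stand in for ∓∞; memoization plays the role of the "already calculated" test of Step 1.

    using System;
    using System.Collections.Generic;

    class MaxOnlySeparation
    {
        // pred[j] lists (k, l, u) for each arc k -> j; reach[i, j] is precomputed reachability
        readonly List<(int k, long l, long u)>[] pred;
        readonly bool[,] reach;
        readonly Dictionary<(int, int), long> memo = new Dictionary<(int, int), long>();

        public MaxOnlySeparation(List<(int k, long l, long u)>[] pred, bool[,] reach)
        { this.pred = pred; this.reach = reach; }

        public long MaxSep(int i, int j)
        {
            if (i == j) return 0;
            if (memo.TryGetValue((i, j), out long cached)) return cached;
            memo[(i, j)] = long.MinValue;              // guards re-entry on cyclic graphs

            long max = long.MinValue;
            if (reach[i, j])
            {
                foreach (var (k, l, _) in pred[j])     // tighten along lower bounds
                    max = Math.Max(max, Add(MaxSep(i, k), -l));
            }
            else
            {
                foreach (var (k, _, u) in pred[i])     // relax along upper bounds
                    max = Math.Max(max, Add(MaxSep(k, j), u));
            }
            memo[(i, j)] = max;
            return max;
        }

        static long Add(long a, long b)                // infinity-aware addition
        {
            if (a == long.MinValue) return long.MinValue;
            if (a == long.MaxValue || b == long.MaxValue) return long.MaxValue;
            return a + b;
        }
    }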
8.3.3 Max-Linear Systems
Max-linear systems are the most widely discussed systems in the literature. The max-linear temporal model is widely used in the description of interfaces. Several algorithms were proposed to solve the timing verification problem for such systems. McMillan and Dill [148] proposed a graph-based algorithm with a worst case exponential running time. An interesting algorithm for max-linear systems was proposed by T. Y. Yen et al. [210]. According to the experimental results [52] [53] this algorithm is quite efficient, but its exact complexity has not been given. The algorithm uses the "iterative tightening from below" approach and is based on two steps. The first step consists of generating a special intermediate graph, the compulsory graph, containing the arcs representing bounds that must be satisfied (compulsory bounds). For max events the compulsory arcs are the arcs representing all lower bounds; for min events, the arcs representing upper bounds are compulsory. Finally, for linear constraints, both lower and upper bounds are compulsory. From this graph the smallest separation values that satisfy the compulsory bounds are obtained, to serve as an initial estimation for the tightening process.

DEFINITION 8.3 [210] Given an event graph G = (V, E) and a source node s, the corresponding compulsory graph Gc = (V, Ec) is a weighted directed graph, where Ec contains the following edges. For each linear or max constraint cij ∈ E, eij ∈ Ec and has weight weightc[eij] = cij.lower. For each linear constraint cij ∈ E, eji ∈ Ec and has weight weightc[eji] = −cij.upper. For each node i, esi ∈ Ec and has weight −MAXINT, where MAXINT is an arbitrary number larger than any sum of the absolute values of the finite bounds, but obeying the arithmetic rules of finite numbers.

FIGURE 8.7: An event graph (a) and the corresponding compulsory graph (b)

A compulsory graph for the graph presented in Figure 8.7(a) is shown in Figure 8.7(b). The second step of the algorithm consists of constructing another intermediate graph, a slack graph, defining the values by which the max (min in the case of min-linear systems) constraint arcs can be tightened. After that, the upper bounds for max constraints (lower bounds for min constraints) are reintroduced in the compulsory graph and the initial separations are iteratively relaxed according to the slacks.

DEFINITION 8.4 [210] Given an event graph G = (V, E), a source node s, and a separation value sepa[i] for each node i ∈ V, the slack graph Gs = (V, Es) is a weighted directed graph, where Es is defined as follows: for each constraint cxy ∈ E, construct two edges exy and eyx and first define

exy.bound = cxy.upper
eyx.bound = −cxy.lower

For each eij.bound, when sepa[i] ≠ ∞ or sepa[j] ≠ ∞, add edge eij to Es with edge weight weights[eij] = eij.bound − (sepa[j] − sepa[i]) if the weight is nonnegative. When sepa[i] = sepa[j] = ∞, add edge eij with weights[eij] = 0. Mark eij ∈ Es as max-optional if cij is a max constraint; otherwise it is compulsory. If a node u is a max event and no max-optional edge enters u, add a compulsory edge esu with weight zero.
FIGURE 8.8: An event graph (a) and the corresponding slack graph in the first iteration (b)
In Figure 8.8(a) the event graph is shown with the calculated initial time separations (labels of the nodes). Figure 8.8(b) gives the slack graph after the first iteration; the value inside each node represents its maximal slack. Pseudo code for the two procedures of [210], the computation of the shortest slack and the maximum separation calculation, is given below.

ALGORITHM 8.3 Shortest slack calculation
The shortest slack is calculated from a source node s for each node in a slack graph Gs.

Step 1: Initialization
for each node ei do
   put node ei in queue Q
   initialize the safe slack estimate d[i] ← ∞
   n[i] ← number of max-constraints
end for
d[s] ← 0

Step 2:
while the queue is not empty do
   find the node ei in the queue with a minimum d[i]
   for each edge outgoing from this node in the slack graph Gs do
      relax the edge: t ← d[i] + weight(i → j)
      if node ej is a max event then
         n[j] ← n[j] − 1
         m[j] ← min(t, m[j])
         if n[j] = 0 and d[j] > m[j] then
            d[j] ← m[j]
         end if
      else
         d[j] ← max(d[j], t)
      end if
   end for
   remove ei from the queue
end while

At the end of the execution of the shortest slack procedure, d[i] is the shortest slack estimate from the source node s for each node ei. This procedure is used for the computation of the maximum achievable separations for a constraint graph.

ALGORITHM 8.4 The MaxSeparation algorithm
The maximum separations are found from a source node s for an event graph G with max and linear constraints.

Step 1: Construct the compulsory constraint graph Gc
Step 2: Calculate the longest paths in the graph Gc from the source node to each other node
if a positive cycle exists then
   return inconsistent
end if
Step 3: Initialization of the initial separations
for each node ei do
   set the separation from s to ei to the weight of the longest path from s to ei
end for
Step 4: Iterative relaxation
repeat
   construct the slack graph Gs
   calculate the shortest slack from the source node s to each node using the previous procedure
   for each node ei do
      if the shortest slack is ∞ then
         set the separation from s to ei to ∞
      else if the shortest slack > 0 then
         increase the separation by the shortest slack
      end if
   end for
until the shortest slacks do not change
if all constraints in graph G are satisfied then
   return problem is consistent
else
   return problem is inconsistent
end if

The authors conjecture that the complexity of this algorithm is O(VE + V² log(V)).
8.3.4 Min-Max Constraint Systems

McMillan and Dill proved that the problem with max and min constraints is NP-complete [148]. To prove the NP-completeness of the min/max constraint problem, McMillan and Dill used a reduction from 3-SAT.

THEOREM 8.1 [148] 3-SAT is reducible to the min/max problem.
PROOF Let ϕ be a 3-CNF formula with n variables a, b, c, . . . and m clauses, with pa, pb, pc, . . . and na, nb, nc, . . . corresponding to the positive and negative literals respectively. We demonstrate the reduction from 3-SAT to the min/max problem that converts formulas to graphs. Structures within the graph are designed to mimic the behavior of the variables and clauses. Create the graph G of a min/max problem as follows. Let s be a source event and pa, pb, pc, . . ., na, nb, nc, . . . events in the constraint graph G subject to the constraints:

tpv = ts + δspv, where 0 ≤ δspv ≤ 1
tnv = ts + δsnv, where 0 ≤ δsnv ≤ 1

The time of each event pv, nv ranges between 0 and 1 relative to the source event s. Graphically this is expressed by the arcs going from the source event to each event pv, nv (Figure 8.9). For each pair of events pv, nv we construct a min event mv: tmv = min(tpv, tnv). There is also a max event q that is the latest of the mv events:

tq = max_{j∈{a,b,c,...}} (tmj)

For each clause we construct a max event fi which is the latest of the corresponding events pv, nv forming the clause; for example, for the clause pa + pb + pc:

tfi = max(tpa, tpb, tpc)

Finally, we construct a min event r that is the earliest of the fi events:

tr = min_{1≤i≤m} (tfi)

We have to find the maximum separation sqr in the constructed constraint graph. To prove that this reduction works we have to demonstrate that the maximum separation sqr equals 1 if and only if the formula ϕ is satisfiable.

1. Suppose the formula ϕ has a satisfying assignment. In that satisfying assignment at least one literal is true in every clause. For either assignment of variables, pv = 1, nv = 0 or pv = 0, nv = 1, the execution time of each min event mv coincides with the earlier of pv or nv and is equal in all cases to 0. The max event q coincides with the latest of the mv events and, as the mv events fire at 0, tq = 0. The firing time of the max events f1, f2 will always be 1, since at least one delay from the events representing the clause variables pv, nv must be 1, so the execution time of the event r is 1. Thus, sqr = max(tr − tq) = 1.

2. Suppose now that the maximum achievable separation sqr = max(tr − tq) = 1. It means that tr = 1 and tq = 0. As tr = 1 and r is a min event, both events f1 and f2 have to fire at time 1: tf1 = 1, tf2 = 1. Since tq = 0, at least one delay from each pair pv, nv must be 0. Thus, if pv = 1 and nv = 0, this assignment to the variables satisfies ϕ because, according to the firing times tf1 = 1, tf2 = 1, each triple of events representing a clause in the graph contains a literal that is assigned TRUE. Therefore we have a satisfying assignment of the formula ϕ.

Figure 8.9 shows an example of the constraint graph corresponding to the 3-SAT formula ϕ(a, b, c, d) = (a + b + c)(a + b + d). In the graph the pentagons represent min events and the squares represent max events. The numbers over the events indicate a solution of the timing constraints for a satisfiable assignment of the formula ϕ.
8.3.5 Min-Max-Linear Constraint Systems

The general min-max-linear constraint problem is also NP-hard. This was demonstrated in [46] using a reduction from 0-1 integer programming, a fact which justifies a branch and bound approach for resolving such problems. McMillan and Dill [148] proposed an algorithm where min constraints are eliminated by assuming that one of the min constraints is less than all the others; as a result, an instance of the max-linear constraint problem is generated. The solution is the maximum of the separations over all max-linear subproblems. T. Y. Yen et al. [210] proposed an algorithm that handles all three constraint types by modifying their max-linear algorithm (Section 8.3.3). In this algorithm the min constraints are recursively eliminated too, as in the McMillan and Dill algorithm, each time
FIGURE 8.9: Reduction from 3-SAT to min/max problem
generating a max-linear subproblem. If some min constraint has already been satisfied by other constraints, this constraint has to be chosen, as for each min event only one earliest constraint has to be found. Furthermore, when a new subproblem is generated the algorithm continues from the current state, not from the beginning, in order to optimize the calculation process.
8.3.6 Assume-Commit Constraint Systems
The assume-commit constraints were introduced to reflect the input/output nature of events [127, 78, 79]. The maximum separation time computation guarantees that the given system of constraints is consistent, i.e., has at least one solution, but this solution may not be realizable if we take into account the input/output nature of events. The input events are the events that cannot be controlled by the system, and the timing for these events has to be satisfied for each value in the bounded interval [79]. On the
contrary, the output events are under the control of the system and can be constrained in the given interval if needed. Distinguishing the input/output nature of events led to the definition of two constraint types: the commit constraint and the assume constraint.

DEFINITION 8.5 [78] Consider the events ei and ej with constraint Cij = (ei, ej, [lij, uij]), where lij ≤ t(ej) − t(ei) ≤ uij. Cij is a commit constraint if ej is an output event; otherwise it is an assume constraint.

A heuristic method based on the local consistency property was used for solving interface timing specifications with commit-assume constraints [78]; it is based on the following reasoning.

DEFINITION 8.6 [78] An event (node) is said to be a convergence event (node) if it has more than one parent.

DEFINITION 8.7 [78] Let z be a convergent node of an event graph EG = (E, C), and let P(z) be the set of its parents in EG. Let EG′ = (E′, C′), where E′ = E \ {z} and C′ = C \ {(ei, z, [l, u]) ∈ C}. The node z is locally consistent if ∀e1, e2 ∈ P(z) the maximum separation time s12 of e1, e2 (respectively s21 of e2, e1) computed over EG′ is less than or equal to the maximum separation time s12 (s21) computed over the subgraph containing only the set of nodes {e1, e2, z}.

DEFINITION 8.8 [78] An event graph EG = (E, C) is locally consistent if each convergent node z is locally consistent.

The local consistency property means that the maximum separation time between each pair of convergence node parents, computed over the graph without this node and the corresponding constraints, is less than or equal to the maximum separation time computed over the graph formed from the convergence node and its parents. The idea behind the local consistency property is that the satisfaction of the local consistency condition guarantees the existence of at least one realizable relative schedule for each locally consistent node and, correspondingly, for a locally consistent graph.

In the event graph of Figure 8.10 (without the dashed arcs) the nodes ik (k = 1 . . . 3) are input events and the nodes os (s = 1 . . . 5) are output events. This graph is not locally consistent: two convergent nodes, o4 and o5, do not verify the local consistency property. The algorithm looks for a locally consistent graph. In the case when the initial event graph does not verify the local consistency property, the method determines which commit constraint can be modified or added without altering the given assume constraints. For example, for the o5 node the maximum separation time between the nodes i2 and i3 computed over the graph without this node and the corresponding constraints (Figure 8.11(a)) is −40 ≤ t(i3) − t(i2) ≤ 40.
FIGURE 8.10: Event graph
FIGURE 8.11: Subgraphs used in the determination of the local consistency property for the node o5

The maximum separation time computed over the graph formed from o5, i2 and i3 (Figure 8.11(b)) is −40 ≤ t(i3) − t(i2) ≤ 30. In the presented example, to make the graph locally consistent two new commit constraints were added (dashed arcs in Figure 8.10). The constraint i3 → o3 assures the implicit assume constraint between the nodes i2 and i3, making the node o5 locally consistent. The constraint o1 → o2 has been added explicitly to render o4 locally consistent, as the node o2 is an output event and is considered a node under the system's control. Each locally consistent node can be removed from scheduling, as such a node can always be scheduled using the As Late As Possible (ALAP) relative schedule; for the o5 node: t(o5) = min(t(i2) + 60, t(i3) + 80).

ALGORITHM 8.5 Algorithm for finding a locally consistent event graph

Step 1: tighten the event graph
Step 2: sort the list of convergence nodes in a reverse topological order
Step 3:
for each convergence node do
   determine its parents
   for each pair of its parents do
      if not locally consistent then
         if it is possible to add a commit constraint then
            add it
         else
            find a commit constraint among the upstream ancestors
         end if
         update the list of convergence nodes
      end if
   end for
end for
FIGURE 8.12: Transformation of type 4 constraints into min-max-linear constraints
As mentioned earlier, the timing for input events has to be satisfied for each value in the bounded interval, i.e., having multiple assume constraints we need to satisfy each value from each bounded interval. It means that the assume constraint interpretation corresponds to the type 4 constraint interpretation of Figure 8.3. In [78] the assume constraint type was considered a special type of linear constraint, but an efficient algorithm to resolve the timing verification problem with linear-assume constraints was not given. An equivalent set of min-max-linear constraints for type 4 constraints, shown in Figure 8.12, was proposed in [209]. The existence of an equivalent min-max-linear expression for type 4 constraints and the discovered correspondence between type 4 and assume constraints explain the non-existence of a polynomial time algorithm, due to the NP complexity of the general min-max-linear problem [46]. The mathematical interpretation of assume constraints completes the mathematical representation of all constraint types discussed in the literature.
8.3.7 Discussion
In the previous subsections we have seen graph-based algorithms for timing specification verification. Another approach used to calculate the maximum achievable separations for a constrained temporal specification is based on reformulating the constraint specification as a mathematical optimization problem. Among the algorithms taking this route we mention the work of T. M. Burks and K. A. Sakallah [46] and of Y. Cheng and D. Z. Zheng [52] [53]. T. M. Burks and K. A. Sakallah [46] proposed two methods to solve the min-max-linear constraint problem: a branch-and-bound algorithm in mathematical programming and a transformation method based on the linearization of min-max inequalities into a standard mixed integer linear programming formulation. In both cases, existing solvers were used to cope with the reformulated timing specifications. Y. Cheng and D. Z. Zheng [52] [53] mathematically reformulated the algorithm of Yen et al. [210] using min-max function theory.
8.4 Min-Max Constraint Linearization Algorithm
In order to cope with the general min-max-linear constraint problem we propose a new graph-based algorithm. In our approach we use the standard linearization procedure of min-max constraints, as in the work of T. M. Burks and K. A. Sakallah [46], but we interpret it as a graph theory problem, obtaining as a result a set of graphs with only linear constraints, to which we apply a shortest path algorithm to calculate the maximum separation time for all event pairs.
8.4.1 Min-Max Constraint Linearization
The min value z of two integer numbers x and y, z = min(x, y), or the max value z = max(x, y), can be found by resolving a set of linear inequalities (Figure 8.13), where α1, α2 are binary variables and M is a suitably large positive constant. Consider the subgraph shown in Figure 8.14, containing a max event tk having two predecessors ti and tj, and two max constraints: cik = (ti, tk, [li, ui]), cjk = (tj, tk, [lj, uj]). The firing time of the event tk determined by the two max constraints is:

max(ti + li, tj + lj) ≤ tk ≤ max(ti + ui, tj + uj)

The left part of the inequality is the same as for the linear solution, i.e., we have to satisfy all lower bounds simultaneously. It means that we can leave the lower bounds without any transformation in the graph. To find the greatest upper bound, i.e., to resolve the right part of the inequality shown above, we rewrite it using the linearization procedure presented in Figure 8.13 and obtain the set of linear inequalities shown in Figure 8.15.
z = max(x, y):               z = min(x, y):
z ≥ x                        z ≤ x
z ≥ y                        z ≤ y
z ≤ x + Mα1                  z ≥ x − Mα2
z ≤ y + M(1 − α1)            z ≥ y − M(1 − α2)
M ≥ |x − y|                  M ≥ |x − y|

FIGURE 8.13: Min-Max linearization inequalities
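A quick numeric check makes the effect of these inequalities concrete. In the sketch below (a toy example, not part of the methodology), for α = 0 the max column pins z to x (feasible only when x ≥ y), and for α = 1 it pins z to y, so a feasible z exists exactly when z = max(x, y):

    using System;

    class LinearizationCheck
    {
        // max column of Figure 8.13: z is feasible iff all four inequalities hold
        static bool Feasible(int z, int x, int y, int M, int alpha) =>
            z >= x && z >= y && z <= x + M * alpha && z <= y + M * (1 - alpha);

        static void Main()
        {
            int x = 3, y = 7;
            int M = Math.Abs(x - y) + 1;       // satisfies M >= |x - y|
            for (int alpha = 0; alpha <= 1; alpha++)
                for (int z = -10; z <= 20; z++)
                    if (Feasible(z, x, y, M, alpha))
                        Console.WriteLine($"alpha={alpha}: z={z}");
            // prints only "alpha=1: z=7", i.e., z = max(3, 7)
        }
    }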
FIGURE 8.14: Subgraph with the max constraints
We can observe that all inequalities have only two unknown variables for a given constant M and a given value of the binary variable α (Figure 8.15). A solution of such a linear system of inequalities can be found by a shortest path algorithm applied to a corresponding graph. We can transform min constraints into a set of linear inequalities in a similar way as for the max constraints. The firing time of the event tk determined by two min constraints is:

min(ti + li, tj + lj) ≤ tk ≤ min(ti + ui, tj + uj)
tk ≤ z = max(ti + ui, tj + uj)
z ≥ ti + ui
z ≥ tj + uj
z ≤ ti + ui + Mα
z ≤ tj + uj + M(1 − α)

FIGURE 8.15: Transformed max constraint
min(ti + li, tj + lj) = z ≤ tk
z ≤ ti + li
z ≤ tj + lj
z ≥ ti + li − Mα
z ≥ tj + lj − M(1 − α)

FIGURE 8.16: Transformed min constraint
We transform the left part of the expression presented above into the set of linear inequalities of Figure 8.16. The graph interpretation of the transformed min and max constraints of Figure 8.15 and Figure 8.16 is given in Figure 8.17. According to the last inequality of Figure 8.13, the constant M in the linearization procedure of Figure 8.15 has to verify M ≥ |ti + ui − tj − uj| for max constraints and M ≥ |ti + li − tj − lj| for min constraints. We also have the following system of inequalities:

|ti + ui − tj − uj| ≤ |ti − tj| + |ui| + |uj| ≤ sji + |ui| + |uj|
|ti + li − tj − lj| ≤ |ti − tj| + |li| + |lj| ≤ sji + |ui| + |uj|

since li ≤ ui and lj ≤ uj.
FIGURE 8.17: Graph representation of the transformed max and min constraints
The constant M has to be larger than the separation between the nodes tj and ti. As we have seen, for max-only constraint systems max_m(tm + lm) ≤ tk ≤ max_m(tm + um), where m denotes all parents of the node tk. It means that a maximum separation calculated in the graph where all constraints are considered as max constraints along the upper bounds will be greater than the corresponding maximum separation for the general constraint system. We used an efficient polynomial time algorithm for systems with only max constraints [194] to calculate the separation between all node pairs tj and ti, and we used this value to approximate an upper bound for M. Thus, for min and max constraints we take the constant M = sji(max-only) + ui + uj (ui, uj ≥ 0) in the case of finite bounds. If one or more values in the expression for M are equal to ∞, we consider M equal to ∞, represented by an arbitrary number larger than any sum of the absolute values of the finite bounds. A max or min function of more than two variables can be replaced by a composition of two-variable min and max functions. Therefore, from now on, we will consider two-variable min-max constraints.

ALGORITHM 8.6 Algorithm for finding the maximum separations between each pair of events for an event graph with min-max-linear constraints

Step 1: Calculate the maximum separations considering all constraints to be of max type
Step 2: Calculate the number of min-max events and form the binary vector of parameters αi
Step 3: Construct an intermediate graph where each min and max node is transformed into a "linear" one
Step 4:
repeat
   for each min or max node do
      calculate the constant value M
      adjust the weights on the corresponding edges according to the binary vector value
   end for
   calculate the maximum separations using the Bellman-Ford algorithm
   if a linear solution exists then
      for all i, j do
         sij ← max(previous(sij), current(sij))
      end for
   end if
until all binary combinations of αi represented by the binary vector have been explored
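The core of Step 4 is the standard difference-constraint encoding: a constraint tj − ti ≤ w becomes an edge i → j of weight w, Bellman-Ford detects inconsistency as a negative cycle, and the shortest distance from i to j gives the maximum separation sij. The sketch below illustrates only that inner step, not the full framework; the outer enumeration of the α vector is reduced to a callback that rebuilds the edge list for each binary assignment.

    using System;
    using System.Collections.Generic;

    class LinearizedSeparations
    {
        const long INF = long.MaxValue / 4;

        // Edge i -> j with weight w encodes the constraint t[j] - t[i] <= w.
        // Returns null if a negative cycle makes this assignment inconsistent.
        static long[] BellmanFord(int n, List<(int i, int j, long w)> edges, int src)
        {
            var d = new long[n];
            Array.Fill(d, INF);
            d[src] = 0;
            for (int pass = 0; pass < n - 1; pass++)
                foreach (var (i, j, w) in edges)
                    if (d[i] < INF && d[i] + w < d[j]) d[j] = d[i] + w;
            foreach (var (i, j, w) in edges)
                if (d[i] < INF && d[i] + w < d[j]) return null;
            return d;
        }

        // buildEdges(alpha) rebuilds the graph of Figure 8.17 for one binary vector.
        static long[,] Explore(int n, int numMinMax,
                               Func<bool[], List<(int, int, long)>> buildEdges)
        {
            var best = new long[n, n];
            for (int a = 0; a < n; a++)
                for (int b = 0; b < n; b++) best[a, b] = long.MinValue;

            for (int mask = 0; mask < (1 << numMinMax); mask++)
            {
                var alpha = new bool[numMinMax];
                for (int k = 0; k < numMinMax; k++) alpha[k] = ((mask >> k) & 1) != 0;

                var edges = buildEdges(alpha);
                for (int src = 0; src < n; src++)
                {
                    var d = BellmanFord(n, edges, src);
                    if (d == null) break;            // no linear solution for this alpha
                    for (int j = 0; j < n; j++)      // keep the largest separations seen
                        best[src, j] = Math.Max(best[src, j], d[j]);
                }
            }
            return best;                             // best[i, j] approximates s_ij
        }
    }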
FIGURE 8.18: Initial graph with max constraints
Example: We illustrate the presented algorithm on an example taken from [210]. In Figure 8.18 we have a graph with two max events, c and f, drawn as shaded circles. There are four max constraints: a → c, b → c, d → f and e → f. We transformed the two max events of the initial graph in the manner presented in Figure 8.17, obtaining the "parameterized" graph presented in Figure 8.19. The parameters in the graph are the binary values of the variables αi. The arcs whose weight has to be changed according to the binary values αi are represented by dashed lines in Figure 8.19. For each binary value we have to find the linear solution, if it exists, and take the maximum values of the separations obtained during the calculations.

The complexity of the presented algorithm is exponential in the number of min-max events. If m is the number of min-max events with two constraints each, we have to explore in the worst case 2^m subproblems having only linear constraints. Thus, the algorithm complexity is O(2^m n³), where n is the number of graph nodes.
8.4.2 Algorithm Optimization
In order to reduce the number of linear solution explorations we propose to use a procedure from the algorithm of T. Y. Yen et al. [210] to eliminate unsatisfied upper bounds of max constraints or unsatisfied lower bounds of min constraints, since for both constraint types only one bound has to be satisfied: the earliest one for min constraints and the latest one for max constraints. We presented T. Y. Yen et al.'s algorithm [210] in Section 8.3.3. The general idea of this algorithm consists of two steps:

1. Satisfaction of all compulsory bounds and obtaining the smallest separation values that satisfy the compulsory bounds. For max events all lower bounds have to be satisfied; for min events, all upper bounds; for linear constraints, both.

2. Construction of the slack graph, a graph with the values defining the amount by which a bound value can be tightened, followed by iterative relaxation of the separations according to the slacks.
FIGURE 8.19: Graph with transformed max constraints
In our algorithm, in order to eliminate the unsatisfied bounds, we construct a compulsory graph and calculate the smallest separations. Then we construct the slack graph. If the initial slack values are negative, it means that the corresponding bounds cannot be tightened, and in the context of our algorithm we do not need to explore the corresponding binary combinations for αi. For the example of Figure 8.18 we obtained negative slacks for the max constraints b → c and d → f. This means that the constraints b → c_z and d → f_z (Figure 8.19) cannot be tightened, and the values α1 = 1 and α2 = 0 need not be explored, leading to the verification of only one parameter combination, α1 = 0 and α2 = 1, instead of four binary combinations. The linear solution corresponding to the assignments α1 = 0 and α2 = 1 in the parameterized graph (Figure 8.19) gives us the solution of the initial problem. With this kind of optimization the worst case complexity stays the same: sometimes all bounds for min-max constraints in the graph can give some amount by which they can be tightened, and we then have to find the one that provides the largest or smallest value for a max or min constraint, correspondingly, by exploring all binary assignments.
FIGURE 8.20: Intel 8086 ROM read cycle
8.4.3 Experimentations
We applied the presented algorithm with optimization to the graph of the timing specification in Figure 8.18. We obtained the results by exploring only one linear solution. For comparison, the algorithm of Yen et al. requires two iterations and the algorithm of McMillan and Dill almost 500 iterations to get the results [210].

The second example that we used is the Intel 8086 ROM read cycle from [86]. The explored system contains a clock generator, an address decoder and an address latch, and is shown in Figure 8.20. The clock generator emits a clock signal with a period of 204 ns. The address latch holds the address and has a delay of [0, 12] ns. The address decoder has a delay of [0, 30] ns and ensures that only the selected PROM outputs data onto the bus at any given time. The designer's problem is to verify whether the 2716 PROM is fast enough to work with an 8086 having a clock period of 204 ns. In terms of timing, it means that we have to verify the following timing requirements:

d1 → d2 = [0, ∞]
A1 → A2 = [0, ∞]
c3 → d2 = [10, ∞]
a2 → d1 = [0, ∞]
R1 → R2 = [0, ∞]
d1 → c3 = [30, ∞]

The timing specification of this example consists of 13 events, among which there are one min event and one max event; the rest are linear events (Figure 8.21). The events c1, c2, c3 are clock transitions; a1 and a2 are the beginning and the end of a valid address on the data/address bus. The events A1 and A2 are the beginning and end of a valid address at the address latch outputs. The events r1/R1 and r2/R2 are the beginning and the end of the read signal and of the address decoder output read signal, respectively. The min event, d2, is the event of the end of valid data on the data/address bus; this event has to occur as soon as either the address or the read signal is removed. The max event, d1, is the start of valid data on the data/address bus, which depends on the later of the two input signals. In order to verify the timing requirements we applied the algorithm to the system timing specification (Figure 8.21) and calculated the actual timing separations (Table 8.3).
TABLE 8.3: Timing separations for the Intel 8086 ROM read cycle

      c1    c2    c3    a1    a2    r1    r2    A1    A2    R1    R2    d1    d2
c1     0   204   612   110   284   369   762   122   296   399   792   572     ∞
c2  -204     0   408   -94    80   165   558   -82    92   195   588   368     ∞
c3  -612  -408     0  -502  -328  -243   150  -490  -316  -213   180   -40     ∞
a1   -10   194   602     0   274   359   752    12   286   389   782   509     ∞
a2  -214   -10   398  -104     0   155   548   -92    12   185   578   358     ∞
r1  -214   -10   398  -104     0     0   548   -92    12    30   578   358     ∞
r2  -622  -418   -10  -512  -338  -333     0  -500  -326  -303    30   -50     ∞
A1   -10   194   602     0   274   359   752     0   286   389   782   509     ∞
A2  -214   -10   398  -104     0   155   548   -92     0   185   578   358     ∞
R1  -214   -10   398  -104     0     0   548   -92    12     0   578   358     ∞
R2  -622  -418   -10  -512  -338  -333     0  -500  -326  -303     0   -50     ∞
d1  -214   -10   398  -104     0     0   548   -92    12     0   578     0     ∞
d2  -214   -10   398  -104     0   155   548   -92    12   185   578   358     0
FIGURE 8.21: Timing specification of the Intel 8086 ROM read cycle
The results are the same as those presented in [86]. As we can see, several timing requirements are violated. A comparison of the actual separations computed by the algorithm and the required separations is given in Table 8.4. For this timing specification, with one max and one min event to transform into linear ones, our algorithm has to explore four linear solutions. Applying the optimization technique discussed above, we reduced the number of linear solution explorations to two. The presented examples demonstrate that, despite its exponential worst case complexity, the proposed algorithm is quite efficient.
TABLE 8.4: Required and computed separations for the Intel 8086 ROM read cycle

Constraints   Required time intervals [−sji, sij]   Actual timing separations
d1 → d2       [0, ∞]                                [−358, ∞]
c3 → d2       [10, ∞]                               [−398, ∞]
R1 → R2       [0, ∞]                                [303, 578]
A1 → A2       [0, ∞]                                [92, 286]
a2 → d1       [0, ∞]                                [0, 358]
d1 → c3       [30, ∞]                               [40, 398]
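The comparison in Table 8.4 is mechanical once the separation matrix is available. The following fragment is a small, self-contained sketch of that check; the containment rule (the actual interval must lie within the required one) is our own framing of the table, not code from the chapter.

    using System;

    class RequirementCheck
    {
        const long INF = long.MaxValue / 4;

        static void Main()
        {
            // the six requirements listed above: (from, to, requiredLower, requiredUpper)
            var reqs = new (string a, string b, long lo, long hi)[] {
                ("d1","d2",0,INF), ("c3","d2",10,INF), ("R1","R2",0,INF),
                ("A1","A2",0,INF), ("a2","d1",0,INF),  ("d1","c3",30,INF)
            };

            // actual intervals [-s_ji, s_ij] taken from Table 8.4
            long[][] actual = {
                new long[]{-358, INF}, new long[]{-398, INF}, new long[]{303, 578},
                new long[]{92, 286},   new long[]{0, 358},    new long[]{40, 398}
            };

            for (int k = 0; k < reqs.Length; k++)
            {
                // satisfied when the actual interval lies inside the required one
                bool ok = actual[k][0] >= reqs[k].lo && actual[k][1] <= reqs[k].hi;
                Console.WriteLine($"{reqs[k].a} -> {reqs[k].b}: {(ok ? "ok" : "violated")}");
            }
        }
    }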
FIGURE 8.22: TLM abstraction levels and potential flows
8.5 Timing in TLM
Transaction Level Modeling is viewed differently by different researchers. Below we consider the SystemC definition of TLM proposed by A. Donlin [70]. TLM refers to a set of abstraction levels, each differing from the others in the degree of functional or temporal detail expressed. These levels and the possible design flows through the TLM space are presented in Figure 8.22, with an indication of their position between the algorithmic (ALG) and register transfer (RTL) levels, which are not considered part of the TLM space. The transaction-modeling levels are:

• Communicating Processes (CP)
• Communicating Processes with Time (CP+T)
• Programmers View (PV)
• Programmers View with Time (PV+T)
• Cycle Accurate (CA)

In Transaction Level Modeling, the first level where some quantity of timing information is added to the system description is the CP+T level. At this level we have to extract, from the system specification and perhaps from the CP simulation results, the first global temporal system model. All architectural constraints can be represented by the corresponding temporal models and can be verified analytically using the timing verification algorithms, and then verified, while refining the functional details, by simulation. The hardware/software partitioning process, if needed, can be guided by criteria of temporal constraint realizability. At the PV and PV+T levels the communication structure can be explored and verified using timing analysis. In this subsection we demonstrate in which way the presented timing expression methodology can be applied at different TLM abstraction levels.
8.5.1 Timing Modeling at CP+T Level
For exploration at the CP+T level we use a CP audio-video server system model [60]. From the information collected at the CP level we manually created the temporal server model shown in Figure 8.23. In this figure the events appearing on one server channel are presented with their temporal relationships. This CP+T temporal model of the audio-video server system is explored and has to be refined. As we can see, the server temporal model contains several conditional events, such as user commands; at the CP level these commands are modeled using probabilistic generation. Furthermore, some parameters are defined by their variation intervals, leading to the specification of timing intervals in a parameterized manner. Thus, the server behavior is quite complex, and the corresponding temporal graph includes cyclic treatment, parameterized borders of the delay intervals and several conditional events describing nondeterministic behavior. These concepts are not supported by the presented timing expression methodology based on timing constraint analysis. For this case we tried to combine simulation and temporal verification. In order to manage the exploration of systems with conditional events and repetitive behavior, we can use dynamic verification, which combines simulation and analytical methods. We can subdivide the graph with cyclic behavior into acyclic subgraphs and explore several subgraphs by analytical methods in parameterized form. Having verified subgraphs with timing intervals presented in parametric form, we can, dynamically during the simulation, extract the corresponding subgraphs, substitute the simulation values and verify the system requirements. In this manner we can verify and adjust temporal constraints in the temporal model and add behavioral details to the simulation model. For the audio-video server model we extracted several parameterized acyclic subgraphs, but all of them gave us trivial solutions. This fact signifies that each acyclic subgraph in the temporal audio-video server model is not detailed enough to provide information for the temporal analysis, while at the same time the whole CP+T temporal graph is too complex for the proposed temporal expression methodology. Our experimentations have demonstrated that timing constraint analysis is probably not very appropriate for the exploration of high level system descriptions of applications with nondeterministic and repetitive behavior. For such systems
FIGURE 8.23: Temporal model of the audio-video server system (Ts: sequence size in seconds; N: number of transmitted fragments; Td: fragment rate)
the simulation method is sufficient, providing acceptable simulation speed, since the system description at the CP level does not contain a very large amount of detail.
8.5.2 Communication Exploration at PV and PV+T Levels
The TLM paradigm assumes that, in the design flow, the system computation structure is developed separately from the communication structure. In this subsection we present a method using the min-max-linear temporal model to explore and refine the system communication structure in the TLM design flow. Detailed communication structure exploration and refinement is done at the PV and PV+T abstraction levels. The presented method is an extension and generalization of the heuristic algorithm based on the local consistency property for assume-commit constraint systems [78], discussed in Section 8.3.6. Structurally, modern systems can be seen as one or several processing components which communicate with each other or/and with the environment. The idea of distinguishing the nature of events leads to the identification of events that can be controlled by some component, or that can be grouped according to a certain functionality for future implementation as a unique component in the system.
FIGURE 8.24: Temporal specification of two communicating components
A "subdivision" of the global temporal system specification into regions of events belonging to some structural or logical unit gives us the possibility to verify the temporal interactions between the communicating elements and eventually to propose a set of rules defining a protocol of communication assuring the timing consistency of the system specification. This verification mechanism can be applied at different abstraction levels, providing in this way the exploration and refinement of the system communication infrastructure. In the case of platform-based modeling this method can help to adjust several configuration parameters.

Consider the event graph of Figure 8.24, which represents the temporal specification of two communicating components. Suppose that in this graph the events c1i (i = 1..5) are the events of communicating component 1 that can be controlled by this component. Correspondingly, the c2j (j = 1..2) are the events controlled by communicating component 2. The constraints represented by the arcs finishing at these events are the commit constraints (Definition 8.5). There is only one event that cannot be controlled in Figure 8.24, the event source a1. The events that cannot be controlled in the system timing specification are the "true" input events, which form assume constraints (Definition 8.5). In the temporal system specification we have to define the nature of each event: if the component event cannot be controlled by the component, the event nature is "in"; otherwise it is "out." Determining the event's nature has to be done judiciously, because each communicating component can have several uncontrolled events, and in this case they generate assume constraints with the environment events, which are in general uncontrollable and form assume constraints too.

Applying the algorithm of local consistency verification (Section 8.3.6) to the graph in Figure 8.24, and considering all the constraints to be linear, we obtain the solution presented in Figure 8.25. In the initial event graph the nodes c14 and c15 are not locally consistent. To make them locally consistent we have to examine the parents of these nodes and, as the parents of node c14 as well as those of c15 are events belonging to the same logical partition, i.e., controlled by the same communicating component, we can add the corresponding commit constraints.
FIGURE 8.25: Solution for the temporal specification of Figure 8.24
Thus, we have generated timing relationships between two communicating devices, a protocol of communication that has to be respected to assure temporally correct system functionality. In the case of the presence of multiple uncontrolled events, the assume constraints generated by these events have to be satisfied for all values of the bounded intervals. To satisfy the assume constraints, the algorithm looks for a candidate for a new commit constraint among the events belonging to the same logical unit. Let us now consider that component 2 generates only one event, c21 (Figure 8.26). Trying to make node c15 locally consistent, we should enforce the implicit assume constraint between c21 and c16 to be in [-30, 40] by inserting a corresponding commit constraint in the appropriate logical unit. The inserted commit constraint is [-40, 20] between the events c16 and c13, events under the control of communicating component 1. A pseudo code of the algorithm for finding a realizable timing specification of multiple communicating components is presented below.
FIGURE 8.26: Changed temporal specification of Figure 8.24
ALGORITHM 8.7 Algorithm for finding a realizable timing specification of multiple communicating components

Step 1: Annotate in the timing specification graph the events that can be controlled by the same communicating component and the "true" input events
Step 2:
for each convergence node do
   for each pair of its parents do
      verify the local consistency property
      if the condition does not hold then
         if the pair of convergence node parents belongs to the same component then
            add or adjust the commit constraint
         else if a necessary commit constraint can be found in the corresponding component then
            add it
         else
            signal timing inconsistency
         end if
      end if
   end for
end for
8.5.2.1 Experimentations
In this section we apply the presented concepts to the example of the timing of bus arbitration [86]. Buses are basic blocks of complex digital systems and often cause difficult timing problems. Considering the timing specification of a multi-master system configuration on an Intel Multibus, we derive a realizable bus arbitration protocol. The multi-master system configuration on an Intel Multibus involves three masters, A, B and C, each with a distinct priority. The priority resolution is handled by the parallel priority resolution scheme implemented by encoder/decoder logic on Master A. In this example the bus arbitration concerns Masters B and C, with Master C having the lower priority. Consider the following situation: Master C is transferring data and B requests a bus data transfer. The priority resolution logic at A asserts and removes the corresponding signals. C is allowed to complete the transfer; after that, Master C surrenders the bus and B prepares and begins its transfer. The relevant signals and events involved in Multibus arbitration with parallel priority resolution are presented in Figure 8.27.
236
System Level Design with .NET Technology BCLK/
BCLK/ b2
BREQ/
b3 b4
B1
b5
B2
B3
B4
B5
b1
BREQ/
r
B
R
A BPRN/
P
Multibus
p
BPRN/ BUSY/
b
BUSY/
C
B
BCLK/ B1
B2
B3
B4 c
B5
C BUSY/
FIGURE 8.27: Events and signals involved in arbitration [86]
In this temporal specification we can distinguish four groups of events. The "true" input events are the clock events, represented in Figure 8.28 by dashed white circles. The second group contains the events controlled by the bus: R, P, c, b. The third one contains only one event, r, the request event sent by Master B to access the bus for data transfer. Finally, the events p, C, B are the events controlled by the priority resolution logic and form the fourth event group. Dashed arcs represent the constraints involved in the arbitration process. Now we can explore the temporal interactions between the communicating devices and, if needed, constrain the temporal specifications inside each group of events and between the groups, generating a realizable bus arbitration protocol. In this example all temporal constraints are linear. We applied the local consistency property method to the graph of Figure 8.28; the results are given in Figure 8.29. The timing situated on the dashed arcs represents a realizable bus arbitration protocol on the Intel Multibus system. As we can see, in order to satisfy a priority resolution delay of 37 ns (constraint R → p) we have to add a constraint on the clock-request delay (b1 → r). If we want to avoid this, faster priority resolution logic has to be chosen. In [86] the timing for the constraint R → p was calculated manually using the specification of worst-case bounds on the corresponding signals, and the constraint violation was determined; another timing violation was detected using the simulation method. In both cases the timing violations were discovered, but no solution was given. Applying our method we derived a protocol for the communicating components that can satisfy a priority resolution delay of 37 ns by reducing the temporal bounds of the events that can be controlled by the system.
FIGURE 8.28: Timing specification of the bus arbitration
FIGURE 8.29: Realizable bus arbitration timing specification
8.6 Conclusion
The proposed timing expression methodology is based on previous work and is enhanced with several important theoretical aspects. We have completed the mathematical representation of all temporal constraint types discussed in the literature by discovering the correspondence between min-max-linear and assume constraint semantics (the assume constraint type can be modeled with min and max constraints). This fact explains the non-existence of a polynomial time algorithm for linear-assume constraint systems, due to the NP complexity of the general min-max-linear problem. Several existing timing verification algorithms have been implemented. We proposed a new general algorithm based on the linearization of min-max inequalities, together with optimization techniques to improve its efficiency. This method has been used for the timing analysis of all four constraint type systems; thus it can be applied to the exploration of a wider class of applications than the previously presented ones. The timing expression based on temporal constraint analysis is completely independent from the languages used for system design, providing in this manner the possibility of easy integration in different design methodologies. We integrated the min-max-linear-assume constraint timing specification methodology in the TLM design flow in order to represent and explore timing descriptions at different abstraction levels. We extended the existing methods to perform communication structure exploration and refinement in system design. The exploration of the temporal model of communicating components has been done using the local consistency property method, which was generalized to handle min-max events. The proposed communication structure exploration methodology can be used in automatic protocol generation, in determining temporal specification inconsistencies, and in adjusting some parameters in the case of platform-based design methodologies. All these features lead to a reduction in the time needed for communication design space exploration.
Part III
Practical Use of ESys.NET
9 ESys.NET Environment

James Lapalme
Université de Montréal - Canada

Michel Metzger
Université de Montréal - Canada

9.1 Introduction
9.2 Modeling
9.3 Simulation
9.4 Verification
9.5 Conclusion

9.1 Introduction
The need to bridge the gap between system level modeling and implementation modeling is becoming pressing as embedded systems incorporate more software components. Maybe a change of paradigm is needed for hardware and system modeling? Most current hardware description languages have two significant advantages over generic programming languages: syntactic brevity for hardware semantics and better constructs for model verification. However, even these advantages are melting away with the emergence of languages like ASML [1]. Even SystemC [15], a C++ based solution, has been able to incorporate fairly simple syntactic constructs, through the use of macros, to provide hardware semantics. Assertion-based verification capabilities have usually only been supported by hardware description languages and specialized verification languages, but generic programming languages such as Eiffel and ASML are beginning to incorporate those capabilities. The major limiting factor in using generic programming languages for hardware modeling is that hardware semantics are not directly present. But what if we could add the missing metadata? What could we do if we had the power of certain high level programming languages: the reflective capabilities of Java, the polymorphism of Ruby, the elegance of ML, or the simple power of Perl? Would all of this change the way we think of system modeling and hardware modeling?
FIGURE 9.1: MyFirstSystem
The .NET Framework currently makes interoperability between languages almost seamless. It also permits the integration of custom metadata. After struggling with the downfalls of SystemC, we looked for another alternative that would enable us to model systems in a simple, effective manner and that would allow us to explore different types of CAD and EDA tools for partitioning, verification and synthesis. After looking at many environments and languages we stumbled across the C# language and the .NET Framework. We immediately noticed that C# and .NET bring together several important features from various existing solutions, i.e., Java, C++, ML, etc., and add several new features that would enable the development of a new environment for system-level design based on the previous work of SystemC. This section describes the fruit of our labor: Embedded Systems with .NET, a new system-level design environment based on SystemC. ESys.NET is meant to be an evolution of SystemC, offering the same modeling capabilities in a more elegant package. ESys.NET also innovates on SystemC by using a better underlying programming language, which permits it to inherit operating system primitives, a rich software component library for rapid tool development and a powerful runtime. Our environment uses the advanced programming features of .NET and C#, such as reflection, attribute programming and dynamic delegate creation, to produce a flexible solution.
9.2 Modeling

9.2.1 My First Model
The best way to present a new tool is with an example. Here is a simple one, called “MyFirstSystem,” that we will use to present our environment; Figure 9.1 is a pictorial representation of it.
“MyFirstSystem” is a model of a simple synchronous hardware component being tested by a testbench. The hardware component, named “generator,” produces an integer on its output port on each positive edge of the main system clock. When a new value is generated, the testbench is notified by the “generator”; it then reads the new value from its input port and prints it out. Like most real hardware components, the “generator” is composed of sub-components. Each sub-component generates an integer value on every positive edge of the main system clock; the values are then added together by a computation process in the encapsulating component. Code example 9.1 defines the blueprint for the two sub-components (gen1 and gen2):

EXAMPLE 9.1: ModuleA blueprint

    public class ModuleA : BaseModule {
        public Clock clk;
        public outInt porta;

        public ModuleA() : base() { }

        [Process]
        [EventList("posedge", "clk")]
        public void Gen() {
            while (true) {
                for (int i = 0; [...]

[...]

    ... -> X (sig(hwdata) == "DATA")) }

FIGURE 9.8: An example of a property for the AHB-Lite model
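Figure 9.8 itself is truncated in our copy; only the right-hand side of the formula survives. Based on the explanation that follows, a plausible reading of the complete property is the sketch below, in which the property identifier, the synchronization event and the exact form of the left-hand side are our assumptions rather than the original text:

    ltl_property ValidWriteData (posedge hclk) {
        [] ( (sig(htrans) == "NSEQ" || sig(htrans) == "SEQ")
             && sig(hwrite) == true && sig(hready) == true
             -> X (sig(hwdata) == "DATA") )
    }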
Every property starts with the keyword ltl_property or rtltl_property, followed by the identifier of the property. The synchronization event is specified
between parentheses. The property itself, enclosed between curly brackets, is made of two parts. The first part checks that the model is in a transfer phase. The htrans signal identifies the type of the frame currently transferred: it can be either sequential or non-sequential, depending on whether or not the frame is the first of a burst sequence. The hwrite signal indicates that the master is writing data to a slave, and the hready signal is the slave's answer signifying that it is ready to transfer. The second part of the formula (sig(hwdata) == "DATA") checks that valid data is provided by the master. The two parts are connected by an implication operator, and the always ([]) operator is put in front of the formula to ensure that it holds during the whole simulation.

A graphical user interface was developed to ease the property specification process. It provides advanced editing features and a compact view of the design structure. A screenshot is presented in Figure 9.9.
FIGURE 9.9: Observer designer screenshot (callouts: LTL operators, auto-completion, formulae editor, model hierarchy)
The structure of the model is retrieved through introspection, without the need for a complex parser. Syntax highlighting, detailed type information and auto-completion in the editor help prevent syntax and semantic errors. As the formalization of complex LTL properties can be difficult, a pattern instantiation mechanism was implemented to ease their specification. Presented in [74], patterns are typical combinations of properties which often occur in formal specifications. The use of these formula templates further reduces the risk of errors in the specification process.
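For instance, the classic “response” pattern of [74], stating that every request is eventually followed by an acknowledgement, could be instantiated roughly as follows (the property identifier and the synchronization event are ours, chosen for illustration):

    ltl_property RequestAcknowledged (posedge clk) {
        [] (sig(req) == true -> <> (sig(ack) == true))
    }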
The set of properties is stored in a text file which is then used by the verification tool itself. For the AHB-Lite model, the final property file contains more than 40 properties. They are derived from a textual specification formulated by a well-known EDA company, which used them to validate its own AMBA bus model.
9.4.4 Verifying Temporal Properties during Simulation
This section presents the mechanisms used to verify the properties during simulation. As mentioned in the introduction, LTL and RTLTL properties are first transformed into automata. To verify an LTL property, well-known model-checking algorithms use Büchi automata to recognize the (possibly infinite) sequences of states that violate the property. Of course, this technique cannot be applied directly to finite simulation traces. The use of temporal logic on finite traces has been discussed in [77, 146]. The approach used in our verification tool is to detect the violation of a property as soon as possible by using an automaton that recognizes an invalid sequence of states. The LTL and RTLTL properties are translated to Büchi automata using well-known algorithms [90, 93]. As an example, the following property specifies that a request must occur and be followed, some time after, by an acknowledgement:

    <> (sig(req) == true && X <> (sig(req) == false && sig(ack) == true))

sig(req) == true, sig(req) == false and sig(ack) == true are atomic propositions. This formula is translated into the automaton presented in Figure 9.10. The initial state of the automaton is init. When the synchronization event occurs, the atomic propositions are evaluated and all of the valid transitions are performed. More than one transition may be enabled at a time: the automata are nondeterministic, a consequence of the use of Büchi automata, which are nondeterministic by nature. To illustrate the way they can still be used for our purpose, we will use the execution trace presented in Table 9.2. It gives the value of the two signals at each occurrence of the synchronization event. At the end of the first step, both signals are low. The only possible transition is the default (true) one, so the automaton stays in the init state. During the second step the request signal is activated. Now two transitions are possible: the default one and the one leading to state 1. The automaton is now in two states at the same time: init and state 1.
FIGURE 9.10: A simple example of an automaton (states init, 1 and final; init loops on true and moves to state 1 on sig(req) == true; state 1 loops on true and moves to final on sig(ack) == true && sig(req) == false; final loops on true)
Steps       1      2      3      4      5      6
sig(req)    true   true   false  false  false  false
sig(ack)    false  false  false  false  true   false

Table 9.2
It stays in this dual state until the fifth step, when the acknowledgment is transmitted. At this point of the simulation the active states are init, state 1 and final. When the simulation ends, if the set of states reached by the automaton contains a final accepting state, the property is said to be verified. If it does not contain an accepting state, this does not mean that the property was violated. For instance, if the simulation stops at step 4, the automaton is in the init state and in state 1. The only conclusion we can draw at this point is that the property has not yet been violated: the prefix observed so far is valid. If at some point of the simulation the automaton cannot perform any transition, then the corresponding property has been violated. It is important to mention that this implies that a whole class of properties cannot be completely verified by the technique presented here. The example presented above belongs to this class: since a default transition is possible in each state, the property can never be violated. Such properties are still useful to analyze the coverage of the test vectors, for instance to ensure that at least one complete transaction (request/acknowledgment sequence) has been performed by the testbench. The automaton corresponding to the property of Figure 9.8 is presented in Figure 9.11. Labels p0, p1, p2 and p3 on the edges represent the atomic (non-temporal) propositions of the property: respectively sig(htrans) == "NSEQ", sig(htrans) == "SEQ", var(phase) == "TRANS ADDR" and sig(hwdata) == "DATA".
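As a minimal sketch of this state-set update (the class and member names below are ours, not the internals of the actual tool), the nondeterministic automaton can be run by keeping a set of active states and firing every enabled transition at each synchronization point:

    using System;
    using System.Collections.Generic;

    // One transition: source and target states plus a guard that evaluates
    // an atomic proposition against the current state of the model.
    public class Transition {
        public int From, To;
        public Func<bool> Guard;
    }

    public class ObserverAutomaton {
        private readonly List<Transition> transitions = new List<Transition>();
        private HashSet<int> active = new HashSet<int> { 0 };  // state 0 = init

        public void AddTransition(int from, int to, Func<bool> guard) {
            transitions.Add(new Transition { From = from, To = to, Guard = guard });
        }

        // Called once per synchronization event, when the model is stable:
        // fire every enabled transition from every currently active state.
        // Returns false when no transition can fire, i.e., a violation.
        public bool Step() {
            var next = new HashSet<int>();
            foreach (Transition t in transitions)
                if (active.Contains(t.From) && t.Guard())
                    next.Add(t.To);
            active = next;
            return active.Count > 0;
        }

        // True if the run has reached at least one accepting state.
        public bool ReachedAccepting(HashSet<int> accepting) {
            return active.Overlaps(accepting);
        }
    }

Replaying the trace of Table 9.2 through such an automaton reproduces the behavior described above: after the request is seen, the active set holds both init and state 1, and the set becomes empty only if the property is violated.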
FIGURE 9.11: An example of an automaton (states init and final; edges labeled with guards over the atomic propositions p0-p3, such as !p0 && !p1, !p2, !p2 && p3, !p0 && !p1 && p3, p3 and true)
9.4.5 Linking Different Tools
The mechanisms used by the verification engine to bind itself to the simulation kernel of ESys.NET and to evaluate the state of the system model are detailed in this section. Figure 9.6 shows how the two flows are executed together, with the three main phases of the simulation: the internal representation generation phase, the elaboration/binding phase and the execution phase.

During the transformation of properties into automata, the verification tool performs a series of checks to ensure the coherence of the system model and the properties: it makes sure that the objects referenced in the properties exist in the model and that the comparisons between objects and/or literals are valid. For instance, it raises an error if it finds a property that compares a Boolean signal with a floating-point number. These checks rely on the reflection capabilities of the .NET Framework, which make it possible to browse the entire system model, observe its structure and retrieve detailed type information without having to parse and analyze the source code.

After the elaboration phase of the simulation kernel, the verification engine binds itself to the instantiated system model. This is required to evaluate the state of the model during the simulation phase. Callback methods are also registered in the simulation engine; these callbacks notify the verification engine of the different steps of the simulation and of the occurrence of the synchronization events. ESys.NET provides a set of hook-up points to facilitate the integration of third-party tools. One should mention that the verification and simulation processes are loosely coupled: the simulation kernel is not modified to integrate the verification engine, and the verification layer is not bound to a specific simulator implementation. As long as the simulator conforms to the simulation semantics defined by ESys.NET and defines all of the mandatory interfaces, the verification tool presented here will behave as expected.

During the simulation, the verification engine must be notified of the occurrence of synchronization events. If the synchronization event of a property occurs, the verification tool has to evaluate the state of the model and then update the automata. As mentioned above, callback methods are called by the simulation kernel when a synchronization event occurs. All the properties synchronized with this event are then tagged as "to be updated" but are not executed yet. This is because synchronization events are triggered during the execution of delta cycles, and the evaluation of the state of the model must take place when the model has reached a stable state. The execution semantics of the ESys.NET kernel state that the model cannot be considered stable during the execution of the delta cycles. Therefore, the verification engine must wait until the end of the simulation cycle to evaluate the state of the model; it attaches itself to the simulator hook-up point named cycleEnd (see Figure 9.5 for more details about the simulation kernel hook-up points). When the system model is in a stable state, all the automata that have to be updated are executed with the algorithm presented in Section 9.4.4.
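The following sketch illustrates this two-phase scheme, reusing the ObserverAutomaton sketch above. Only the cycleEnd hook-up point is named in the text, so the interface, event and method names here are assumptions made for the sake of the example:

    using System;
    using System.Collections.Generic;

    // Hypothetical view of the simulator's hook-up points.
    public interface ISimulatorHooks {
        event Action CycleEnd;                                      // end of a simulation cycle
        void OnSynchronizationEvent(string name, Action callback);  // a named model event
    }

    public class VerificationEngine {
        // Automata tagged "to be updated"; a set avoids duplicate tags
        // when an event fires several times within the same cycle.
        private readonly HashSet<ObserverAutomaton> toUpdate =
            new HashSet<ObserverAutomaton>();

        public VerificationEngine(ISimulatorHooks sim) {
            // cycleEnd fires once the model is stable: only now are the
            // tagged automata actually stepped.
            sim.CycleEnd += () => {
                foreach (ObserverAutomaton a in toUpdate) a.Step();
                toUpdate.Clear();
            };
        }

        public void Watch(ISimulatorHooks sim, string syncEvent, ObserverAutomaton a) {
            // Synchronization events may fire inside delta cycles, when the
            // model is not yet stable: just tag the automaton for later.
            sim.OnSynchronizationEvent(syncEvent, () => toUpdate.Add(a));
        }
    }

Keeping the tagging and the evaluation separate in this way is what makes the loose coupling possible: any simulator exposing equivalent hook-up points could drive the same verification engine.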
The graphical user interface of the verification tool integrates the verification engine and the ESys.NET simulation kernel in a single application; a screenshot is shown in Figure 9.12. It allows the user to load a property file and a compiled ESys.NET system model, and to launch the simulation. Properties can be individually enabled or disabled. The user can also choose to record the execution trace of an automaton and the value changes of any signal in the model. At the end of the simulation, the result of each property's verification is presented with icons in the left-hand pane. The execution trace of an automaton can be replayed in the center pane, where the valid transitions/states are highlighted in green and the failing states in red. A waveform view is also offered, synchronized with the automaton view.
FIGURE 9.12: The simulation and verification application (callouts: simulation control, list of observers and verification results, automata state, navigation through execution steps)
9.4.6 Observing Results
The main purposes of the AHB-Lite case study were to demonstrate the validity of the approach and to make sure that simulation performance was not degraded too much by the verification process. From the point of view of the model designer, the verification tool allowed us to identify several limit cases and synchronization issues, and the coverage of the test vectors was improved. From the point of view of the verification tool designer, the case study was an opportunity to evaluate the performance of the verification tool. A profiler was used to record execution time and memory allocation, and the overhead due to the observers' execution was evaluated during simulation. Approximately 67% of the simulation time was dedicated to observers: a simulation of 20,000 clock cycles took approximately 1.6 seconds without observers (250k events/sec), and adding the observers raised the execution time to 5 seconds. At first glance, the share of simulation time used by the observers can seem quite high. The situation
can be explained, however, by the large number of observers relative to the fairly low complexity of the model. The model used here is purely functional and focuses on the communication aspect of the system, without any heavy processing performed on the master and the slaves. Indeed, the time overhead is directly proportional to the number of observers (more precisely, to the number of atomic propositions). The case study involved many observers (i.e., LTL formulae to check) verifying a fairly simple model; verifying the same properties on a more complex model would naturally lower the relative impact of the observers' overhead. Furthermore, in practice the size of each formula tends to remain bounded.
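As a quick consistency check of the 67% figure: the instrumented run takes 5 seconds, of which only 1.6 seconds correspond to the bare simulation, so the fraction of time spent in the observers is

    (5.0 - 1.6) / 5.0 ≈ 0.68

which matches the reported value within rounding.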
9.5 Conclusion
Today, in order to respect time-to-market and strict cost constraints, embedded system designers need new modeling and simulation solutions. These solutions must enable easier memory management. They must also permit software component modeling, component integration in a distributed web-based environment, easier debugging of complex specifications, multi-language features and smooth connection with other existing or new CAD tools. ESys.NET fulfills the set of requirements that we enumerated in this chapter, and does so with no significant performance cost. The key point of our approach is the use of the advanced features present in the .NET Framework and the C# language. ESys.NET offers many advantages over its predecessor, among them (i) a reduced set of modeling semantics due to concept unification, (ii) a simple programming basis free of eclectic syntactic elements, (iii) a simulation kernel that supports third-party tool integration, (iv) an overall environment that is better suited to producing less error-prone models, and (v) a rich software library that permits the modeling of complex software components (especially operating system elements).

The objective of this chapter was to offer the reader an overview of ESys.NET. With the aid of simple examples we have illustrated how a system designer would use ESys.NET for system-level design and verification. In addition, the design and architecture of the modeling, simulation and verification implementations were presented.
References
[1] ASML Home Page, www.research.microsoft.com/foundations/AsmL/. [2] C# language and tools: http://msdn.microsoft.com/en-us/vcsharp/aa336809.aspx. [3] DotGNU: http://www.gnu.org/software/dotgnu/. [4] ECMA-335: Common language specification. http://www.ecma-international.org/publications/standards/Ecma-335.htm. [5] ECMA and ISO/IEC: C# and common language infrastructure standards. http://msdn.microsoft.com/en-us/netframework/aa569283.aspx. [6] RapidIO Document Specifications. http://www.rapidio.org. [7] SystemC version 2.1. http://www.systemc.org/. [8] Xilinx EDK: http://www.xilinx.com/ise/embedded/edk_docs.htm. [9] IEEE Standard VHDL Language Reference Manual. IEEE, 1076, 2000 edition, 2000. [10] AMBA Specification (rev2.0) and Multi-layer AHB Specification, 2001. [11] .NET source code. http://www.microsoft.com/net, 2003. [12] OMG, UML Profile for Schedulability, Performance, and Time Specification. In Version 1.0, http://www.omg.org, 2003. [13] ESys.NET. http://www.esys-net.org/, 2004. [14] PROMPT-MAME Project Website. http://www.ele.etsmtl.ca/projets/PROMPT, 2005. [15] SystemC Language Reference Manual, IEEE Std 1666-2005. 2005. [16] QuickGraph, Graph Data Structures and Algorithms for .NET, 2008. [17] Postsharp, 2009. http://www.postsharp.org/. [18] Ben Albahari. A Comparative Overview of C#. http://genamics.com/developer/csharp_comparative.htm.
[19] Perry Alexander. Rosetta: Standardization at the System Level. Computer, 42(1):108–110, 2009.
[20] Dean Allemang and James A. Hendler. Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. Morgan Kaufmann Publishers/Elsevier, Amsterdam; Boston, 2008. [21] Daniel Amyot, Luigi Logrippo, Raymond J. A. Buhr, and Tom Gray. Use Case Maps for the Capture and Validation of Distributed Systems Requirements. In RE ’99: Proceedings of the 4th IEEE International Symposium on Requirements Engineering, page 44, Washington, DC, USA, 1999. IEEE Computer Society. [22] L. Aqvist. Introduction to Deontic Logic and the Theory of Normative Systems. Bibliopolis, 1983. [23] Pontus Astr¨om, Stefan Johansson, and Peter Nilsson. Application of Software Design patterns to DSP Library Design. In 14th International Symposium on System Synthesis, Montr´eal, Qu´ebec, Canada, 2001. [24] Ivan Aug´e, Fr´ed´eric P´etrot, and Denis Hommais. A Pragmatic Approach to The Design of Embedded Systems. In DATE’01: Proc. of Design Automation and Test in Europe, pages 170–174, Munich, Germany, March 2001. IEEE. [25] Jean Bacon. Operating Systems: Concurrent and Distributed Software Design. Addison-Wesley, Boston, 2003. [26] Christopher J.O. Baker and Kei-Hoi Cheung, editors. Semantic Web : Revolutionizing Knowledge Discovery in the Life Sciences. Springer, 2007. [27] Felice Balarin, Yosinori Watanabe, Harry Hsieh, Luciano Lavagno, Claudio Passerone, and Alberto Sangiovanni-Vincentelli. Metropolis: An Integrated Electronic System Design Environment. Computer, 36(4):45–52, 2003. [28] K. Suzanne Barber, Thomas J. Graser, Jim Holt, and Geoff Baker. Arcade: Early Dynamic Property Evaluation of Requirements Using Partitioned Software Architecture Models. Requirements Engineering, 8(4):222–235, 2003. [29] Kent Beck and Ralph E. Johnson. Patterns Generate Architectures. In Proceedings of 8th European Conference for Object-Oriented Programming, pages 139–149. Springer-Verlag, July 1994. [30] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. In Proceedings of the USENIX Annual Technical Conference, pages 41–46. USENIX Association, 2005. [31] Claude Berge. Graphes et hypergraphes (in French), chapter 2, page 26. Dunod, 2nd edition, 1973. [32] Janick Bergeron. Writing Testbenches: Functional Verification of HDL Models, Second Edition. Kluwer Academic Publishers, Norwell, MA, USA, 2003. [33] Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernndez, Mikael Kay, Jonathan Robie, and J´erome Sim´eon. XML Path Language (XPath) 2.0. Technical report, 23 January 2007.
[34] Tim Berners-Lee. N3 Notation: http://www.w3.org/DesignIssues/Notation3. [35] Tim Berners-Lee, James A. Hendler, and Ora Lassila. The Semantic Web. Scientific American, 284(5):28–37, 2001. [36] J. Bhasker. A SystemC Primer. Star Galaxy, 2004. [37] Scott Boag, Don Chamberlin, Mary F. Fernndez, Daniela Florescu, Jonathan Robie, and J´erome Sim´eon. XQuery 1.0: An XML Query Language. Technical report, W3C Recommendation - http://www.w3.org/TR/xquery/, 23 January 2007. [38] Grady Booch, James Rumbaugh, and Ivar Jacobson. Unified Modeling Language User Guide, The (2nd Edition) (Addison-Wesley Object Technology Series). Addison-Wesley Professional, 2005. [39] Aimen Bouchhima, Iuliana Bacivarov, Wassim Youssef, Marius Bonaciu, and Ahmed A. Jerraya. Using Abstract CPU Subsystem Simulation Model for High Level HW/SW Architecture Exploration. In ASPDAC’05: Proc. of the Asia South Pacific Design Automation Conference, pages 969–972, 2005. [40] Aimen Bouchhima, Patrice Gerin, and Fr´ed´eric P´etrot. Automatic Instrumentation of Embedded Software for High Level Hardware/Software CoSimulation. In ASP-DAC’09: Proc. of the Asia and South Pacific Design Automation Conference, pages 546–551, Piscataway, NJ, USA, 2009. IEEE Press. [41] Aimen Bouchhima, Sungjoo Yoo, and Ahmed Jerraya. Fast and Accurate Timed Execution of High Level Embedded Software using HW/SW Interface Simulation Model. In ASPDAC’04: Proc. of the Asia South Pacific Design Automation Conference, pages 469–474, 2004. [42] Don Box. Essential COM. Addison-Wesley Professional, first edition, 1998. [43] Robert King Brayton, Alberto L. Sangiovanni-Vincentelli, Curtis T. McMullen, and Gary D. Hachtel. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, Norwell, MA, USA, 1984. [44] Richard Buchmann, Alain Greiner, and Fr´ed´eric P´etrot. Fast Cycle Accurate Simulator to Simulate Event-Driven Behavior. In In Proc. of the International Conference on Electrical Electronic and Computer Engineering, pages 37– 40, Cairo, Egypt, September 2004. [45] Jerry Burch, Roberto Passerone, and Alberto L. Sangiovanni-Vincentelli. Overcoming Heterophobia: Modeling Concurrency in Heterogeneous Systems. In ACSD ’01: Proceedings of the Second International Conference on Application of Concurrency to System Design, page 13, 2001. [46] Timothy M. Burks and Karem A. Sakallah. Min-max Linear Programming and the Timing Analysis of Digital Circuits. In ICCAD’93: Proc. of the International Conference on Computer-Aided Design, pages 152–155, 1993.
[47] João M. P. Cardoso and Horácio C. Neto. Compilation for FPGA-Based Reconfigurable Hardware. IEEE Design and Test, 20(2):65–75, 2003. [48] Jeremy J. Carroll, Ian Dickinson, Chris Dollin, Dave Reynolds, Andy Seaborne, and Kevin Wilkinson. Jena: Implementing the Semantic Web Recommendations. In WWW Alt. ’04: Proceedings of the 13th international World Wide Web conference on Alternate track papers and posters, pages 74–83, New York, NY, USA, 2004. ACM. [49] Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and Siddhartha Chatterjee. Software Transactional Memory: Why Is It Only a Research Toy? Commun. ACM, 51(11):40–46, 2008. http://dx.doi.org/10.1145/1400214.1400228. [50] Wander O. Cesario, Gabriela Nicolescu, Lovic Gauthier, Damien Lyonnard, and Ahmed A. Jerraya. Colif: A Design Representation for Application-Specific Multiprocessor SOCs. Design and Test of Computers, IEEE, 18(5):8–20, Sept-Oct 2001. [51] Luc Charest, Michel Reid, El Mostapha Aboulhamid, and Guy Bois. A Methodology for Interfacing Open Source SystemC with a Third Party Software. In DATE’01: Proceedings of the Design Automation and Test in Europe Conference, pages 16–20, Munich, Germany, March 2001. IEEE Computer Society. [52] Yiping Cheng and Da-Zhong Zheng. Min-Max Inequalities and the Timing Verification Problem with Max and Linear Constraints. Discrete Event Dynamic Systems, 15(2):119–143, 2005. [53] Yiping Cheng and Da-Zhong Zheng. An Algorithm for Timing Verification of Systems Constrained by Min-Max Inequalities. Discrete Event Dynamic Systems, 17(1):99–129, 2007. [54] Nicos Christofides. Graph Theory, An Algorithmic Approach, chapter 10, Hamiltonian Circuits, Paths and the Traveling Salesman Problem, pages 214–235. Academic Press, 1975. [55] Alexandre Chureau, Yvon Savaria, and El Mostapha Aboulhamid. The Role of Model-Level Transactors and UML in Functional Prototyping of Systems-on-Chip: A Software-Radio Application. In DATE’05: Proc. of the Conference on Design, Automation and Test in Europe, pages 698–703, 2005. [56] Kendall Grant Clark, Lee Feigenbaum, and Elias Torres. SPARQL Protocol for RDF. Technical report, W3C Recommendation, 15 January 2008. [57] W.F. Clocksin and C.S. Mellish. Programming in Prolog. Springer-Verlag, 1987. [58] Bob Cmelik and David Keppel. Shade: A Fast Instruction Set Simulator for Execution Profiling. In Sigmetrics 94, pages 128–138, June 1994.
[59] A. Colgan and P. Hardee. Advancing Transaction Level Modeling: Linking the OSCI and OCP-IP Worlds at Transaction Level, http://www.opensystemspublishing.com/whitepapers. [60] C.T.I. Comete. CODESIGN: Conception conjointe logiciel-matériel, in French. Eyrolles, 1998. [61] James O. Coplien, Daniel M. Hoffman, and David M. Weiss. Commonality and Variability in Software Engineering. IEEE Software, 15(6):37–45, 1998. [62] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Graph Algorithms, chapter 23, pages 485–488. MIT Press, 2nd printing, 1994. [63] György Csertán, Gábor Huszerl, István Majzik, Zsigmond Pap, András Pataricza, and Dániel Varró. VIATRA “Visual Automated Transformations for Formal Verification and Validation of UML Models.” In ASE’02: Proceedings of the 17th IEEE International Conference on Automated Software Engineering, page 267, 2002. [64] Robertas Damaševičius, Giedrius Majauskas, and Vytautas Štuikys. Application of Design Patterns for Hardware Design. In DAC’03, Proc. of the 40th International Design Automation Conference, pages 48–53, Anaheim, California, USA, June 2003. ACM Press. [65] Chris J. Date. An Introduction to Database Systems, 7th ed. Addison Wesley Longman, 2000. [66] Stefan Decker, Sergey Melnik, Frank Van Harmelen, Dieter Fensel, Michel Klein, Jeen Broekstra, Michael Erdmann, and Ian Horrocks. The Semantic Web: the Roles of XML and RDF. Internet Computing, IEEE, 15(3):63–74, Sept-Oct 2000. [67] Fredrik Degerlund, Marina Walden, and Kaisa Sere. Implementation Issues Concerning the Action Systems Formalism. In PDCAT ’07: Proceedings of the Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies, pages 471–479, Washington, DC, USA, 2007. IEEE Computer Society. [68] Louise A. Dennis, Graham Collins, Michael Norrish, Richard J. Boulton, Konrad Slind, Graham Robinson, Michael J. C. Gordon, and Thomas F. Melham. The PROSPER Toolkit. In International Journal on Software Tools for Technology Transfer vol. 4, n. 2, 2003. [69] Paolo Destro, Franco Fummi, and Graziano Pravadelli. A Smooth Refinement Flow for Co-Designing HW and SW Threads. In DATE’07: Proc. of the Conference on Design, Automation and Test in Europe, pages 105–110, 2007. [70] Adam Donlin. Transaction level modeling: flows and use models. In CODES+ISSS ’04: Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pages 75–80, New York, NY, USA, 2004. ACM.
[71] Fr´ed´eric Doucet, Sandeep Shukla, and Rajesh Gupta. Introspection in SystemLevel Language Frameworks: Meta-level vs. Integrated. In DATE’03: Design Automation and Test in Europe Conference, pages 382–387, Munich, Germany, 2003. IEEE Computer Society. [72] Mathieu Dubois and El Mostapha Aboulhamid. Techniques to Improve Cosimulation and Interoperability of Heterogeneous Models. Electronics, Circuits and Systems, 2005. ICECS 2005. 12th IEEE International Conference on Electronics Circuits and Systems, pages 1–4, Dec. 2005. [73] Mathieu Dubois, El Mostapha Aboulhamid, and Fr´ed´eric Rousseau. Acceleration for a Compiled Transaction Level Modeling Simulation. Electronics, Circuits and Systems, 2006. ICECS ’06. 13th IEEE International Conference on Electronics Circuits and Systems, pages 1176–1179, Dec. 2006. [74] Matthew B. Dwyer, George S. Avrunin, and James C. Corbett. Patterns in Property Specifications for Finite-State Verification. In ICSE ’99: Proceedings of the 21st international conference on Software engineering, pages 411– 420, New York, NY, USA, 1999. ACM. [75] “eCosCentric.” eCos homepage, http://ecos.sourceware.org/. [76] Stephen Edwards, Luciano Lavagno, Edward A. Lee, and Alberto Sangiovanni-Vincentelli. Design of Embedded Systems: Formal Models, Validation, and Synthesis. Proc. of the IEEE, 85(3):366–390, March 1997. [77] Cindy Eisner, Dana Fisman, John Havlicek, Yoad Lustig, Anthony McIsaac, and David Van Campenhout. Reasoning with Temporal Logic on Truncated Paths. In 15th International Conference on Computer Aided Verification, 2003. [78] A. El-Aboudi and El Mostapha Aboulhamid. An Algorithm for the Verification of Timing Diagrams Realizability. In ISCAS (1), pages 314–317, 1999. [79] A. El-Aboudi, El Mostapha Aboulhamid, and Eduard Cerny. Verificatiom of Synchronous Realizability of Interfaces from Timing Diagram Specifications. Microelectronics, 1998. ICM ’98. Proceedings of the Tenth International Conference on, pages 103–106, 1998. [80] Jan Ellsberger, Dieter Hogrefe, and Amardeo Sarma. SDL: Formal Objectoriented Language for Communicating Systems. Prentice Hall, 1997. [81] B. Berar et al. Systems and Software Verification, Model-Checking Techniques and Tools. Springer-Verlag, 2001. [82] Alessandro Fantechi, Stefania Gnesi, G. Lami, and A. Maccari. Application of Linguistic Techniques for Use Case Analysis. In RE ’02: Proceedings of the 10th Anniversary IEEE Joint International Conference on Requirements
Engineering, pages 157–164, Washington, DC, USA, 2002. IEEE Computer Society. [83] International Technology Roadmap for Semiconductor. 2004 Edition. In http://public.itrs.net/, 2004. [84] Christopher P. Fuhrman. Lightweight Models for Interpreting Informal Specifications. Requirements Engineering, 8(4):206–221, 2003. [85] Ariel Fuxman, Lin Liu, John Mylopoulos, Marco Roveri, and Paolo Traverso. Specifying and Analyzing Early Requirements in Tropos. Requirements Engineering, 9(2):132–150, 2004. [86] Anthony Joseph Gahlinger. Coherence and Satisfiability of Waveform Timing Specifications. PhD thesis, Waterloo, Ont., Canada, 1990. [87] Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong. Specification and Design of Embedded Systems. Prentice Hall, 1994. [88] Daniel D. Gajski, Jianwen Zhu, Rainer D¨omer, Andreas Gerstlauer, and Shuquing Zhao. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, Boston, March 2000. [89] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison Wesley, 1994. [90] Paul Gastin and Denis Oddoux. Fast LTL to Buchi Automata Translation. In CAV ’01: Proceedings of the 13th International Conference on Computer Aided Verification, pages 53–65, London, UK, 2001. Springer-Verlag. [91] Patrice Gerin, Hao Shen, Alexandre Chureau, and Ahmed Jerraya. Flexible and Executable Hardware/Software Interface Modeling for Multiprocessor SoC Design Using SystemC. In ASPDAC’07: Proc. of the Asia South Pacific Design Automation Conference, pages 390–395, 2007. [92] Andreas Gerstlauer, Haobo Yu, and Daniel D. Gajski. RTOS Modeling for System Level Design. In DATE’03: Proc. of the Design Automation and Test in Europe Conference, pages 130–135, 2003. [93] Rob Gerth, Doron Peled, Moshe Y. Vardi, and Pierre Wolper. Simple onthe-fly Automatic Verification of Linear Temporal Logic. In Proceedings of the Fifteenth IFIP WG6.1 International Symposium on Protocol Specification, Testing and Verification XV, pages 3–18, London, UK, 1995. Chapman & Hall, Ltd. [94] Dimitra Giannakopoulou and Klaus Havelund. Automata-Based Verification of Temporal Properties on Running Programs. In ASE ’01: Proceedings of the 16th IEEE international conference on Automated software engineering, page 412, Washington, DC, USA, 2001. IEEE Computer Society.
[95] Bruno Girodias, El Mostapha Aboulhamid, and Gabriela Nicolescu. A Platform for Refinement of OS Services for Embedded Systems. In DELTA ’06: Proceedings of the Third IEEE International Workshop on Electronic Design, Test and Applications, pages 227–236, 2006. [96] Maya B. Gokhale and Janice M. Stone. NAPA C: Compiling for a Hybrid RISC/FPGA Architecture. In FCCM ’98: Proc. of the IEEE Symposium on FPGAs for Custom Computing Machines, page 126, 1998. [97] Nicolas Gorse, Pascale B´elanger, El Mostapha Aboulhamid, and Yvon Savaria. Mixing Linguistic and Formal Techniques for High-Level Requirements Engineering. Proceedings of the 16th IEEE International Conference on Microelectronics, Tunisia, 2004. [98] Nicolas Gorse, Pascale B´elanger, Alexandre Chureau, El Mostapha Aboulhamid, and Yvon Savaria. A High-Level Requirements Engineering Methodology for Electronic System-Level Design. Comput. Electr. Eng., 33(4):249– 268, 2007. [99] K John Gough. Stacking them up: a Comparison of Virtual Machines. Australasian Computer Science Communnication, 23(4):55–61, 2001. [100] Thorsten Grotker, Stan Liao, Grant Martin, and Stuart Swan. System Design with SystemC. Kluwer Academic Publishers, Norwell, MA, USA, 2002. [101] Michael Grove and Andrew Schain. POPS NASAs Expertise Location Service Powered by Semantic Web Technologies. Technical report, W3C Semantic Web Case Studies and Use Cases - http://www.w3.org/2001/sw/sweo/ public/UseCases/Nasa/Nasa.pdf, 2008. [102] Yann-Ga¨el Gu´eh´eneuc and Herv´e Albin-Amiot. Recovering Binary Class Relationships: Putting Icing on the UML Cake. In Doug C. Schmidt, editor, Proceedings of the 19th Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 301–314. ACM Press, October 2004. [103] Yann-Ga¨el Gu´eh´eneuc and Giuliano Antoniol. DeMIMA: A Multi-layered Framework for Design Pattern Identification. Transactions on Software Engineering, 34(5):667–684, September 2008. [104] Rachid Guerraoui, Maurice Herlihy, and Bastian Pochon. Polymorphic Contention Management. In DISC ’05: Proceedings of the nineteenth International Symposium on Distributed Computing, pages 303–323. LNCS, Springer, Sep 2005. [105] Elliotte R. Harold and Scott W. Means. XML in a Nutshell, Third Edition. O’Reilly Media, Inc., October 2004. [106] Tim Harris and Keir Fraser. Language Support for Lightweight Transactions. In OOPSLA’03: Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, pages 388–402, New York, NY, USA, 2003. ACM.
[107] Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy. Composable Memory Transactions. Commun. ACM, 51(8):91–100, 2008. [108] Abdelsalam Hassan, Keishi Sakanushi, Yoshinori Takeuchi, and Masaharu Imai. RTK-Spec TRON: A Simulation Model of an ITRON Based RTOS Kernel in SystemC. In DATE’05: Proc. of the Design Automation and Test in Europe Conference, pages 554–559, 2005. [109] Claude Helmstetter, Florence Maraninchi, Laurent Maillet-Contoz, and Matthieu Moy. Automatic generation of schedulings for improving the test coverage of systems-on-a-chip. In FMCAD ’06: Proceedings of the Formal Methods in Computer Aided Design, pages 171–178, Washington, DC, USA, 2006. IEEE Computer Society. [110] John L. Hennessy and David A. Patterson. Computer Architecture, a Quantitative Approach. Morgan Kaufmann Publisher, Inc, 1990. [111] John L. Hennessy and David A. Patterson. Computer Architecture and Design, The Hardware/Software Interface. Morgan Kaufmann Publisher, Inc, 2003. [112] Maurice Herlihy. Obstruction-free Synchronization: Double-ended Queues as an Example. In Proceedings of the 23rd International Conference on Distributed Computing Systems, pages 522–529. IEEE Computer Society, 2003. [113] Patrick Heymans and Eric Dubois. Scenario-Based Techniques for Supporting the Elaboration and the Validation of Formal Requirements. Requirements Engineering, 3(3/4):202–218, 1998. [114] R. Hilderink and T. Gr¨otker. Transaction-level Modeling of Bus-based Systems with SystemC 2.0. Synopsys, Inc. [115] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985. [116] C. A. R. Hoare. Towards a Theory of Parallel Programming in the Origin of Concurrent Programming from Semaphores to Remote Procedure Cells, Springer-Verlag, pp. 231–244, 2002. [117] A. Horn. On Sentences Which are True of Direct Unions of Algebras. Journal of Symbolic Logic, pages 14–21, 1951. [118] Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. SWRL: A Semantic Web Rule Language Combining OWL and RuleML. Technical report, W3C Member Submission - http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/, 21 May 2004. [119] Intel. IXP45X datasheet, http://www.intel.com/design/network/datashts/ 306261.htm. [120] ITRS. International technology roadmap for semiconductors design, 2007. http://www.itrs.net/Links/2007ITRS/2007 Chapters/2007 Design.pdf.
[121] ITU. Recommendation Z.120: Message Sequence Chart (MSC). 1996. [122] Glenn Jennings. A Case Against Event Driven Simulation for Digital System Design. In 24th Annual Simulation Symposium, pages 170–175, April 1991. [123] Glenn Jennings. A Case Against Event-driven Simulation for Digital System Design . In Simulation Symposium, 1991. Proceedings of the 24th Annual, pages 170–176, 1991. [124] Weixing Ji, Feng Shi, and Baojun Qiao. The Design of a Novel Object Processor: OOMIPS. In Proceedings of the 18th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2007), July 2007. [125] Viraj Kamat. Towards slicing vhdl. Master’s thesis, Indian Institute of Technology, Bombay, 2003. [126] Torsten Kempf, Kingshuk Karuri, Stefan Wallentowitz, Gerd Ascheid, Rainer Leupers, and Heinrich Meyr. A SW Performance Estimation Framework for Early System-Level-Design using Fine-Grained Instrumentation. In DATE’06: Proc. of the Design Automation and Test in Europe Conference, pages 468–473, 2006. [127] K. Khordoc and E. Cerny. Semantics and Verification of Action Diagrams with Linear Timing. ACM Trans. Des. Autom. Electron. Syst., 3(1):21–50, 1998. [128] Albert Carl Jan Kienhuis. Design Space Exploration of Stream-based Dataflow Architctures: Methods and Tools. PhD thesis, Delft University of Technology, 1999. [129] Holger Knublauch, Mark A. Musen, and Alan L. Rector. Editing Description Logic Ontologies with the Protege OWL Plugin. In Description Logics, 2004. [130] Donald E. Knuth. The Stanford GraphBase, chapter Roget Components, pages 512–519. Addison-Wesley Publishing Company, 1994. [131] Cedric Koch-Hofer, Marc Renaudin, Yvain Thonnart, and Pascal Vivet. ASC, a SystemC Extension for Modeling Asynchronous Systems, and Its Application to an Asynchronous NoC. In NOCS ’07: Proc. of the First International Symposium on Networks-on-Chip, pages 295–306, Washington, DC, USA, 2007. IEEE Computer Society. [132] Thomas Kropf. Introduction to Formal Hardware Verification: Methods and Tools for Designing Correct Circuits and Systems. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999. [133] Wido Kruijtzer, Pieter van der Wolf, Erwin de Kock, Jan Stuyt, Wolfgang Ecker, Albrecht Mayer, Serge Hustin, Christophe Amerijckx, Serge de Paoli, and Emmanuel Vaumorin. Industrial IP Integration Flows Based on IPXACTTM Standards. In DATE’08: Proc. of the conference on Design, automation and test in Europe, pages 32–37, New York, NY, USA, 2008. ACM.
[134] Marcello Lajolo, Mihai Lazarescu, and Alberto Sangiovanni-Vincentelli. A Compilation-Based Software Estimation Scheme for Hardware/Software CoSimulation. In Proc. of the International Workshop on Hardware/Software Codesign, pages 85–89, May 1999. [135] J. Lapalme, E.M. Aboulhamid, G. Nicolescu, L. Charest, F.R. Boyer, J.P. David, and G. Bois. .NET Framework - a Solution for the Next Generation Tools for System-Level Modeling and Simulation. In DATE’04: Proc. of the Design Automation and Test in Europe Conference, volume 1, pages 732–733, 2004. [136] James Lapalme. ESys.NET : A New .NET based System-Level Design Environment. Master’s thesis, Universite de Montreal, 2003. [137] James Lapalme, El Mostapha Aboulhamid, and Gabriela Nicolescu. A new efficient EDA tool design methodology. ACM Transaction on Embedded Computing Systems, 5(2):408–430, 2006. [138] James Lapalme, El Mostapha Aboulhamid, Gabriela Nicolescu, Luc Charest, Franois R. Boyer, Jean Pierre David, and Guy Bois. ESys.NET: a New Solution for Embedded Systems Modeling and Simulation. SIGPLAN Not., 39(7):107–114, 2004. [139] James Lapalme, El Mostapha Aboulhamid, Gabriela Nicolescu, Luc Charest, Franois R. Boyer, Jean Pierre David, and Guy Bois. .NET Framework - a Solution for the Next Generation Tools for System-Level Modeling and Simulation. DATE’04: Proc. of Design Automation and Test in Europe Conference, 1:732–733, Feb. 2004. [140] James Lapalme, El Mostapha Aboulhamid, Gabriela Nicolescu, and Fr´ed´eric Rousseau. Separating Modeling and Simulation Aspects in Hardware/Software System Design. Microelectronics, 2006. ICM ’06. International Conference on, pages 202–205, Dec. 2006. [141] James Larus and Christos Kozyrakis. Transactional Memory. Commun. ACM, 51(7):80–88, 2008. [142] James R. Larus and Ravi Rajwar. Transactional Memory. Morgan & Claypool, 2006. [143] Edward A. Lee and Stephen Neuendorffer. MoML A Modeling Markup Language in XML, Version 0.4. Technical Report ERL/UCB M 00/12, University of California at Berkeley, 2000. [144] Jesse Liberty. Programming C#: Attributes and Reflection, O’Reilly, http://www.ondotnet.com/pub/a/dotnet/excerpt/prog csharp ch18/index.html, 2001. [145] D. B. Lomet. Process Structuring, Synchronization, and Recovery Using Atomic Actions. In Proceedings of an ACM conference on Language design for reliable software, pages 128–137, 1977.
[146] Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer, 1995. [147] Grant Martin. Systemc tools. In European SystemC Users Group Meeting. [148] Kenneth L. McMillan and David L. Dill. Algorithms for Interface Timing Verification. In ICCD ’92: Proc. of the IEEE International Conference on Computer Design on VLSI in Computer & Processors, pages 48–51, 1992. [149] Nenad Medvidovic and Richard N. Taylor. A Classification and Comparison Framework for Software Architecture Description Languages. IEEE Transactions on Software Engineering, 26(1):70–93, 2000. [150] A. Mel’cuk. Dependency in Linguistic Description. http://www.olst.umon treal.ca/FrEng/Dependency.pdf. [151] Giovanni De Micheli. Synthesis and Optomization of Digital Circuits. McGraw-Hill, USA, 1994. [152] Rocco Le Moigne, Olivier Pasquier, and Jean-Paul Calvez. A Generic RTOS Model for Real-time Systems Simulation with SystemC. In DATE’04: Proc. of the Design Automation and Test in Europe Conference, pages 82–87, 2004. [153] Matthieu Moy, Florence Maraninchi, and Laurent Maillet-Contoz. Pinapa: an Extraction Tool for SystemC Descriptions of Systems-on-a-Chip. In EMSOFT ’05: Proceedings of the 5th ACM international conference on Embedded software, pages 317–324, New York, NY, USA, 2005. ACM. [154] Jos´e M. Moya, Fernando Rinc´on, Francisco Moya, and Juan Carlos L´opez. Improving Embedded System Design by Means of HW-SW Compilation on Reconfigurable Coprocessors. In ISSS’02: Proceedings of the 15th International Symposium on System Synthesis, pages 255–260, 2002. [155] MSDN. Dynamic-link libraries, http://msdn.microsoft.com. [156] Eric K. Neumann and Dennis Quan. Biodash: A Semantic Web Dashboard for Drug Development. In Pacific Symposium on Biocomputing, pages 176–187, 2006. [157] James Newkirk and Alexei A. Vorontsov. How .NET’s Custom Attributes Affect Design. IEEE Software, 19(5):18–20, 2002. [158] Ralf Niemann. Hardware/Software Co-Design for Data Flow Dominated Embedded Systems. Kluwer Academic Publishers, Norwell, MA, USA, 1998. [159] Open SystemC Initiative (OSCI). Functional specification for SystemC 2.0, 2001. http://www.systemc.org. [160] Open SystemC Initiative (OSCI). SystemC user guide, 2001. http://www. systemc.org. R [161] Samir Palnitkar. Verilog HDL: a Guide to Digital Design and Synthesis, second edition. Prentice Hall Press, Upper Saddle River, NJ, USA, 2003.
[162] T. Parr. Stringtemplate documentation, http://www.antlr.org/stringtemplate/ index.tml, 2003. [163] Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers. Pragmatic Bookshelf, first edition, May 2007. [164] Claudio Passerone, Massimiliano Chiodo, Wilsin Gosti, Luciano Lavagno, and Alberto Sangiovanni-Vincentelli. Evaluation of Trade-offs in the Design of Embedded Systems via Co-Simulation. Technical Report UCB-ERL-9612, University of California, Berkeley, Computer Science Department, University of California, Berkeley, 1996. [165] H. D. Patel, D A. Mathaikutty, D. Berner, and S. K. Shukla. SystemCXML: An extensible systemc front end using XML, http://systemcxml. sourceforge.net/. http://systemcxml.sourceforge.net/, 2005. [166] Hiren D. Patel and Sandeep K. Shukla. Tackling an abstraction gap: cosimulating SystemC DE with bluespec ESL. In DATE’07: Proceedings of the conference on Design, automation and test in Europe, pages 279–284, San Jose, CA, USA, 2007. EDA Consortium. [167] James A. Payne. Introduction to Simulation: Programming Techniques and Methods of Analysis, chapter 2, pages 11–22. McGraw-Hill, 1982. [168] F. Pereira and D.H.D. Warren. Definite Clause Grammars for Language Analysis - a Survey of the Formalism and a Comparison with Augmented Transition Networks. In Artificial Intelligence Journal, Vol. 13, 1980. [169] Hector G. Perez-Gonzalez and Jugal K. Kalita. GOOAL: a Graphic Object Oriented Analysis Laboratory. In OOPSLA ’02: Companion of the 17th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 38–39, New York, NY, USA, 2002. ACM. [170] Fr´ed´eric P´etrot, Denis Hommais, and Alain Greiner. A Simulation Environment for Core Based Embedded Systems. In Proc. of the 30th Int. Simulation Symp., pages 86–91, Atlanta, Georgia, April 1997. [171] Fr´ed´eric P´etrot, Denis Hommais, and Alain Greiner. Cycle Precise Core Based Hardware/Software System Simulation with Predictable Event Propagation. In Proceeding of the 23rd Euromicro Conference, pages 182–187, Budapest, Hungary, September 1997. IEEE. [172] A. Pnueli. The Temporal Logic of Programs. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, pages 46–67, 1977. ´ [173] Hector Posadas, Jes´us Adamez, Pablo S´anchez, Eugenio Villar, and Francisco Blasco. POSIX Modeling in SystemC. In ASPDAC’05: Proc. of the Asia South Pacific Design Automation Conference, pages 485–490, 2006.
[174] E. Prudhommeaux and A. Seaborne. SPARQL Query Language for RDF. Technical Report REC-rdf-schema-20040210, World Wide Web Consortium, Jan. 2008. [175] R. Kaye. Seamless with C-bridge: C Based Co-Verification. In Technical Papers, Mentor, page 27, 2002. [176] H. Reichel, R. Deutschmann, M. Fruth and H.-C. Reuss. Trace Checking with Real-Time Specifications. In 5th Symposium on Formal Methods for Automation and Safety in Railway and Automotive Systems, 2004. [177] Debbie Richards. Merging Individual Conceptual Models of Requirements. Requir. Eng., 8(4):195–205, 2003. [178] D. F. Robinson and L. R. Foulds. Acyclic digraphs. In Digraphs: Theory and Techniques, chapter 3.6, pages 86–90. Gordon and Breach Scientific Publishers, 1980. [179] D. F. Robinson and L. R. Foulds. Digraph structure. In Digraphs: Theory and Techniques, chapter 2, pages 43–62. Gordon and Breach Scientific Publishers, 1980. [180] Bertil Roslund and Per Andersson. A Flexible Technique for OS-Support in Instruction Level Simulators. In 27th Annual Simulation Symposium, pages 134–141, La Jolla, CA, April 1994. SCS, IEEE Press. [181] Luc S´em´eria, Koichi Sato, and Giovanni De Micheli. Synthesis of Hardware Models in C with Pointers and Complex Data Structures. IEEE Transactions on VLSI Systems, 9(6):743–756, 2001. [182] Wuwei Shen, Kevin Compton, and James Huggins. A UML Validation Toolset Based on Abstract State Machines. In ASE ’01: Proceedings of the 16th IEEE international conference on Automated software engineering, page 315, Washington, DC, USA, 2001. IEEE Computer Society. [183] Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and Yarden Katz. Pellet: A Practical OWL-DL Reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):51–53, 2007. [184] S. Smith, M. Ray Mercer, and B. Brock. Demand Driven Simulation: BACKSIM. In DAC’87: 24st Design Automation Conference, pages 181–187, San Diego, CA, June 1987. ACM/IEEE, IEEE Press. [185] S. Stuijk. Predictable Mapping of Streaming Applications on Multiprocessors. PhD thesis, Eindhoven University of Technology, The Netherlands, 2007. [186] Ralf’s Sudelbcher. Nst transactional memory, 2007. http://weblogs.asp.net/ ralfw/archive/tags/Software+Transactional+Memory/default.aspx. [187] Stuart Sutherland. SystemVerilog tutorial. http://www.systemverilog.org/ techpapers/techpapers.html, 2003.
[188] Stuart Sutherland, Simon Davidmann, Peter Flake, and Phil Moorby. System Verilog for Design: A Guide to Using System Verilog for Hardware Design and Modeling. Kluwer Academic Publishers, Norwell, MA, USA, 2004. [189] B. D. Theelen. Performance Model Generation for MPSoC Design-Space Exploration. In QEST ’08: Proceedings of the 2008 Fifth International Conference on Quantitative Evaluation of Systems, pages 39–40, Washington, DC, USA, 2008. IEEE Computer Society. [190] K. C. Thramboulidis, G. Doukas, and G. Koumoutsos. A SOA-based Embedded Systems Development Environment for Industrial Automation. EURASIP J. Embedded Syst., 2008(1):1–15, 2008. [191] Walter Tibboel, Victor Reyes, Martin Klompstra, and Dennis Alders. SystemLevel Design Flow Based on a Functional Reference for HW and SW. In DAC’07: Proc. of the Design Automation Conference, pages 23–28, 2007. [192] Frank Vahid and Tony Givargis. Embedded System Design: A Unified Hardware/Software Introduction. John Wiley & Sons, Inc., New York, NY, USA, 2001. [193] P.H.A. van der Putten and J.P.M. Voeten. Specification of Reactive Hardware/Software Systems: The Method Software/Hardware Engineering (SHE). Ph.d., Eindhoven University of Technology, 1997. [194] Peter Vanbekbergen, Gert Goosens, and Hugo De Man. Specification and analysis of timing constraints in signal transition graph. In DAC’92: Proceedings of the 29th annual conference on Design automation, pages 302–306, New York, NY, USA, 1992. ACM. [195] Emmanuel Viaud, Frano¸is Pˆecheux, and Alain Greiner. An Efficient TLM/T Modeling and Simulation Environment Based on Conservative Parallel Discrete Event Principles. In DATE’06: Proc. of the Design Automation and Test in Europe Conference, pages 94–99, 2006. [196] W3C. XSL Transformations (XSLT), 1999. [197] Elizabeth A. Walkup and Gaetano Borriello. Interface Timing Verification with Application to Synthesis. In DAC’94: Proceedings of the 31st annual conference on Design automation, pages 106–112, New York, NY, USA, 1994. ACM. [198] Gerhard Weikum and Gottfried Vossen. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001. [199] Wikipedia. Component Object Model — Wikipedia, The Free Encyclopedia. [200] Wikipedia. .NET Reflector, 2008. http://en.wikipedia.org/wiki/.NET Reflector. [201] Wikipedia. Managed code — Wikipedia, The Free Encyclopedia. http://en. wikipedia.org/wiki/Managed code, 2009.
[202] Wikipedia. P/Invoke — Wikipedia, The Free Encyclopedia, 2009. [203] Wikipedia. Virtual machine — Wikipedia, The Free Encyclopedia. http://en. wikipedia.org/wiki/Virtual machine, 2009. [204] Pierre Wolper, Moshe Y. Vardi, and A. Prasad Sistla. Reasoning about Infinite Computation Paths. In Proceedings of the 24th IEEE Symposium on Foundations of Computer Science, pages 185–194, 1983. [205] World Wide Web Consortium (W3C). XML specification. http://www.w3c. org, 2003. [206] Roel Wuyts. Declarative Reasoning About the Structure of Object-Oriented Systems. In Joseph Gil, editor, Proceedings of the 26th Conference on the Technology of Object-Oriented Languages and Systems, pages 112–124. IEEE Computer Society Press, August 1998. [207] Sudhakar Yalamanchili. Introductory VHDL: From Simulation To Synthesis. Prentice Hall, Inc., Upper Saddle River, NJ, USA, 2004. [208] Zhi Alex Ye, Nagaraj Shenoy, and Prithviraj Baneijee. A C Compiler for a Processor with a Reconfigurable Functional Unit. In FPGA ’00: Proc. of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays, pages 95–100, 2000. [209] Ti-Yen Yen, Alex Ishii, Al Casavant, and Wayne Wolf. Efficient algorithms for interface timing verification. Form. Methods Syst. Des., 12(3):241–265, 1998. [210] Ti-Yen Yen, Wayne Wolf, Al Casavant, and Alex Ishii. Efficient Algorithms for Interface Timing Verification. In EURO-DAC’94: Proceedings of the conference on European design automation, pages 34–39, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press. [211] Hong Zhu and Lingzi Jin. Scenario Analysis in an Automated Tool for Requirements Engineering. Requirements Engineering, 5(1):2–22, 2000.
Index
Symbols
.NET Framework, 3, 242, 273

A
Abstraction Levels
  Cycle Accurate, 156, 157
  Transaction Accurate, 156, 167, 170
Adapter, 102
ADL, 56
ALU, 100
ASLD, 22
ASML, 241
Aspect-Oriented Programming, 142
Atomic instruction, 127
Atomicity, 123
Attribute programming, 139

B
Büchi, 275

C
C#, 4, 242
CAD, 1
Callback, 6
CASM, 15
Cast, 97
CFG, 15
CIL, 3, 14
Class, 92
Classes, 92
CLI, 3
CLS, 4
Co-routine, 121
Concurrency control, 127
Concurrent execution, 123
Conflicting operations, 124
Consistency, 29
Contention manager, 132
CPU-subsystem, 167
CTS, 3, 4

D
Delegate, 6
Design Environment, 61
Design Patterns, 90
  Behavioral, 103, 113
    Observer, 103
  Creational, 101, 113
    Prototype, 101
    Singleton, 101
  Hardware
    Pipeline, 90
  Structural, 101, 113
    Adapter, 102
    Façade, 102
DFG, 15
Differed update, 129
Direct update, 129
Dynamic programming, 137

E
EDA, 9, 56
Embedded systems, 241
Encapsulation, 92
ESys.NET, 3, 8, 242
Event-driven scheduling, 119
Execution Unit, 173

F
Façade, 102
Finite State Machine, 158
  Mealy, 158
  Moore, 158
Formal techniques, 30
Formal verification, 32

G
GSRC, 57

H
Hardware Abstraction Layer, 167, 169
  platform specific, 170
  processor specific, 169
Hardware node, 167
Hardware/Software interface, 156
Hook-point, 22

I
IDE, 56
Inheritance, 97
Instantiation, 93
Intellectual Properties, 102
Introspection, 5, 10, 261, 270
IP, 56, 102

J
Jena, 69, 77, 83

L
Linguistic, 30
Look-Up Table, 96
LTL, 20, 270, 272, 274, 275

M
Members, 92
Metadata, 4, 5, 9, 55, 57, 61, 75, 241
Method Calls, 94
MoA, 57
MoC, 57
Modeling, 30
MoML, 57
MPSoC, 1

N
Native software
  execution, 167
  simulation, 171
Non-determinism, 120

O
Observer, 103
Observer based verification, 20
OWL, 56, 66, 76, 77, 80

P
Polymorphism, 94
Prototype, 101

R
RDF, 56, 64, 77, 80
RDFS, 65, 77, 80
Reflectivity, 5, 6, 10, 20
Requirements, 29
RTL, 2
RTLTL, 272, 275

S
Scheduling Algorithm, 165
SCR, 61, 63
Semantic Web, 2, 55, 63, 76
Separation of concerns, 2, 17
Serializability, 124
Serializability graph, 124
Simulation, 255, 262, 265, 267, 277
  Demand-driven, 162
  Event-driven, 162
  Relaxed, 162
Singleton, 101
SLDL, 18
Software node, 167
Software Transactional Memory, 122
SPARQL, 68, 77, 82
SpecC, 7
SPIRIT, 56, 61, 79
Static Scheduling, 163
Struct, 92
SWRL, 70, 77, 79, 83, 85
Synchronization, 127, 136, 147
Synchronization overhead, 133
SystemC, 7, 119, 241, 242, 253, 255
SystemC Verification, 23
SystemVerilog, 7

T
Testbench, 243, 245, 276
TGI, 62
TLM, 13
Transaction, 122
Transaction manager, 129
Two-phase locking, 130

V
Validation, 29
Verification, 268, 277
VES, 4
Virtual table, 95
Vtable, 95, 96

X
XML, 56, 72, 73, 75
XPath, 78, 85
XQuery, 78
XSD, 74