Fig. 4. Design flow for model-driven component/container implementation
In the first step, we define interfaces and complex types. Since an interface can be used by various components, it is crucial to define interfaces first, independent of the components. In a second step, we define components and their communication ports. Here, we also define some of the communication parameters, for example, whether the communication in a port should happen synchronously or asynchronously. Since this affects the API against which the implementation code is developed, these definitions have to be made before the application logic is implemented (manually) in step four. Before doing this, we generate the interface APIs as well as component base code, for example, base classes in OO languages or wrappers in C (see step 3). This concludes the component definition phase. In step five we define which component instances we will use, how their ports will be connected, and on which hardware devices these instances will be deployed, as well as other system constraints. All these models together are then used by the second generation step, which creates all the infrastructure code, that is, the container, the communication implementations, OS configuration files, build scripts, etc. In a final step (not shown) all the generated code will be compiled and linked using the generated build script, resulting in the final system.

Software System Families. Using a model-driven software development approach usually pays off only in the context of software system families. The various members, or products, of such a family have a spectrum of features in common, allowing systematic reuse. Specifically, the DSLs used to describe the members of the family are usually the same. In order to come up with a suitable DSL, transformer/generator, and platform (together called the domain architecture), the developers need to have a good understanding of the domain for which they develop the infrastructure.
Domain analysis techniques can help to deepen this understanding. In practice, developing a useful domain architecture happens incrementally and iteratively over a relatively long time, and it is based on experience.

Architecture. Software architecture plays an important role in the context of model-driven software development. Transformations rely on the availability of a well-defined meta-model for the source as well as for the target. They are literally rules describing how to map the concepts of the source meta-model to the concepts provided by the target meta-model. In order for transformations not to become overly complex, the concepts defined by the meta-models must be concise, precise, and limited in number. With respect to the final transformation step, the one that produces implementation code, this means that the architecture of the target platform, as well as the mapping of application concepts to this platform, needs to be clearly defined.
3 Example: A Simple Weather Station
Overview. This section aims at illustrating our approach with a more concrete example. We use a small distributed weather station for the purposes of the example. The weather station consists of several nodes (microcontrollers) connected by a bus; software components will be deployed on each of these nodes (see Fig. 5).
Fig. 5. Weather station example scenario
The example consists of three nodes, one outside node, the main node, and an inside node, connected by a bus. This could be a CAN bus, for example. In the course of this example, we will take a look at the following artifacts:
– The models necessary to describe a weather station,
– The tool chain necessary to validate the models and generate the code, and
– How code generation works in detail.
Models. We use three different models to describe the distributed embedded system of the weather station:
– a type model describes interfaces, components and their ports,
– a composition model describes component instances and how they are connected, and
– a deployment model describes the physical infrastructure and how component instances and connectors are mapped onto it.
Interfaces are specified with a textual DSL, similar to CORBA IDL. The example shown in Fig. 6 defines the Sensor interface to have three operations: start, stop, and measure. The Controller interface has a single operation, reportProblem, which sensors use to report problems with the measurement.

interface Sensor {
  operation start():void;
  operation stop():void;
  operation measure():float;
}

interface Controller {
  operation reportProblem(Sensor s, String errorDesc):void;
}

Fig. 6. Example for an interface description

Instead of a textual model, as shown in Fig. 6, we could also use a graphical model, as long as the same information is conveyed. The information included in an interface definition is described by a meta-model; the meta-model for interface definitions is given in Fig. 7. As explained above, the meta-model describes the constructs a DSL provides for building models. As one would expect, it defines interfaces as artifacts that own a number of operations, each of which has a name, a return type, a number of parameters (each with a name and a type), and a set of exceptions.
Fig. 7. Interface meta-model
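To make the meta-model of Fig. 7 a bit more tangible, the following is a minimal sketch of how such a meta-model might be represented as plain Java classes inside a generator. The class and field names are illustrative assumptions, not the actual generator API described later in this chapter.

import java.util.ArrayList;
import java.util.List;

// Illustrative meta-classes mirroring Fig. 7: an Interface owns an ordered
// list of Operations; each Operation has a name, a return type, a list of
// Parameters (each with a name and a type) and a list of Exceptions.
class InterfaceDef {
    String name;
    final List<OperationDef> operations = new ArrayList<>();   // {ordered}
}

class OperationDef {
    String name;
    String type;                                               // return type
    final List<ParameterDef> parameters = new ArrayList<>();
    final List<ExceptionDef> exceptions = new ArrayList<>();
}

class ParameterDef {
    String name;
    String type;
}

class ExceptionDef {
    String type;
}

A parser front-end for the textual DSL would populate instances of such classes; the chapter's actual meta-classes (shown later in Fig. 14) additionally inherit from the UML meta-model.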
The next step in describing a component-based system is the component model. We use a graphical notation for this aspect of the overall model, which uses UML syntax so that these models can be built in a UML (1.x) tool.
We first define two kinds of sensors, TemperatureSensor and HumiditySensor. Both have a provided port called measurementPort, which offers the operations defined in the Sensor interface shown in Fig. 6. In addition, both types of sensors have a required port called controllerPort, through which the sensors expect to communicate with their controller. On top of these two kinds of sensors, a control component is defined, which provides a controller port and requires a number of sensors. These artifacts are all illustrated in Fig. 8. Again, we show the meta-model for this aspect of the model in Fig. 9; as with the interface meta-model of Fig. 7, we use a UML-based concrete syntax, and here we extend the UML meta-model.
Fig. 8. Model of components and ports
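Step 3 of the design flow in Fig. 4 generates interface APIs and component base code from such definitions. As a rough, hypothetical illustration (the real generated code is not shown in this chapter, and the generator also targets plain C), a generated object-oriented base class for the TemperatureSensor component might look like this, with the application logic supplied in a hand-written subclass (step 4):

// Hypothetical generated code: the interfaces from Fig. 6 and a base class
// for the TemperatureSensor component of Fig. 8. The base class exposes the
// provided port (measurementPort, typed by Sensor) and holds the required
// port (controllerPort, typed by Controller); the measurement logic itself
// stays abstract and is written manually.
interface Sensor {
    void start();
    void stop();
    float measure();
}

interface Controller {
    void reportProblem(Sensor s, String errorDesc);
}

abstract class TemperatureSensorBase implements Sensor {
    private Controller controllerPort;   // required port, bound by the generated infrastructure

    public final void bindControllerPort(Controller c) {
        this.controllerPort = c;
    }

    protected final Controller controllerPort() {
        return controllerPort;
    }

    // Application logic (step 4 in Fig. 4) is implemented in a subclass.
    public abstract void start();
    public abstract void stop();
    public abstract float measure();
}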
The Component type extends UML::Class. This is why we model components as UML classes with a component stereotype in the meta-model shown in Fig. 9. A Component has a number of ports. A port is modeled as a subtype of UML::Association. A port references an InterfaceRef; it cannot technically reference interfaces directly, because they are defined in another meta-model. The InterfaceRef plays the role of a proxy [6] for the Interface. Ports are abstract; concrete subtypes are defined in the form of RequiredPort and ProvidedPort. Also note the concept of applications, which are components that do not offer any services themselves. This is expressed by the OCL constraint that requires the ports association to contain RequiredPort objects only (illustrated in Fig. 9). Finally, a concrete system must be specified by defining component instances, containers and (hardware) nodes, as well as connections on the physical and the logical level. We use an XML-based concrete syntax for this aspect. The XML code shown in Fig. 10 illustrates one part of the deployment definition of the weather station example. With the background of the previous explanations, and the meta-model for the deployment displayed in Fig. 11, the meaning of this model (in Fig. 10) should be understandable without further explanation.
Fig. 9. Meta-model for components and ports
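The OCL constraints attached to Fig. 9 (a ConfigParam must be of type String, an Application may only own required ports, and the two ends of a PortDependency must reference the same interface) can be enforced programmatically in the generator's meta-classes. The sketch below illustrates two of them; it is an illustrative assumption using hand-rolled meta-classes, not the chapter's actual generator framework:

import java.util.List;

// Illustrative checks corresponding to the OCL constraints of Fig. 9.
class PortMeta {
    boolean provided;          // true for a provided port, false for a required port
    String interfaceName;      // name of the referenced interface (via InterfaceRef)
}

class ApplicationMeta {
    List<PortMeta> ports;

    // context Application inv: ports->select(p | p.oclIsKindOf(ProvidedPort))->isEmpty()
    void checkConstraints() {
        for (PortMeta p : ports) {
            if (p.provided) {
                throw new IllegalStateException("an application must not own provided ports");
            }
        }
    }
}

class PortDependencyMeta {
    PortMeta from;
    PortMeta to;

    // context PortDependency inv: to.interface == from.interface
    void checkConstraints() {
        if (!from.interfaceName.equals(to.interfaceName)) {
            throw new IllegalStateException("a port dependency must connect ports of the same interface");
        }
    }
}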
The deployment part of the system has a meta-model, too; it is shown in Fig. 11. The central concept is the System. A System consists of a number of Nodes, and each Node itself consists of Containers. These contain a number of ComponentInstances, which reference a Component as their type. Systems also contain a number of Connectors. A Connector connects a provided and a required port of two ComponentInstances in order to allow these two instances to communicate through the respective ports. Finally, a Connector has a type, which implements one of several communication strategies, such as communication through a CAN bus, through a local direct call, or through shared memory (Fig. 11). From these three different models, an overall model can be composed (this is done in the code generator's first phase, on the AST level). This overall, merged model is subsequently used as the input to the code generation phase of the generator. The overall model thus consists of several partial models describing different aspects of the entire system. However, in order to generate a useful system, the code generator (described in more detail below) must consider all the aspects at the same time. This requires a way to join the models. Technically, this is done by using different parser front-ends in the generator. However, we also need to make sure that the models can be joined logically. For example, in the component model, we must reference an interface defined in a text file. As a consequence, we use proxies [6] in the meta-model (called references). Figure 12 illustrates how the various models are joined logically. A component model uses InterfaceRefs to reference interfaces defined in the interface model. The system model uses the type attribute of ComponentInstances to refer to Components defined in the component model, as well as PortRefs to reference the Ports defined as part of Components.
<system name="weatherStation">
  <node name="main">
  <node name="inside">
    <param name="unit" value="centigrade"/>
  <node name="outside">
  <providedPort instance="tempOutside" port="measurementPort">
  <requiredPort instance="controller" port="sensorsPort">
  <providedPort instance="controller" port="controllerPort">
  <requiredPort instance="tempOutside" port="controllerPort">
  ...
Fig. 10. Specification of nodes, instances and connectors
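Correspondingly, the deployment meta-model of Fig. 11, which gives this XML its meaning, could be rendered as plain Java meta-classes along the following lines. This is a sketch under the same assumptions as the interface meta-model sketch above, not the framework's actual implementation:

import java.util.ArrayList;
import java.util.List;

// Illustrative meta-classes for the deployment model (Fig. 11): a System
// owns Nodes and Connectors, Nodes own Containers, Containers own
// ComponentInstances, and a Connector links the provided port of one
// instance to the required port of another.
class SystemMeta {
    String name;
    final List<NodeMeta> nodes = new ArrayList<>();
    final List<ConnectorMeta> connectors = new ArrayList<>();
}

class NodeMeta {
    String name;
    final List<ContainerMeta> containers = new ArrayList<>();
}

class ContainerMeta {
    final List<ComponentInstanceMeta> instances = new ArrayList<>();
}

class ComponentInstanceMeta {
    String name;
    String type;                         // references a Component from the component model
}

class ConnectorMeta {
    String id;
    String type;                         // e.g. direct call, shared memory, or CAN bus
    String sourceInstance, sourcePort;   // provided port (source)
    String targetInstance, targetPort;   // required port (target)
}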
Tooling. In addition to a UML tool and a text editor for creating the various models shown earlier, the tooling mainly consists of a model-driven code generator, the openArchitectureWare [oAW] toolset in our case. The generator has three primary responsibilities:
– Parse the various models and join them together; inconsistencies must be detected and reported.
– Verify the model against the meta-model of the domain; if the model does not conform to the domain meta-model, report errors.
– In case the model is fine, generate the target code for the various platforms.
In the tradition of programming language compilers, the generator works in several phases, as illustrated in Fig. 13. In the first phase, one or several model parser front-ends read the model.
Fig. 11. Deployment meta-model
Fig. 12. Relationships among the meta-models
Fig. 13. Generator tool work flow
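The workflow of Fig. 13 boils down to three steps: parse the specifications into an instantiated meta-model, check the meta-model constraints, and hand the instantiated meta-model together with the templates to the template engine. A schematic sketch of that pipeline follows; the types used here are placeholders, not the openArchitectureWare API:

import java.util.ArrayList;
import java.util.List;

// Schematic generator pipeline mirroring Fig. 13 (placeholder types only).
interface MetaModelElement {
    void checkConstraints();                 // constraints live in the meta-classes
}

interface ModelParser {
    List<MetaModelElement> parse();          // front-end for one concrete syntax
}

interface TemplateEngine {
    void generate(List<MetaModelElement> model, String templateDir, String outputDir);
}

class GeneratorRun {
    static void run(List<ModelParser> frontEnds, TemplateEngine engine) {
        List<MetaModelElement> model = new ArrayList<>();
        for (ModelParser parser : frontEnds) {
            model.addAll(parser.parse());    // phase 1: build the object graph
        }
        for (MetaModelElement element : model) {
            element.checkConstraints();      // verify the model against the meta-model
        }
        engine.generate(model, "templates/", "gen/");   // phase 2: apply the templates
    }
}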
Reading the models results in a representation of the model as an object graph inside the code generator. The classes used to represent the object graph directly map to the domain meta-model. This is where meta-model constraints are checked (they are implemented as part of the meta-classes). In the second phase, code generation templates are used to actually generate output code. For illustrative purposes, Fig. 14 shows a skeleton implementation of the Interface meta-class. The class defined in Fig. 14 is an ordinary Java class. We inherit from the UML::Class meta-class because
– it makes the ECInterface a model element, i.e., a valid generator meta-class,
– it inherits the properties of UML::Classes, specifically the fact that it can have operations, that it is in a package, etc., and
– it allows us to use stereotypes on UML::Classes to represent instances of interfaces.

public class ECInterface extends generatorframework.meta.uml.Class { }

Fig. 14. Initial definition of the ECInterface meta-class
The implementation of constraint checks as part of the meta-classes is illustrated in Fig. 15. The same approach can be applied in many other circumstances, for example, to ensure that the port names of components are unique; Fig. 16 provides another example.

public class ECInterface
    extends generatorframework.meta.uml.Class {
  public String CheckConstraints() {
    Checks.assertEmpty( this, Attribute(),
        "must not have attributes." );
  }
  // more ...
}

Fig. 15. ECInterface meta-class with constraints
Generating Code. Code generation is based on templates. A template is basically a piece of code with escapes in it that can access the model (represented as an object graph in the generator). The code in Fig. 17 is a simple example that generates a C header file for a component implementation. Templates consist of two kinds of text:
– The commands within the guillemets (written as << and >> in Fig. 17) are used to iterate over the model and thus to control code generation.
– Text outside the guillemets is code to be generated. It is literally copied into the generated code file.
– Within the to-be-generated code, guillemet escapes can be used to reference properties of the respective model object.
public class Component
    extends generatorframework.meta.Class {
  public String CheckConstraints() {
    Checks.assertEmpty( this, Operation(),
        "must not have operations." );
    Checks.assertEmpty( this, Generalization(),
        "must not have superclasses or subclasses." );
    Checks.assertEmpty( this, Realization(),
        "must not implement any interface." );
    Checks.assertUniqueNames( this, Port(),
        "a component's ports must have unique names." );
  }
  // more ...
}

Fig. 16. Constraint checks for the Component meta-class
<> <>
/**** Port Header File ****
 *
 * Type: <>
 * Name: <>
 * Component: <>
 * Interface: <>
 */
<>
#ifndef <<portPrefixUpperCase>>_H
#define <<portPrefixUpperCase>>_H

#include <<middleware_types.h>>

<<EXPAND Body(connector)>>

#endif
<<ENDLET>>
<<ENDFILE>>
<<ENDDEFINE>>

<> <>
<<EXPAND Util::ExternDecl(Component.Name"_"Name) FOREACH Interface.Operation>>
<<ENDIF>>
<<ENDDEFINE>>

Fig. 17. Sample code generation template
Parsing Input Models. Parsing of the input models is done using generator front-ends, as shown above. Since we need to parse several models for a given generator run, we use the Composite design pattern [6] to build a front-end that itself contains front-ends for the various models we need; Fig. 18 provides an example. How the various front-ends work internally is beyond the scope of this chapter.
package util;

public class EmbeddedComponentsInstantiator
    extends CompositeInstantiator {

  private String systemConfFile = System.getProperty("EC.SYSTEM");
  private String interfaceFile  = System.getProperty("EC.INTERFACE");
  private String componentsFile = System.getProperty("EC.COMPONENTS");

  public EmbeddedComponentsInstantiator() {
    // a front-end that reads the UML model
    add( new XMIInstantiator( componentsFile ) );
    // a front-end that reads the XML system spec;
    // use ecMetamodel as package prefix when
    // attempting to load meta-model classes
    add( new XMLInstantiator( systemConfFile, "ecMetamodel" ) );
    // a front-end that reads the textual spec
    // for the interfaces
    add( new JCCInstantiator( interfaceFile ) );
  }
}

Fig. 18. Instantiator that reads the various models
Basically, they read the models and create an object graph from them.

Overall Setup. Since we use several aspect models with different concrete syntaxes, the actual setup is somewhat more complicated, as shown in Fig. 19. The interfaces are represented with a textual DSL, components are represented using profiled UML models, and the deployment (or system) is described using XML. All these different partial models refer to their respective parts of the meta-model. The complete meta-model is implemented as Java classes, as illustrated above, independent of the concrete syntaxes. So, while the model is represented in different files using different concrete syntaxes, all the model parts are represented as Java objects once they have been parsed by the respective instantiator, i.e., parser or front-end. This is also the point where the references among the model parts are dereferenced: each proxy is supplied with a reference to its delegate object. At this stage, the generator back-end uses the code generation templates to generate the output.
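The dereferencing of such references can be pictured as a simple name-based lookup once all partial models have been instantiated. The following sketch is an assumption about how that resolution step might look; it is not the framework's actual implementation:

import java.util.HashMap;
import java.util.Map;

// Illustrative proxy resolution: after parsing, every InterfaceRef from the
// component model is connected to the interface definition of the same name
// from the interface model.
class InterfaceModelElement {
    String name;
}

class InterfaceRefProxy {
    String name;                     // the name used to refer to the interface
    InterfaceModelElement resolved;  // filled in during dereferencing
}

class ReferenceResolver {
    static void resolve(Iterable<InterfaceRefProxy> refs,
                        Iterable<InterfaceModelElement> interfaces) {
        Map<String, InterfaceModelElement> byName = new HashMap<>();
        for (InterfaceModelElement i : interfaces) {
            byName.put(i.name, i);
        }
        for (InterfaceRefProxy ref : refs) {
            ref.resolved = byName.get(ref.name);
            if (ref.resolved == null) {
                // unresolved references are exactly the inconsistencies the
                // generator must detect and report
                throw new IllegalStateException("unresolved interface reference: " + ref.name);
            }
        }
    }
}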
Fig. 19. Overall tool chain and artifacts
3.1 Resource Optimization

Due to the generative approach, we were able to conduct experiments concerning optimized code generation for efficient resource allocation. Since this is one of the key requirements in the embedded world, we invested considerable effort in this topic. In our experiments, the additional memory needed by the middleware-based embedded software was 1 KB of ROM and 300 bytes of RAM for a typical event-based communication pattern in the automotive domain. The performance of the middleware-based software was not significantly reduced either, which supports the practicability of our approach. For reasons of brevity, more details are beyond the scope of this chapter.
4 Conclusions
Advantages of this Approach. A model-driven approach to software development yields a number of advantages, some of them especially important in the context of embedded systems. The following list briefly explains some of these; the order is not significant.
– First of all, developer productivity is improved, since repetitive aspects need not be coded manually over and over again. Many target platforms (such as real-time operating systems) require a lot of "bookkeeping" code and configuration that falls into this category.
– The models capture knowledge about the application structure and functionality much more explicitly, free from "implementation clutter".
– Different concerns are separated explicitly. Each of them (or subsets) is modelled using its own model, making it explicit and thus more tractable, easier to change and potentially reusable.
– Communication with the various stakeholders is simplified, since each stakeholder need only take a look at the models of the aspect they are interested in.
– It is easier to react to changes, since a change often affects one piece of code only. Code (and other artifacts) that needs to change as a consequence is simply regenerated.
– The transformations capture design knowledge for the target platform. They thus serve as a form of "codified best practices".
– Reuse (of the platform, DSLs, etc.) is made possible.
– Typically, the software architecture improves, since the definition of a stringent software and system architecture is necessary; otherwise code cannot be generated efficiently.
– Code quality is improved, since most of it is generated from templates. It is easier to ensure that templates generate high-quality code than to assure this for each piece of code manually.
– Portability is simplified. If a different platform should be used, only a new set of templates needs to be created (which, of course, can be non-trivial, too).
– MDSD increases flexibility without inducing runtime overhead. The generated code can be strongly typed, maybe even relying on static memory allocation only, while flexibility is still there. The flexibility is realized at generation time and compile time, not during runtime.
– Since the mapping from models to implementation code is determined by templates (i.e., is the same each time), the quality-of-service characteristics of the implementation (timing behavior, memory consumption, performance) are known to some extent. This can be a big advantage in embedded system development.
– Since error messages are not just reported by the compiler when compiling the implementation code, but also by the transformer/generator when reading the models, the error messages can be much more expressive. A model contains much more domain semantics that can be reported as error messages compared to implementation code.

Prejudices. Since we keep hearing the same prejudices against MDSD over and over again, here are some simple statements that developers should consider.
– MDSD does not require UML; use any DSL that is suitable for your domain.
– Generated code can be very readable and can include comments, etc. Generated code is often even easier to understand than manually written code, since it is more structured (it is based on the "rules" in the transformations).
– MDSD does not require a waterfall process. MDSD works well using incremental, iterative processes. This is true for application development as well as for the development of the domain architecture.
– MDSD is quite agile, since, once a domain architecture is in place, it allows developers to come up with running applications very quickly.
Challenges. There is no "free lunch". So, even though model-driven software development has a great number of benefits, there are also some drawbacks and challenges that need to be addressed. For reasons of space we cannot go into details; we recommend reading [21].
– The development process has to take into account the two development paths: domain architecture development and application development. An approach that has worked in practice is to have two kinds of teams (domain architecture and application development). The application development teams play the role of customers for the domain architecture development. An iterative process with regular "deliverables" will minimize the problems.
– There are no universal standards yet. Using MDSD will always tie the development to a number of tools. With the OMG's MDA standard, this should become less of a problem in the future. Today the impact of the problem can be minimized by relying on open source tools.
– The concepts and tools need to be understood. Specifically for "traditional" embedded developers this can be quite a 'cultural shock'. Terms like meta-model, DSL, etc. are often not very well known, and not readily accepted. The best approach to attack this problem is to run an example-driven education effort that first convinces people of the benefits of the approach, and then goes into some details of the concepts behind it.

Practical Experience and Related Work. Model-driven software development has a long history of success, although it did not always appear under that name. MDA [14], Generative Programming [3], Domain-Specific Modeling [2] and Domain-Driven Development are all either different names for MDSD or special "flavors" of the general MDSD approach. Specifically, MDA is gaining more and more importance in the enterprise software development area. In the embedded world, generating code from models is also a well-known approach, although the use of such code in production systems is only slowly being adopted. Tools like ASCET [4], Matlab/Simulink [9] or Statemate [8] are well known to embedded developers. Using MDSD to implement (component/container-based) middleware is a rather novel approach, though. The authors, as well as other people known to the authors, have been using the approach with overwhelming success in domains such as automotive, mobile phones and scientific computing. The productivity boosts promised by MDSD have largely been realized. Acceptance by developers was good after they had seen that they could understand the generated code, and that it also was efficient. In the automotive domain, the AUTOSAR consortium [1] is currently in the process of standardizing an architecture and process conceptually similar to the one explained in this chapter.
References

1. The AUTOSAR Consortium. AUTOSAR homepage. http://www.autosar.org/
2. Domain Specific Modelling Forum. http://www.dsmforum.org/
3. U. Eisenecker, K. Czarnecki. Generative Programming. Addison-Wesley, 2000
4. ETAS Group, ASCET homepage. http://en.etasgroup.com/products/ascet sd/index.shtml
5. T. Ewald. Transactional COM+: Building Scalable Applications. Addison-Wesley, 2001
6. Gamma, Helm, Johnson, Vlissides. Design Patterns. Addison-Wesley, 1995
7. K. Henney. Inside Requirements. Programmer's Workshop column in Application Development Advisor, May/June 2003 (http://www.two-sdg.demon.co.uk/curbralan/papers/InsideRequirements.pdf)
8. I-Logix, Statemate homepage. http://www.ilogix.com/statemate/statemate.cfm
9. The Mathworks, Matlab homepage. http://www.mathworks.com/
10. The openArchitectureWare generator framework. http://sourceforge.net/projects/architecturware/
11. Object Management Group, Minimum CORBA. http://www.omg.org/technology, 2004
12. Object Management Group, Real-Time CORBA. http://www.omg.org/technology, 2004
13. Object Management Group, CORBA Component Model Specification (CCM). http://www.omg.org/technology, 2004
14. Object Management Group, Model-Driven Architecture (MDA). http://www.omg.org/mda
15. OSGi Alliance. http://www.osgi.org, 2004
16. D.L. Parnas. On the Criteria To Be Used in Decomposing Systems into Modules. Communications of the ACM, Vol. 15, No. 12, December 1972
17. C. Schwanninger, E. Wuchner, M. Kircher. Encapsulating Cross-Cutting Concerns in System Software. Workshop on Aspects, Components, and Patterns for Infrastructure Software, AOSD 2004 conference, Lancaster, UK, March 22-26, 2004
18. Sun Microsystems, Java 2 Enterprise Edition (J2EE). http://java.sun.com/j2ee/, 2004
19. M. Voelter, M. Kircher, U. Zdun. Remoting Patterns: Foundations of Enterprise, Internet and Realtime Distributed Object Middleware. John Wiley & Sons, 2004
20. M. Voelter. MDSD Tutorial. http://www.voelter.de/services/mdsd-tutorial.html
21. M. Voelter, T. Stahl, J. Bettin. Modellgetriebene Softwareentwicklung. dPunkt, to be published in 2004; an English version is in preparation
22. M. Voelter, A. Schmid, E. Wolff. Server Component Patterns - Component Infrastructures Illustrated with EJB. John Wiley & Sons, 2002
A Component Framework for Consumer Electronics Middleware

Johan Muskens, Michel R.V. Chaudron, and Johan J. Lukkien

Department of Mathematics and Computer Science, Technische Universiteit Eindhoven,
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
{J.Muskens,M.R.V.Chaudron,J.J.Lukkien}@tue.nl

Abstract. Developers of Consumer Electronics (CE) devices face the problem of the ever-increasing amount of software that needs to be developed. At the same time, the time-to-market of their products needs to decrease. In other domains, component-based software development aids in solving the resulting problems. However, existing component frameworks fail to meet some of the requirements specific to the CE domain. In order to improve this situation, a component-based framework has been developed. In this chapter we describe this framework and motivate the architectural choices. These choices are influenced by the requirements on the framework. Some of these requirements are specific to the CE domain, others are more general.
1 Introduction
1.1 Background

The component framework presented in this chapter has been developed in the context of the Robocop project [14]. The aim of Robocop is to define an open, component-based framework for the middleware layer in high-volume consumer electronics devices. The framework enables robust and reliable operation, upgrading, extension, and component trading. The appliances targeted by Robocop are consumer devices such as mobile phones, set-top boxes, DVD players, and network gateways.

1.2 Motivation

With the increasing capacities of Consumer Electronics (CE) devices, the amount and complexity of the software in these devices is growing rapidly. The software determines to a large extent what a device is or feels like. Producers of these devices face the challenge of developing this continually increasing amount of software while the time-to-market should preferably decrease. Component Based Software Engineering (CBSE) promises to aid in solving the resulting problems. Key success factors generally attributed to CBSE are:
1. the possibility of re-use at the level of components;
2. the support for component composition so as to build new applications;
3. the promise of improved reliability because of explicit specifications and subsequent convergence of individual components to comply with their specifications.
In short, the success should come from being able to use software components in a similar way as we do their hardware counterparts. This includes the way components are paid for or otherwise traded. Presenting a piece of software as a component should therefore really represent a gain in terms of abstraction. In particular, this calls for the elimination of as many dependencies as possible and, as far as dependencies exist, for specifying them explicitly and bringing them to the component interface. Several component frameworks have been developed over the last years, all addressing these three success factors to some extent. However, there are also a lot of differences between the existing component frameworks, mainly due to the different requirements in the individual problem domains. Successful adoption will depend on how well the three issues mentioned above are addressed, plus some other factors depending on domain-specific requirements. These factors are discussed in section 2. Existing component frameworks did not meet some of the requirements particularly important for the CE domain:
– Robust and reliable operation
– Run-time upgrading and extension
– Low resource footprint
– Support for component trading
In order to improve this situation, a number of European companies and universities have joined forces in an effort to develop a component-based framework for the middleware layer of network-enabled consumer devices, addressing several of the points above. This work was done in the context of the Robocop project [14], which was subsequently used as input for the Space4U project [19]. In this chapter we describe the approach defined by these projects.

1.3 Overview

This chapter is structured as follows. Section 2 describes the project context and the target requirements set out for the component framework. It also relates these to existing component frameworks. Section 3 discusses the architecture and how it relates to the targets. Section 4 presents the download framework which enables runtime upgrading and extensibility of a component assembly. Concluding remarks and related work follow in section 5.
2 Background and Requirements
In this section, we discuss the most important requirements for the Robocop component framework and how they influenced the architecture. There are quite some differences between existing component frameworks. These differences are due to different requirements in the targeted application domains. Figure 1 shows the ’features’ provided by component frameworks. Some of the features are mandatory (marked gray), for example Communication. Some of the features are optional (marked white), for example Language independence.
Fig. 1. Component framework features (grey=mandatory, white=optional)
2.1 Common Features of Component Frameworks

In this section we discuss some features that are common to many component frameworks. We distinguish the categories of features depicted in Figure 1:
– Infrastructure: All component frameworks provide an infrastructure. With infrastructure we mean mechanisms for component instantiation, binding, communication, distribution of components over hardware, announcing capabilities of components, and discovery of desired components. These mechanisms are needed to create a composition of components that can cooperate in performing a certain task.
  • Instantiation: A Component Instance is the instantiation of a Component implementation at a specific location in the memory of a device. The relation between a component instance and a component implementation is the same as that between an object and a class. Once in operation, each component instance may create and manage its own data.
There are a number of different ways in which instantiation can be achieved. The distinguishing factor is the element in the architecture that controls the instantiation. In existing component frameworks, instantiation is typically controlled either by the component infrastructure, a component container, or a component factory.
  • Binding: In the context of component-based systems, binding is the creation of a link between multiple component instances. Binding can be done at design time, compile time and run time. At design time and compile time the binding is done by the developer. The link between component instances may be used for communication and navigation between component instances. The distinguishing aspect of the different ways in which binding can be organized in a component framework is the party that initiates the binding. We distinguish 1st party binding and 3rd party binding. In case of 1st party binding, a component instance binds itself to another component. In case of 3rd party binding, a binding between component instances is created by a party that is not one of the subjects of the binding.
  • Communication: To facilitate communication between components, a component infrastructure must provide some interaction mechanisms. The interaction styles supported are partially defined by the architectural styles that the component framework supports. The communication styles that a component infrastructure supports determine a number of the quality properties that systems built using these components can obtain. For instance, some communication styles favor efficiency over flexibility. The most common style is request-response as implemented by procedure/method calling. This style is the basis of all imperative programming languages and does not require any special facilities from the component model. The next most commonly supported interaction style is events. Typically events are used for notification, e.g. of exceptions. Often this style is used in conjunction with request-response interaction. Publish/subscribe can be seen as a generalization of events to distributed systems. Component frameworks that are aimed at supporting multi-media processing often provide mechanisms that support streaming as an interaction style.
  • Discovery: Every component framework needs to define a mechanism by which the presence of components in the system can be discovered. Such a discovery mechanism is needed to support late and dynamic binding. Discovery mechanisms are most prominent in component frameworks with run-time changes/binding. In systems with design-time or compile-time binding the discovery is typically guided by the designer/developer. In systems with run-time adaptation a registry is commonly used for the discovery of components.
  • Announcement of capabilities: Usually the capabilities of a component are expressed by a number of interfaces that are implemented by the component. The way in which interfaces are specified differs between component frameworks. Some component frameworks introduce a special language for expressing interfaces, others use programming languages to specify the interfaces.
– Component and application development support: Components and applications are developed using a component framework. The component frameworks have different development features. For example, COM [1] and .NET support programming-language-independent development of components, whereas Enterprise Java Beans (EJB) [15] supports platform independence.
  • Language independence: Component frameworks often support component development in different programming languages. In order to achieve interoperability between the components developed in the different programming languages, the interfaces must be specified in a manner that is independent of the programming language. Usually an interface description language (IDL) is used for this purpose.
  • Platform independence: Some component frameworks offer platform independence; this means that executable components can be executed on different platforms. This is usually achieved using an intermediate language. This intermediate language can be interpreted at run-time or compiled by a Just In Time (JIT) compiler.
  • Analysis support: During development of individual components and applications it can be desirable to have analysis techniques. These techniques can be used to prove correctness of the software [7], or to predict extra functional properties [5, 11].
– Support for upgrading and extension: Software evolves over time. The value and the economic lifetime of a device and the software on it can be increased by supporting upgrading and extension of the software. Component frameworks can support upgrading and extension at different stages of the software life-cycle (design time, compile time, run time, etc.). The current trend is that upgrading and extension are shifting more and more to the run-time phase of the software life-cycle. In this way, devices can be customized to the needs of a consumer in the period that they are owned and used by the consumer.
– Support for extra functional properties: In addition to functional requirements, software also needs to satisfy extra functional properties like performance, security and reliability. Which extra functional properties are important for a component framework highly depends on the problem domain that it targets. The extra functional properties that are important can introduce all kinds of restrictions on a component framework. For example, when a low resource footprint is important, this can exclude the use of virtual machines for interpretation of programs.
– Support for trading: In order to gain the benefits of reuse and trading, a large market of components and hence of component producers is needed. The fact that components should be traded has technical implications for the component framework.

2.2 Focus for the CE Domain

The requirements on a specific component framework highly depend on the problem domain in which the framework will be used. Below we discuss the features that are specifically important for the consumer electronics domain and, consequently, for Robocop:
– Upgradability and Extensibility: Improvements in software are developed in rapid succession. To extend the economic lifetime of devices, it should be possible to upgrade software components with improved versions. In addition to upgradability, there is a need to be able to add new functionality to a device. The mechanism needed for uploading new functionality can largely be shared with that for upgrading.
– Robustness and Reliability: In the area of consumer electronics, it is unacceptable that systems break down. Building stable systems requires special effort during their design [2] as well as special mechanisms at run-time [3, 12, 17].
– Low resource footprint: Consumer electronics systems must be made cost-effective. This results in limited available resources on the targeted devices and imposes strict resource constraints on the components and the infrastructure.
– Trading: One of the main reasons for component-based development is decreasing time-to-market and increasing productivity by increasing the (re-)use of existing components. In order to maximally exploit (re-)using existing components, it should be possible to use third-party components. This requires support for trading. In Robocop this requirement highly influenced the choice for component packaging.
In section 3 we present the architecture and motivate how it was constructed, driven by the requirements mentioned above. Section 4 presents a download framework that can be used in addition to the architecture to realize support for run-time upgrading and extension. In the remainder of this section we discuss which features are realized by existing component frameworks and how they are realized. Table 1 shows which features are supported by the existing component frameworks. This table shows that there is a core set of features offered by all existing component frameworks. There are also a number of features by which component frameworks distinguish themselves. The features that were especially important for the CE domain are highlighted. Infrastructure features are offered by all component frameworks. These features, like instantiation, binding and communication, are needed to create a system out of components that cooperate to achieve a certain goal of the system. The largest differences between the individual component frameworks can be found at the level of support for extra functional properties, flexible upgrading/extension, development and trading. The need for these features, and therefore the selection of the component framework, depends highly on the target problem domain. In Table 2 we show some existing solutions for component framework features. Most features cannot be dealt with in isolation. Solutions for one feature can exclude, or negatively influence, other features. This means there is no such thing as a free lunch. For example, platform independence is usually realized using an interpreted or intermediate language, which negatively influences performance. Each component framework needs to make a trade-off between the different features.
3 Architecture of Framework and Component Model
In this section we discuss the architecture and component model that have been developed during the Robocop project: the Robocop component life-cycle, Robocop component packaging, the executable component structure, and the run-time execution model.
Table 1. Features of existing component frameworks (features particularly important for the CE domain are highlighted). Frameworks compared: COM, DCOM, EJB, .NET, CORBA, Koala, Robocop, PECOS, AutoComp.
Infrastructure
  Instantiation: + + + + + + + + +
  Binding: + + + + + + + + +
  Communication: + + + + + + + + +
  Distribution: + + + +
  Announcement of Cap.: + + + + + + + + +
  Discovery: + + + + +
Extra Functional Properties
  Robust. & Reliab.: + +
  Security: +/- +/-
  Low resources: + + + + + +
Upgrading and Extension
  Design time: + + + + + + + + +
  Compile time: + + + + + + + + +
  Runtime: + + + + + +
Development Support
  Language independ.: + + + +/- +/- +/-
  Platform independ.: + +
  Analysis techniques: + +/-
Trading: +
Table 2. Common solutions for component framework features (features particularly important for the CE domain are highlighted)
Infrastructure
  Instantiation by: 1) Component Container 2) Component Factory 3) Infrastructure
  Binding: 1) 1st party binding 2) 2nd party binding 3) 3rd party binding
  Communication: 1) (Remote) procedure calls 2) Events 3) Publish/subscribe 4) Blackboard communication 5) Streaming
  Distribution: 1) Use location-transparent communication and instantiation mechanisms
  Announcement of Cap.: 1) Provided interfaces 2) Component descriptors
  Discovery: 1) Registry 2) Publish discovery files on dedicated servers
Extra Functional Properties
  Robust. & Reliab.: 1) Explicit and clear dependencies/contracts
  Security: 1) Provide mechanisms for secure communication, authentication, etc.
  Low resources: 1) No intermediate/interpreted language, minimal runtime environment, etc.
Upgrading and Extension
  Design time: 1) Always possible
  Compile time: 1) Compile-time substitution of components
  Runtime: 1) Binary interfaces and run-time (un/re)binding of instances
Development Support
  Language independ.: 1) Use an interface description language
  Platform independ.: 1) Interpreted language 2) Intermediate language
  Analysis techniques: 1) Use models describing properties of a component for analysis 2) Component models for RT applications
Trading: 1) Use models to describe the interests of the different stakeholders
3.1 Component Life-Cycle

The Robocop life-cycle addresses Robocop components from development time until run-time instantiation of services. During their life-cycle, Robocop components manifest themselves in different ways. Once developed, Robocop components are published in a repository. Published Robocop components can be generic in the sense that they still need to be tailored to run on a specific platform. This may involve, e.g., compilation and linking for that platform. At this point the Robocop component can be loaded onto the specific target. When a Robocop component is resident on the target, it needs to be registered. After registration the component is ready to be used: the component can be loaded and services implemented by the component can be instantiated. The life-cycle is depicted in Figure 2.
Fig. 2. Component Life-cycle
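The life-cycle of Fig. 2 can be summarized as a simple progression of states. The enumeration below is only a sketch derived from the description above; it is not part of the Robocop framework's API:

// Illustrative life-cycle states of a Robocop component: developed,
// published in a repository, tailored (e.g. compiled and linked) for a
// platform, resident on the target, registered with the runtime, and
// finally used to instantiate services.
enum ComponentLifeCycleState {
    DEVELOPED,
    PUBLISHED,
    TAILORED_FOR_PLATFORM,
    RESIDENT_ON_TARGET,
    REGISTERED,
    IN_USE
}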
3.2 Component Packaging

Unlike Szyperski [20], we found that the unit of trading is not the same entity as the unit of deployment. We distinguish between Robocop components, which are the units of trading and configuration management, and executable components, which are executable code that is physically present in the memory of a device during execution. During the component life-cycle a number of stakeholders are involved. These stakeholders are interested in different aspects of a component. A user of a device is interested in executable code for his device. A system developer is interested in documentation as well, maybe even the source code. Some stakeholders are not interested in the executable code at all; a system architect can decide to use a component based on a specification or simulation model. The Robocop component model can support all these different uses by using different models for the individual aspects. Using multiple different models is very useful in component trading, since a number of stakeholders can be involved with different concerns. Furthermore, we found that it is often desirable to trade more than executable code (documentation, sources, etc.).
A Robocop component (component package) is a set of related models (see Figure 5). The set of models is not fixed, but may be extended if needed. Any particular Robocop component must consist of one or more of these models. The information in the models can be human-oriented or machine-oriented. An example of a human-oriented model is documentation. Typically, one of the machine-oriented models will be the executable component. In order to achieve technical interoperability of executable components, many issues concerning the run-time execution model need to be defined; more details about this are given in the remainder of this chapter. Other examples of (machine-oriented) models are:
– A simulation model. This type of model could be of use during design of an application to analyze the interaction between components. Such models should be supplied by the developers of the component. Colored Petri-Nets [8] are a candidate for this type of model.
– A resource model. This type of model describes the resource needs of a component. This can be used both during design and during dynamic upgrading to assess whether a configuration of components fits within the available resources [4].
– A functional model. This type of model specifies the functionality of the component. A candidate specification language for this type of model is Z [18].
– An interface model (Robocop IDL). This model describes the functions provided and required by the component. Based on this model it is possible to check whether or not all dependencies in a configuration can be resolved. An example of a Robocop IDL description can be found in Figure 3.
Each Robocop component contains a globally unique identifier (GUID). This is needed for configuration management. Additionally, each model is identified by a GUID. Figure 6 shows two examples of different Robocop components. Robocop component 1 contains one executable, one IDL, one simulation model and one resource model (see also Figure 4). Robocop component 2 contains two executables, two resource models, one IDL and one simulation model. Multiple executables implementing the same IDL can occur when a Robocop component provides executables for multiple platforms. In that case the resource consumption of the executables can be different, and thus different resource models for the executables are included. Relations are used to indicate which resource model is related to a specific executable model.

3.3 Executable Component Structure

The Robocop component package, as described in section 3.2, is applicable to any type of model. Therefore it can be applied to any kind of binaries, e.g. static or shared libraries, COM components, or even complete executables. However, the executable view of the component determines how applications can be built from these components. This view is different for the various executable models, and determines at which abstraction level systems are composed. We will now discuss the executable component model. In the remainder of this chapter we will use the term component for the executable component model; for the Robocop component package we will use the term Robocop component.
module Printers {
  interface IPrinter {6083D1C4-0643-4ce6-B1EA-66467A65840C} {
    void printLn( in string line );
    ...
  };

  service SPrinter {68CCA7C4-C24A-4bee-9A5E-AF79B806483D} {
    provides {
      IPrinter printer;
    };
  };

  service SPlotter {6782A98F-06D5-436f-AE89-0D6E064AB047} complies SPrinter {
    provides {
      IPrinter printer;
    };
  };

  component CLaserPrinter {B7621504-7FCD-42ea-BCF0-90F67FE557C7} {
    provides SPlotter;
  };
};
Fig. 3. Example Robocop IDL description
<models>
  <model guid='6083D1C4-0643-4ce6-B1EA-66467A65840C'
         type='resource' location='./component1.rm'/>
  <model guid='7EDB39A0-5326-D811-87C6-0008744C31AC'
         type='ridl' location='./component1.ridl'/>
  <model guid='D9A356E2-5873-40b8-8D95-EA2F5F4DE692'
         type='simulation' location='./component1.sim'/>
  <model guid='28B4E880-AF84-4c86-B5FD-FC82A9FE1746'
         type='executable' location='./component1.so.0.0.0'/>
</models>
Fig. 4. Example Robocop component description
Fig. 5. A Robocop component is a set of models
Fig. 6. An example Robocop Component
Within Robocop, a component contains a number of services. These services implement the functionality of the component. Each service offers this functionality through a set of named interfaces (ports). The interfaces used are COM [1] interfaces. The v-table approach used in these interfaces results in little communication overhead; the overhead is limited to one pointer dereference per operation invocation on an interface. Services have explicit dependencies through a set of named requires interfaces (ports). All interfaces have a standard error return mechanism, and there is a facility to pass additional error information from a service to the client using the service. The explicit dependencies and the error return mechanisms enable the development of a robust and reliable system. Every component implements at least one service manager. This service manager is used to instantiate services and initialize attributes of the created service instances. Each component has a fixed entry point; this entry point is used to get a service manager. Figure 7 illustrates the structure of the executable component.
The services implemented by components can be specified in the Robocop IDL (RIDL). RIDL is inspired by the CORBA [10] IDL. For each service, the provided ports and required ports are specified, as well as the public attributes of the service. RIDL allows a developer to specify that a service is compliant with another. That service X is compliant with service Y at a structural level means: service X provides at least the named interfaces that are provided by service Y, and service X requires the same named interfaces that are required by service Y. In this case, code written to use service Y will also work using service X.
Fig. 7. Executable component
3.4 Run-Time Execution Model

In this section we present the run-time execution model used in Robocop. The Robocop runtime environment (RRE) has a key role in this execution model. In order to enable a low resource footprint, the RRE is kept minimal and can be extended with optional frameworks. The RRE supports registration of components/services and instantiation of services. All other features, like for example resource management, are optional; if these features are not used, there is no resource overhead. Next we discuss how the RRE stores the component information in a registry, how components are registered, how services of a component can be instantiated, and how the functionality provided by a service can be used.

The RRE Registry. The core responsibility of the RRE is to handle requests for service instances (and service managers). The desired service instance is identified by its GUID, and the RRE needs to find a component that can deliver instances of the specified service. To that end, the RRE maintains sufficient information in a database (referred to as the registry). This registry contains three tables (see Figure 8). The first table contains the association between the component GUID and the physical location of the component.
This physical location is formatted in a URL-like fashion. When the component is stored in a file system, the location will be file:// followed by the actual file name. For systems that store the component in a directly addressable memory space, the location looks like address:// followed by the physical address containing the component code. The second table contains the relation between the component GUID and the service GUID. As indicated earlier, a component can implement multiple services. It is equally well possible that a particular service is implemented in more than one component. When multiple components are registered in this table for the same service, the implementation of the RRE is free to choose any of these components when there is a request for the service. The third table contains the complies relation for services. When a service is requested, the RRE can use this table to find compliant services that can be used to satisfy the request. Figure 9 gives a graphical representation of sample registry contents (the same contents as in Figure 8). The solid lines indicate the complies relation between services: e.g. the SPixelScreen service complies to the SCharacterScreen. So, the SPixelScreen can be used when a SCharacterScreen service is needed. The dotted lines indicate which component implements which service. In the example the CScreen component implements both the SPixelScreen and the SCharacterScreen service.
Fig. 8. The RRE registry with example contents
Fig. 9. Graphical representation of registry contents
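A minimal sketch of such a registry, mirroring the three tables of Figures 8 and 9, might look as follows in Java. The class and method names are our own assumptions; the real RRE interface is not shown in this chapter.

    import java.util.*;

    // Sketch of the three RRE registry tables and the lookup they support.
    class RreRegistry {
        // 1) component GUID -> physical location (file://... or address://...)
        private final Map<String, String> componentLocation = new HashMap<>();
        // 2) service GUID -> component GUIDs implementing that service
        private final Map<String, List<String>> serviceToComponents = new HashMap<>();
        // 3) complies relation: requested service GUID -> compliant service GUIDs
        private final Map<String, List<String>> complies = new HashMap<>();

        void registerComponent(String componentGuid, String location) {
            componentLocation.put(componentGuid, location);
        }
        void registerService(String serviceGuid, String componentGuid) {
            serviceToComponents.computeIfAbsent(serviceGuid, k -> new ArrayList<>()).add(componentGuid);
        }
        void registerComplies(String requestedService, String compliantService) {
            complies.computeIfAbsent(requestedService, k -> new ArrayList<>()).add(compliantService);
        }

        // Resolve a service request: first look for a direct implementation,
        // then fall back to compliant services (assumes an acyclic complies relation).
        Optional<String> findComponentFor(String serviceGuid) {
            List<String> direct = serviceToComponents.getOrDefault(serviceGuid, List.of());
            if (!direct.isEmpty()) return Optional.of(direct.get(0)); // RRE is free to choose
            for (String compliant : complies.getOrDefault(serviceGuid, List.of())) {
                Optional<String> component = findComponentFor(compliant);
                if (component.isPresent()) return component;
            }
            return Optional.empty();
        }

        String locationOf(String componentGuid) { return componentLocation.get(componentGuid); }
    }

Filled with the contents of Figure 9, a request for the SOutput service would then resolve to a component implementing one of its compliant services, for instance the CScreen component via SCharacterScreen or SPixelScreen.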
When the SCharacterScreen is requested, the RRE will activate the CScreen component and send it a request to instantiate the SCharacterScreen service. When the SOutput service is requested, there is no component registered that can instantiate the service. The RRE might activate the CLaserPrinter component and request it to instantiate the SPlotter service, or it might activate the CScreen component and request it to instantiate the SPixelScreen or the SCharacterScreen service. The actual component and service used are left to the RRE.
Registration. The behavior of the RRE is driven by the information in the registry. When components are resident on a system, information needs to be added to the registry (registration) in order for the RRE to be able to use the component and the implemented services. The components themselves play a passive role in this registration process: the system configurator controls the registration process. This role can be exercised by a person assembling the system or can be automated to some degree. We do not let components themselves be in control of this registration process, because it is a system-wide aspect, i.e. it can only be done well with system-wide knowledge. For example, when two components can offer the same service, neither of the components can decide which one should be used in the system. Instead of using an overwrite strategy (e.g. the component most recently registered takes precedence), this choice needs to be made external to the individual components. The registration information needed is either directly known to the system configurator (e.g. the location of the component on the target) or can be obtained from the RIDL description (component GUID, service GUIDs, complies relations) or via one of the other models in the Robocop component. Note that a system configurator may decide not to use all of the information for the registration. The configurator can decide not to register all of the services, or to ignore some of the complies relations. This is useful, because the complies relation on the RIDL level only expresses substitutability at a structural level. This does not address extra-functional aspects. The resource usage of the service may exceed what the system is willing to spend, and thus it makes sense not to register this service on the device.
Instantiating Services. Clients can get a service instance by calling functions in the RRE. Instantiating a service is a multi-step process, with shared responsibility between the RRE and the component that can instantiate the service. The process can be broken down into three steps:
– Locating and activating the component
– Retrieving the service manager
– Retrieving the service instance
Locating a component has been described earlier in this section (The RRE Registry). Once the correct component is located, the RRE will activate the component. This entails loading the component in OS terms, which means mapping the component into (executable) memory space, e.g. from the file system where the component is stored. Each component supports a fixed entry point. This entry point is used to get a service manager, and to check if a component can be unloaded.
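Building on the hypothetical RreRegistry and IServiceManager sketches above, the RRE-side part of this three-step process could look roughly as follows; the loading step is deliberately left abstract, since it is platform-specific.

    // Sketch of the instantiation flow performed by the RRE (names are assumptions).
    class RuntimeEnvironment {
        private final RreRegistry registry = new RreRegistry();

        Object getServiceInstance(String serviceGuid) {
            // Step 1: locate and activate the component that can deliver the service.
            String componentGuid = registry.findComponentFor(serviceGuid)
                    .orElseThrow(() -> new IllegalStateException("no component for " + serviceGuid));
            IServiceManager manager = loadComponent(registry.locationOf(componentGuid));
            // Step 2: the component's fixed entry point has yielded its service
            // manager, which may also be used to pre-set attributes.
            // Step 3: retrieve the service instance itself.
            return manager.createService(serviceGuid);
        }

        // Activation: map the executable component into memory (e.g. from its
        // file:// location) and call its fixed entry point.
        private IServiceManager loadComponent(String location) {
            throw new UnsupportedOperationException("platform-specific loading");
        }
    }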
The service manager is part of the factory pattern for the services [6]. The service instances are created using the service manager. Additionally, the service manager can be used to pre-set attributes of the service instances.
Interacting with Service Instances. As described earlier, the service instances expose their functionality through named interfaces (ports). Each service implements the rcIService interface. It supports methods to retrieve the provided ports of service instances, and the bindTo and unBind methods to bind and unbind interface instances (provided ports of a different service instance) to requires ports of the service. The methods to do this are not strongly typed. In order to have a type-safe mechanism, each service definition gives rise to an interface definition that is implemented by the service. This interface is a descendant of the rcIService interface. For each provided port, there is a specific method GetProvides<portname> in this interface that returns an interface of the type mentioned in the service definition. The <portname> is the actual port name used in the service definition. Similarly, for each requires port there are two methods in this interface to bind and unbind the requires interface: bindTo<portname> and unBind<portname>. In general, a service instance cannot operate when its requires ports are not bound. In order for a client to signal that it has finished binding its required interfaces and wants to start using the service instance, the client calls the Start method. This method can be called either on the rcIService interface, or on the service-specific descendant of that interface.
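In Java-like form, the relation between rcIService and a generated, service-specific interface could be pictured as below; the port names screen and input and the IKeyboard type are invented for the example, and ICharacterOut is reused from the earlier compliance sketch.

    // Generic, weakly typed interface implemented by every service instance.
    interface rcIService {
        Object getProvidedPort(String portName);            // retrieve a provides port
        void bindTo(String portName, Object providedPort);  // bind a requires port
        void unBind(String portName);
        void Start();                                       // client has finished binding
    }

    interface IKeyboard { int readKey(); }

    // Generated, strongly typed descendant for a service with provides port
    // "screen" (type ICharacterOut) and requires port "input" (type IKeyboard).
    interface SCharacterScreenService extends rcIService {
        ICharacterOut GetProvidesScreen();    // typed access to provides port "screen"
        void bindToInput(IKeyboard keyboard); // typed binding of requires port "input"
        void unBindInput();
    }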
4 Download Framework
In this section we will present the download framework. The download framework has been developed as part of Robocop to fulfill the requirements concerning upgrading and extension. During the period that a CE device is owned by a user there is a need for improving and extending the software on the device in order to extend the economic lifetime of the device. The download framework is responsible for transferring Robocop components from a repository to a target and their registration with the RRE. In the component life cycle (see 3.1) this corresponds to the phases tailoring, target loading and registration. Within the Robocop project little work has been done on tailoring of components and automated registration of components. For locating Robocop components and target loading a solution has been developed that consists of the following conceptual roles:
– Initiator
– Locator
– Decider
– Repository
– Target
At run-time, these roles can run on different devices. The target role runs on the terminal to which a component is transferred. The roles are discussed in detail in sections 4.1, 4.2, 4.3, 4.4, and 4.5, respectively. Within the download process we distinguish the following phases:
– Location of the entities that participate in the download process
– Decision about the feasibility of a given download
– Target loading, the actual transfer of the Robocop component to the device
– Confirmation of the download and registration of the downloaded component at the RRE
All roles that participate in the download process communicate using gSOAP [21]. The gSOAP technology enables communication between the different roles even though they might be deployed on different hardware nodes. The download process is depicted in Figure 10. Next, we will discuss the individual roles in more detail.
Fig. 10. The download procedure
4.1 Initiator Role
In this subsection we will discuss the initiator role in the download process. The main responsibility of the initiator is verifying the presence of all the entities in the download process and coordinating the download process. The initiator role can be implemented by a component on the target device. It is also possible that the initiator role is implemented on a different device; this can be useful in the case of remote maintenance. An initiator starts the download process in response to some external event. Such an event may be triggered, for instance, by the component upgrading process or as a result of a change in user preference settings. An initiator can participate in multiple download processes at the same time. The download processes are identified by the GUID of the Robocop component that needs to be transferred and a unique name for the target. In order to verify the presence of all the entities and to coordinate the download process, the initiator needs the addresses of all the entities. To that end the initiator has
a table with the addresses of one or more locators ordered by priority or proximity (see Figure 11). Using one of these locators, the initiator can retrieve the addresses of the other entities involved in the download process.
Fig. 11. Example of deployed download roles
4.2 Locator Role
In the process of downloading a Robocop component from a repository to a target, the locator is responsible for locating a repository which is to provide the Robocop component to be downloaded, and the target to which the Robocop component is to be transferred. In addition, the locator is responsible for locating the entity playing the decider role, which is responsible for deciding whether the given download will take place or not. Hence, the locator provides the addresses which can be used to contact the repository, the target and the decider. Although the functionality of the locator does not depend on the interconnection network used for the download, the latter determines the type of the address returned by the locator. For the sake of precision, the term address in the above definitions stands for a comprehensive descriptor of an entity in an interconnection network which allows its holder to contact that entity. For example, in an all-IP network this descriptor is the IP address of the entity. In order to be able to provide the address information, the locator maintains three tables. The first table is used to store all the registered targets, the second table is used to store the registered repositories, and the last is used to store the registered deciders.
For all these entities, the name and address are stored. For the deciders, the decider class is also stored, which can be used to distinguish different types of deciders. Information about which repository contains which components is not stored at the locator. When the locator receives a request to locate a repository that contains a Robocop component, it will query all known repositories for that specific Robocop component using its GUID. A locator provides a registration mechanism for targets, deciders and repositories. Entities implementing these roles use this mechanism to register themselves with a number of locators.
4.3 Decider Role
A decider awaits a request (typically from an initiator) to determine if it is possible that a given Robocop component can be downloaded onto and subsequently registered on a given target. This is done by investigating and matching the acquired profiles of repository-resident components and profiles of the target. After this matching process, the decider will notify the involved parties of the decision. The decision procedures performed by the decider may be sophisticated. For example, in the case of components that are subject to a real-time scheduling policy, a schedulability test may be performed. Within the Robocop project little work has been done on defining advanced decision procedures.
4.4 Repository Role
The main responsibility of a repository is the storage of Robocop components in order for them to be transferred to a target. It is also possible that the repository tailors Robocop components, for example by compiling a component for a specific platform. During the download process, the repository is first queried for the availability of a Robocop component. At a later phase in the download process, the decider will request the component profile. After the repository has received a positive decision from the decider, it must prepare the specific Robocop component to be transferred to the target. The actual transfer of the Robocop component is activated by the initiator. The download framework supports two types of transfer strategies: Push and Pull. Figure 10 illustrates the Push strategy. In the Push case, the initiator sends a message to the repository to start transferring the Robocop component; in the Pull case, the initiator sends a message to the target that it should fetch the Robocop component from the repository.
4.5 Target Role
The target role has two main responsibilities. The first is that a target should be ready to receive the Robocop component that is transferred during the download process and make it resident on the device. The second responsibility is providing a profile of the target device to the decider. This is necessary for the decider to be able to make sure that the downloaded Robocop component is suitable for the target. In the case of a Pull strategy for the transfer of the Robocop component, the target has an additional responsibility: fetching the Robocop component. The example download process in Figure 10 is based on the Push strategy, in which this is not the case.
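To show how the roles interact, the sketch below renders them as plain Java interfaces driven by an initiator. In the framework the roles are remote gSOAP endpoints; all interface and method names here are illustrative assumptions.

    // Hypothetical sketch of the download roles and the push/pull transfer choice.
    interface Repository { String getComponentProfile(String componentGuid); void push(String componentGuid, Target target); }
    interface Target     { String getTargetProfile(); void pull(String componentGuid, Repository repo); void confirmAndRegister(String componentGuid); }
    interface Decider    { boolean approve(String componentProfile, String targetProfile); }
    interface Locator    { Repository locateRepository(String componentGuid); Target locateTarget(String targetName); Decider locateDecider(String deciderClass); }

    enum TransferStrategy { PUSH, PULL }

    class Initiator {
        // Coordinates one download process, identified by component GUID and target name.
        boolean download(String componentGuid, String targetName, Locator locator, TransferStrategy strategy) {
            // Location phase: resolve the entities taking part in the download.
            Repository repo = locator.locateRepository(componentGuid);
            Target target = locator.locateTarget(targetName);
            Decider decider = locator.locateDecider("default");
            // Decision phase: match the component profile against the target profile.
            if (!decider.approve(repo.getComponentProfile(componentGuid), target.getTargetProfile()))
                return false;
            // Target loading: pushed by the repository or fetched by the target.
            if (strategy == TransferStrategy.PUSH) repo.push(componentGuid, target);
            else target.pull(componentGuid, repo);
            // Confirmation and registration of the component with the RRE.
            target.confirmAndRegister(componentGuid);
            return true;
        }
    }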
5 Concluding Remarks
5.1 Discussion
In recent years a number of component models have been developed. These component models aim to increase productivity and reduce time to market by enabling re-use of software. These benefits are also desirable in the CE domain. However, existing component models have failed to meet some of the specific requirements in the CE domain:
– Robust and reliable operation
– Run-time upgrading and extension
– Low resource footprint
– Support for component trading
In this chapter we presented the Robocop component framework, which has been specifically developed to satisfy these requirements in order to be suitable for the CE domain, or more generally for the domain of high-volume embedded systems.
Robust and reliable operation has been achieved by using explicit dependencies between components and by providing hooks for analysis techniques (models describing the properties of components and services).
Run-time upgrading and extension have been realized using dynamically loadable components. Each component provides a single entry point that gives access to the services provided by the component. In order to achieve run-time upgrading and extension, services needed a binary interface. At run-time, services can be instantiated and connected by a third party. We used the v-table approach that is also used in COM [1].
A low resource footprint was one of the high-priority requirements. To meet this requirement we adhered to the following principle: "Do not pay for what you don't use!". We designed a minimal runtime environment that offers the core functionality of a component framework, e.g. discovery, instantiation, binding, and an efficient communication mechanism. Additional features are provided using optional frameworks. If an optional framework is not used, no overhead is incurred at run-time.
Support for component trading should enlarge the market of components in order to increase the benefits of component-based development. We distinguish Robocop components, which are the unit of trading and configuration, from executable components, which are executable code that can be executed on the target device. To increase the tradeability of the components, it is desirable to have more information about a component than its executable only. For example, documentation, source code, and specifications may be very useful. Therefore, Robocop components are defined to be a set of models. One of these models typically is the executable component; other models may provide complementary information about the component.
5.2 Related Work
Over the last couple of years a number of component models have been developed. A taxonomy of the most well-known component models has been presented in section 2. Koala [13], PECOS [22], and AutoComp [16] are component models developed especially for embedded systems. These component models have some of the features
required in the CE domain; they enable robust and reliable operation and a low resource footprint. Although run-time upgrading was already supported by existing component models like COM [1], EJB [15] and .NET [9], this was not the case for component models for embedded systems. To remedy this, the Robocop component model enables run-time upgrading and increases the support for run-time trading of components. In the Robocop framework we combine ideas for run-time upgrading from COM with ideas for robust and reliable operation from Koala.
5.3 Contributions
With the development of the Robocop framework we combined a number of solutions from existing component models to create a component model suitable for the CE domain. In addition, we introduced the notion of a Robocop component as a set of models. Different model types can be used to describe different aspects of a component, similar to the way multiple views are used in describing software architectures. The different models of a component increase the tradeability, since customers are usually not interested in just the executable code, but need additional information to assess suitability.
Acknowledgments
We are grateful to all the partners of the Robocop project for their contributions: Philips Electronics, Nokia, CSEM, Saia Burgess, ESI, Fagor, Ikerlan, University Polytechnic de Madrid, and Eindhoven University of Technology. The Robocop project has been funded in part through the European ITEA programme.
References
1. D. Box. Essential COM. Object Technology Series. Addison-Wesley, 1997.
2. I. Crnkovic and M. Larsson. Building Reliable Component-Based Software Systems. Artech House Publishers, 2002.
3. E. Dashofy, A. van der Hoek, and R. Taylor. Towards architecture-based self-healing systems. In Proceedings of the First Workshop on Self-Healing Systems. ACM, Nov. 2002.
4. M. de Jonge, J. Muskens, and M. Chaudron. Scenario-based prediction of run-time resource consumption in component-based software systems. In Proceedings of the 6th ICSE Workshop on Component-Based Software Engineering: Automated Reasoning and Prediction. ACM, June 2003.
5. E. Eskenazi, A. Fioukov, D. Hammer, and M. Chaudron. Estimation of static memory consumption for systems built from source code components. In 9th IEEE Conference and Workshops on Engineering of Computer-Based Systems. IEEE Computer Society Press, Apr. 2002.
6. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
7. G. Holzmann. Spin Model Checker. Addison-Wesley, 2003.
8. K. Jensen. Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use. Springer-Verlag, 1997.
9. J. Lowy. .NET Components. O'Reilly and Associates, 2003.
10. T. Mowbray and R. Zahavi. Essential CORBA. John Wiley and Sons, 1995.
11. J. Muskens and M. Chaudron. Prediction of run-time resource consumption in multi-task component-based software systems. Technical Report TR-117, Technische Universiteit Eindhoven, 2003.
12. J. Muskens and M. Chaudron. Integrity management in component based systems. In Proceedings of the 30th EUROMICRO Conference, Rennes, France, Aug. 2004.
13. R. van Ommering, F. van der Linden, J. Kramer, and J. Magee. The Koala component model for consumer electronics software. IEEE Computer, 33(3):78–85, Mar. 2000.
14. Robocop Consortium. Robocop: Robust open component based software architecture for configurable devices project (http://www.extra.research.philips.com/euprojects/robocop/), 2001.
15. E. Roman, S. Ambler, and T. Jewell. Mastering Enterprise JavaBeans. John Wiley and Sons, 2001.
16. K. Sandstrom, J. Fredriksson, and M. Akerholm. Introducing a component technology for safety critical embedded real-time systems. In 7th ICSE Workshop on Component-Based Software Engineering, May 2004.
17. B. Schmerl and D. Garlan. Exploiting architectural design knowledge to support self-repairing systems. In Fourteenth International Conference on Software Engineering and Knowledge Engineering, 2002.
18. G. Smith. The Object-Z Specification Language. Kluwer Academic Publishers, Boston, 1999.
19. Space4U Consortium. Space4U: Software platform and component environment 4 you (http://www.extra.research.philips.com/euprojects/space4u/), 2003.
20. C. Szyperski. Component-based Software Engineering beyond object orientation. Addison-Wesley, 1998.
21. R. van Engelen and K. Gallivan. The gSOAP toolkit for web services and peer-to-peer computing networks. In 2nd IEEE International Symposium on Cluster Computing and the Grid. IEEE Computer Society Press, May 2002.
22. M. Winter, T. Genssler, A. Christoph, O. Nierstrasz, S. Ducasse, R. Wuyts, G. Arevalo, P. Muller, C. Stich, and B. Schonhage. Components for embedded software - the PECOS approach. In 2nd ECOOP Workshop on Composition Languages, 2002.
Connecting Embedded Devices Using a Component Platform for Adaptable Protocol Stacks
Sam Michiels, Nico Janssens, Lieven Desmet, Tom Mahieu, Wouter Joosen, and Pierre Verbaeten
K.U.Leuven, Dept. Computer Science, Celestijnenlaan 200A, B-3001 Leuven, Belgium
{sam.michiels,nico.janssens}@cs.kuleuven.ac.be
Abstract. Research domains such as sensor networks, ad-hoc networks, and pervasive computing, clearly illustrate that computer networks have become more complex and dynamic. This complexity is mainly introduced by unpredictable and varying network link characteristics, heterogeneous capabilities of attached nodes, and the increasing user expectations regarding reliability and quality of service. In order to deal with this complexity and dynamism of computer networks, the system’s protocol stack must be able to adapt itself at runtime. Yet, to handle this complex challenge effectively and efficiently, we claim that it is essential for protocol stacks to be developed with run-time adaptability in mind. This chapter presents a software architecture tailored to build highly adaptable protocol stacks, along with a component platform that enforces this architecture. Although the presented software architecture focuses on protocol stacks in general, we zoom in on the application of its founding principles in the domain of embedded network devices.
1 Introduction
The use of mobile embedded devices to offer users network connectivity anywhere and anytime is increasing significantly [1]. In order to achieve seamless interoperability of heterogeneous devices in a highly dynamic network, the protocol stack of each connected device often needs to exhibit a similar degree of dynamism. Connected devices can vary from powerful portable PCs or PDAs to resource-limited embedded devices like mobile phones or sensors. This chapter presents a software architecture [2] tailored to build highly adaptable protocol stacks, along with a component platform [3] that enforces this architecture. We refer to this combination as DiPS+, the Distrinet Protocol Stack [4]. The key focus in DiPS+ is run-time adaptability to application- and environment-specific requirements or characteristics. The strength of the DiPS+ approach is twofold. On the one hand, DiPS+ provides for two essential aspects of run-time adaptability: it offers support for controlling concurrency behavior of the protocol stack and for swapping components in a transparent manner, while sharing a common component platform core. This considerably facilitates system management, since it allows for modular integration of non-functional extensions that cross-cut the core protocol stack functionality.
On the other hand, DiPS+ proposes a design method that imposes the separation of basic protocol stack functionality from additional run-time adaptability support. As will be illustrated further in this chapter, the employed separation of concerns allows for a programmer to concentrate on a single concern (e.g. the behavior of a DiPS+ protocol stack) without being distracted by other concerns scattered across the same functional code (such as additional adaptability support). This is essential for making adaptable protocol stacks more comprehensible, reusable and flexible. We believe that the DiPS+ component platform is a convincing case study to illustrate the potential of using fine-grained components and separation of concerns in building highly adaptable network systems. We argue that (1) in order to achieve runtime adaptability, the software must be developed with flexibility in mind, and that (2) modularity and strict separation of concerns are two main characteristics of an adaptable design [4]. Obviously, there are many other specific concerns when developing embedded systems, such as performance control, resource awareness, and real-time constraints. Experience shows that at least the first two of these “embedded system characteristics” benefit from our software architecture as well. We do not claim that the DiPS+ component platform can be used as-is in networked embedded systems; however, we are convinced that its founding principles can be beneficial for this kind of software. Throughout the chapter, we will clarify the advantages of the DiPS+ ideas for embedded systems. We have validated DiPS+ successfully, a.o. in the context of concurrency control [5, 6]. The DiPS+ component platform has also been applied in research domains different from component swapping and concurrency control. Discussing all related research tracks would lead us too far and certainly transcends the scope of this chapter. In summary, DiPS+ offers support for unit testing [7], automatic component composition [8, 9] and framework optimization [4], while current research extends DiPS+ from a local node architecture to a distributed management platform that allows for controlling and adapting multiple connected protocol stacks [10]. The remainder of this chapter is structured as follows. Section 2 sketches the domain of our case study: it explains the need for providing flexibility in protocol stacks with respect to concurrency control and component hot-swapping. Section 3 presents the DiPS+ component platform, which offers core programming abstractions to improve the development of adaptable protocol stacks. The two sections that follow each describe a specific extension of the DiPS+ platform to control and manage the underlying protocol stack: Section 4 focuses on dynamic load management; Section 5 explains how transparent component swapping is supported. Section 6 describes the DiPS+ prototype and its validation in various research projects and Master’s theses. It also explains how our positive experiences in the domain of networking software can be applied in the broader domain of component-based embedded systems. Section 7 positions our work with respect to related research. Conclusions are presented in Sect. 8.
2 The Specific Case of Protocol Stacks
The development of protocol stacks is often complex and error-prone, especially when additional preconditions (such as the need for run-time adaptability) are imposed.
Before elaborating on how advanced separation of concerns contributes to making adaptable protocol stacks more comprehensible, accessible and reusable, we elaborate in this section on the importance of run-time adaptability in the domain of protocol stack software (whether or not in an embedded system). More precisely, we focus on non-functional adaptations (load management) and on functional adaptations (component hot-swapping).
2.1 Load Management
Management of system load in networked systems tries to prevent systems from being overwhelmed by arriving network packets. Load management is highly important both for embedded devices in an ad-hoc network (since they may have limited resources available) and for network access devices (which may receive considerable access demand peaks when a large group of users connects in parallel). Since cooperating nodes in an ad-hoc network may be highly heterogeneous with respect to available processing resources, memory, and data transfer capabilities, low-end router nodes easily get overloaded by data transfers induced by more powerful machines. By consequence, adaptive load management is highly relevant for embedded network devices.
Solutions for system load control often depend on run-time circumstances and/or application-specific requirements. In addition, system load should be controlled and managed at run-time to handle changing network circumstances gracefully. These changes can, for instance, be induced by (1) popular services being offered on the network, resulting in increasing network traffic to the server, (2) more clients being added dynamically and/or clients with varying quality of service requirements, or (3) decreasing processing capabilities when the battery of a stand-alone device is getting low. In other words, circumstances may vary at the side of the server, the clients, and the network nodes themselves.
In order to enable (low-end) devices to handle overload situations gracefully, our approach proposes to dynamically balance resource consumption based on application- and environment-specific requirements. This goal is achieved by detecting internal bottlenecks and deploying a solution to the problem in the running protocol stack. A bottleneck occurs when many more packets arrive at a component than can be processed immediately. Bottlenecks can be handled in many different ways. We concentrate on three approaches: packet classification and prioritization, input rate control, and thread reallocation from underloaded to overloaded areas. It is important that solutions (e.g. packet classification, input rate control, thread reallocation) can be performed at any place in the protocol stack and, by consequence, can be based on information not yet available when the packet arrives. Protocol headers, for instance, only release their information when they have been parsed; however, this information may considerably influence further processing of the packet. In addition, the classification strategy used to differentiate between packets may be based on application- and/or environment-specific requirements, in order to take into account changing circumstances at run-time (e.g. ad-hoc network topology, available system resources, network load, etc.). Thread reallocation focuses on tasks to be executed instead of packets. This allows processing of particular areas in the system to be customized by adding or removing threads locally.
For example, system performance is improved by increasing parallelism in areas that have become (temporary) I/O bottlenecks.
Our approach is complementary to existing load management techniques that are, or can be, used to handle overload situations gracefully. The most relevant techniques in this context are quality-of-service (QoS) protocols, load balancing [11], and active networks [12]. Our approach complements these distributed techniques by offering a local platform that is able to detect and (partially) handle overload situations.
2.2 Component Hot-Swapping
Research domains such as ad-hoc networks, sensor networks, 4G wireless networks and pervasive computing clearly indicate a trend towards more heterogeneous mobile computer networks. Network heterogeneity manifests itself in the form of increased diversity in the type of communication technology that devices are equipped with (such as Bluetooth, WiFi, HomeRF and satellite links), as well as in the types of embedded devices connected to the network (differing in memory capacity, processing power and battery autonomy). In addition, performance characteristics of network nodes and communication links most often change over time, a.o. due to disturbing influences. These heterogeneous and dynamic performance specifications will affect the inter-operability of connected nodes, and as a result are most likely to compromise the communication quality of the network, in particular when a best-effort communication model is employed. For instance, a Bluetooth scatternet (operating at 2 Mbps) will probably become a bottleneck when interconnecting a number of 802.11 MANETs (22 Mbps throughput).
To fully exploit the potential of such heterogeneous and dynamic networks, it is essential for the protocol stacks of the connected embedded devices to adapt themselves at run-time as the environment in which they execute changes (e.g. by installing a compression service to boost the quality of the slow Bluetooth scatternet). To this end, we aim at coping with the increasing user expectations regarding quality of service. By consequence, the underlying protocol stacks should exhibit a similar degree of dynamism, which illustrates the need for employing programmable [13] (i.e. adaptable) network nodes. These programmable networks are strongly motivated by their ability to rapidly change the protocol stack of network nodes without the need for protocol standardization. In addition, protocol stack reconfigurations should be performed at run-time (transparently for end-user applications) to promote permanent connectivity of the embedded devices and thus exploit the full potential of mobile wireless networks. This requires the node architecture to conduct adaptations (recomposition) of the protocol stack functionality without having to shut down and restart active connections. As a result, a running DiPS+ protocol stack can be customized by a third party (such as a network operator or intelligent self-healing network support), without interfering with the execution of applications using the network.
In more detail, we focus on unanticipated protocol adaptations, such as feature additions and protocol revisions. Since these adaptations are not anticipated at design-time or deployment-time, component hot-swapping is essential to achieve seamless run-time evolution of protocol stacks in mobile embedded devices. In addition, component
hot-swapping is justified by the memory constraints inherent in connected, resource-limited embedded devices, such as intelligent sensors and mobile phones. Depending on the protocol to be adapted, additional support is required to prevent the replacement of DiPS+ components from jeopardizing the functionality of a running stack, which would compromise the correct functioning of the ad-hoc network. This includes avoiding packet loss during a reconfiguration (a.o. essential when changing protocols like TCP that aim to provide full reliability) as well as imposing a safe state over the DiPS+ components before conducting the actual reconfiguration. As will be illustrated in Sect. 5, the latter is essential to prevent reconfiguration of a composition from breaking the consistency of the components making up the protocol stack [14, 15].
3 The DiPS+ Component Platform
As stated in the introduction, DiPS+ aims for modular integration of non-functional extensions (such as support for load management and component hot-swapping), which share a common component platform. Strict separation of such non-functional behavior has proven to be an essential feature of adaptable, maintainable and reusable software [16]. To separate non-functional behavior from basic protocol stack functionality, the DiPS+ architecture represents data (packet) processing and protocol stack management as two planes on top of each other, respectively the data and the management plane.
The data plane in the DiPS+ architecture houses the functional part of the system, i.e. the protocol stack. This plane identifies components and how they are connected on the one hand, and offers layers as a composition of basic components on the other hand. On top of the data plane, DiPS+ offers one or more management planes, which act as meta-levels to extract information from the data plane and control its behavior. Each management plane is responsible for a specific concern (e.g. load management or component hot-swapping) and is clearly separated from the data plane. In this way, a management plane can be added or removed without affecting components in the data plane.
In the remainder of this section, we elaborate on the architectural styles employed by the data plane and describe how the provided abstractions in the DiPS+ component platform enable run-time adaptability. Afterwards, in Sects. 4 and 5, the modular extendibility of the data plane with support for load management and component hot-swapping will be demonstrated.
3.1 Data Plane: Combination of Architectural Styles
When taking a closer look at the architecture of the data plane, we can identify three main architectural styles – the pipe-and-filter, the blackboard, and the layered style. By employing these architectural styles, the DiPS+ platform offers a number of framework abstractions (such as components, connectors, and packets) to ease development of adaptable protocol stacks.
Pipe-and-Filter Style. The pipe-and-filter style is very convenient for developing network software, which maps naturally to the pipeline style of programming. A protocol stack can be thought of as a down-going and an up-going packet flow.
Fig. 1. Example of a DiPS+ component pipeline with a dispatcher that splits the pipeline into two parallel component areas. More processing resources have been assigned to the upper concurrency component
The core abstractions of a typical pipe-and-filter software architecture are connectors (pipes) and components (filters). Connectors provide a means to glue components together into a flow. Each functional component in DiPS+ represents an entity with a well-defined and fine-grained functional task (e.g. constructing or parsing a network header, fragmenting a packet or reassembling its fragments, or encrypting or decrypting packet data). Our architecture distinguishes additional component types for dispatching and concurrency (see Fig. 1). These are not only highly relevant abstractions for protocol stack software, identifying them as explicit entities also facilitates their control. The dispatcher serves as a de-multiplexer, allowing to split a single flow into two or more sub-flows. Concurrency components and component areas are described in Sect. 3.3. Blackboard Style. The blackboard interaction style is characterized by an indirect way of passing messages from one component to another, using an in-between data source (blackboard). This style is very convenient in combination with the pipe-and-filter style to increase flexibility and component independence. The blackboard model is mapped onto the DiPS+ architecture as follows (see also Fig. 2). In order to finish a common task, DiPS+ components forward an explicit message (packet) object from the source to the sink of the component pipeline. In addition, each message can be annotated with meta-information. Attaching meta-information allows to push extra information through the pipeline along with the message, for instance
Fig. 2. Anonymous communication via a blackboard architectural style: a blackboard data structure has been coupled to each message to carry meta-information from one component to another
to specify how a particular message should be processed. The message represents the blackboard, which encapsulates both data and meta-information. In this way, components that consume specific meta-information do not have to know the producer of these data (and vice versa). By consequence, components become more independent and reusable since they do not rely on the presence of specific component instances. Layered Style. Introducing an explicit layer abstraction in a protocol stack architecture is highly relevant for several reasons. First and foremost, it is very natural to have a design entity that directly represents a key element of a protocol stack. Secondly, each layer offers an encapsulation boundary. Every protocol layer encapsulates data received from an upper layer by putting a header in front. Finally, from a protocol stack point of view, layers provide a unit of dispatching. The general advantage of applying the layered style is that it allows to zoom in and out to an appropriate level of detail. When not interested in the details of every fine-grained component, one can zoom out to a coarse-grained level, i.e. the layer. 3.2 Explicit Communication Ports The employed architectural styles have resulted in the design of the DiPS+ components. A component in DiPS+ is developed as a core surrounded by explicit component entry and exit ports. DiPS+ Component. Component activity is split into three sub-tasks: packet acceptance, packet processing, and packet delivery. The DiPS+ framework controls packet acceptance and delivery by means of explicit component entry and exit points (the packet receiver and forwarder). The design of a DiPS+ component consists of three entities (see also Fig. 3). Packet processing is taken care of by a DiPS+ Unit class, which forms the core of a component. The PacketReceiver (PR) and PacketForwarder (PF) classes act as unit wrappers and uncouple processing units. The DiPS+ Component class is a pure framework entity that is transparent to programmers. A component encapsulates and connects a unit together with its packet receiver and forwarder. All components in DiPS+ share a common functional packet interface incomingPacket(Packet p). Some components may offer one or more management interfaces next to their functional interface, as will be described further in Sect. 3.3. With an eye to enable fine-grained management, the DiPS+ data plane is designed to be open for customizations in a well-defined way [17, 18]. DiPS+ components allow for transparent packet interception at the communication ports via their associated Policy object (see Fig. 3). The policy delegates each packet to a number of pipelined ManagementModule objects, which may be registered by an administration tool at application level. Unlike functional components, management modules encapsulate non-functional behavior (e.g. throughput monitoring, logging, or packet blocking). Advantages. The combination of the pipe-and-filter and the blackboard architectural style results in two main advantages. First of all, it supports the design of so-called
plug-compatible components [19], i.e. components that are unaware of any other component, directly or indirectly. The pipe-and-filter style uncouples adjacent components by means of a connector (represented in DiPS+ by a PF-PR combination). The blackboard style, for its part, allows for anonymous component interaction (represented in DiPS+ by packets and their associated meta-information). Secondly, the combination enables fine-grained and unit-specific management and control. Both the PR and the PF serve as attachment hooks for the management plane. Such hooks are designed in DiPS+ as separate entities, called policies, which are responsible for the handling of incoming and outgoing packets. Thanks to this plug-compatible component model and fine-grained management and control of the data plane, extending a protocol stack with load management and/or component hot-swapping becomes much easier and understandable (see Sect. 2).
Fig. 3. A DiPS+ component (consisting of a packet receiver, the core unit, and a packet forwarder) with a policy object that intercepts incoming packets (p). The policy delegates incoming packets to a pipeline of management modules
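The framework entities of Fig. 3 can be approximated in Java as sketched below. This follows the structure described in the text (unit, packet receiver, packet forwarder, policy, management modules), but the concrete class bodies are our own simplification rather than DiPS+ code; a module dropping a packet is modelled by returning null.

    import java.util.*;

    // The packet doubles as the blackboard: payload plus meta-information.
    class Packet {
        final byte[] data;
        final Map<String, Object> meta = new HashMap<>();
        Packet(byte[] data) { this.data = data; }
    }

    // Management modules encapsulate non-functional behavior (monitoring,
    // logging, blocking); returning null drops the packet.
    interface ManagementModule { Packet process(Packet p); }

    // A policy delegates each packet to a pipeline of management modules.
    class Policy {
        private final List<ManagementModule> modules = new ArrayList<>();
        void register(ManagementModule m) { modules.add(m); }
        Packet apply(Packet p) {
            for (ManagementModule m : modules) {
                p = m.process(p);
                if (p == null) return null;
            }
            return p;
        }
    }

    // Functional core of a component: one fine-grained task (parse a header, ...).
    abstract class Unit {
        protected PacketForwarder forwarder;
        abstract void incomingPacket(Packet p);
    }

    // Explicit entry point of a component (packet acceptance).
    class PacketReceiver {
        final Policy policy = new Policy();
        private final Unit unit;
        PacketReceiver(Unit unit) { this.unit = unit; }
        void incomingPacket(Packet p) {
            Packet accepted = policy.apply(p);
            if (accepted != null) unit.incomingPacket(accepted);
        }
    }

    // Explicit exit point (packet delivery): a PF-PR pair forms the connector.
    class PacketForwarder {
        final Policy policy = new Policy();
        private PacketReceiver next;
        void connect(PacketReceiver next) { this.next = next; }
        void forward(Packet p) {
            Packet delivered = policy.apply(p);
            if (next != null && delivered != null) next.incomingPacket(delivered);
        }
    }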
3.3 Explicit Concurrency Components
Finally, to separate the employed concurrency model of a DiPS+ stack from basic functionality, functional components are complemented with concurrency components. This allows a developer to concentrate on the concurrency aspect of a DiPS+ stack, without being distracted by other concerns scattered across the same functional stack, and vice versa. A concurrency component makes it possible to increase or decrease the level of parallelism in the component area behind it. In addition, it controls which requests are scheduled and when. Each concurrency component breaks the pipeline into two independent component groups, which will be referred to as component areas (see also Fig. 1).
Concurrency components exploit the benefits of both the pipe-and-filter and the blackboard architectural style. The pipe-and-filter style divides the system into plug-compatible components. As a result, concurrency components can be added anywhere in the pipeline, without affecting the functional components within. The DiPS+ dispatcher makes it possible to split a component pipeline into parallel sub-pipes. In this way, each sub-pipe can be processed differently by putting a concurrency component in front of
it. Thanks to the blackboard style of data sharing associated with each individual message, component tasks are typically packet-based, i.e. each component handles incoming packets by interpreting or adding meta-information. This allows to increase parallelism since most components have no local state that is shared by multiple threads in parallel. The design of the concurrency component consists of three major entities: a packet queue, one or more packet handlers, and the scheduler strategy. Its behavior during overload or under-load can be customized via its management interface (see Figure 4), which allows to register specific overflow and underflow strategies. In this way, the concurrency component can be controlled without exposing its internal attributes (such as the packet queue). A packet handler is a thread that guides a packet through the component area behind its concurrency component. The scheduler strategy of a concurrency component decides which packet will be selected next from the packet queue. The scheduler strategy can be customized via the scheduler interface of a concurrency component (see Fig. 4).
Fig. 4. A DiPS+ concurrency component with its management and scheduler interface
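A concurrency component along these lines could be sketched as follows, building on the Packet and PacketForwarder classes from the previous sketch; the interface names and the threading details are illustrative assumptions, not the DiPS+ API.

    import java.util.*;

    // Strategy that selects the next packet from the queue (FIFO, priority-based, ...).
    interface SchedulerStrategy { Packet selectNext(Deque<Packet> queue); }
    // Strategy invoked when the packet queue overflows; replaceable at run-time.
    interface OverflowStrategy { void onOverflow(Packet dropped); }

    // Breaks the pipeline into two component areas and controls how many
    // packet-handler threads serve the area behind it.
    class ConcurrencyComponent {
        private final Deque<Packet> queue = new ArrayDeque<>();
        private final int capacity;
        private volatile SchedulerStrategy scheduler = q -> q.pollFirst(); // FIFO default
        private volatile OverflowStrategy overflow = dropped -> { };       // drop silently
        final PacketForwarder forwarder = new PacketForwarder();

        ConcurrencyComponent(int capacity) { this.capacity = capacity; }

        // Functional interface: accept a packet into the buffer.
        synchronized void incomingPacket(Packet p) {
            if (queue.size() >= capacity) { overflow.onOverflow(p); return; }
            queue.addLast(p);
            notifyAll();
        }

        // Scheduler interface: attach one more packet-handler thread to this area.
        void addPacketHandler() {
            Thread handler = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    Packet p;
                    synchronized (this) {
                        while (queue.isEmpty()) {
                            try { wait(); } catch (InterruptedException e) { return; }
                        }
                        p = scheduler.selectNext(queue);
                    }
                    if (p != null) forwarder.forward(p); // guide the packet through the area
                }
            });
            handler.setDaemon(true);
            handler.start();
        }

        // Management interface: swap strategies without exposing the packet queue.
        void setScheduler(SchedulerStrategy s) { scheduler = s; }
        void setOverflowStrategy(OverflowStrategy s) { overflow = s; }
    }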
Advantages. Having explicit concurrency components shows three major advantages. First of all, it allows not only to reuse functional components whether or not concurrency is present, but also to reconfigure and customize the system where concurrency needs to be added. In this way, the system’s structure can be fine-tuned to specific circumstances and requirements, for instance, by adding concurrency components only if needed. Secondly, it allows for fine-grained and distributed control of scheduling in the protocol stack. Each concurrency component may incorporate a customized scheduling strategy, using all meta-information attached to the request by upstream components. This information may not yet be available at the beginning of the component pipeline. In this way, packet processing can be adapted to both request-specific information (e.g. content type, size, or sender) and the system’s state (e.g. available resources) as the packet traverses the component pipeline. A third advantage of having concurrency components spread throughout the system, is that it allows to prioritize not only between incoming packets, but also between component areas. On the one hand, this considerably facilitates finding and solving I/O bottlenecks, i.e. component areas that are overwhelmed because too many arriving packets require I/O access. On the other hand, concurrency components may help prioritize particular component areas based on application-specific requirements. DiPS+
concurrency components allow, for instance, to associate additional threads with those component areas that are about to release resources that have become scarce.
4 Management Plane for Load Management
As a first validation of the flexibility of the abstractions offered in DiPS+, we illustrate how a DiPS+ composition is extended with load management support in a modular manner. The need for load management (as described in Sect. 2.1) has resulted in the DMonA (Dips+ Monitoring Architecture) management plane, which controls and customizes the behavior of the protocol stack. DMonA allows for handling certain overload situations in an application-specific manner via interventions at protocol stack level. These interventions focus on packet classification, controlling the packet arrival rate, and optimally distributing processing threads over the tasks to be executed. DMonA is a feedback-driven management platform. This means that DMonA (1) extracts information from the underlying protocol stack (via the policy associated with a PR and/or PF), (2) decides whether or not action must be taken (using a monitor policy), and (3) deploys this solution in the protocol stack. The rest of this section describes how DMonA handles load management, viewed from three complementary perspectives: packet classification, request control, and concurrency control. 4.1 Packet Classification Packet classification differentiates between packets based on meta-information that is collected in each packet as it traverses the protocol stack. By consequence, the further a packet has traversed the component pipeline, the more meta-information is available for its classification. Packet differentiation can be based, for instance, on parameters such as destination, data size, encapsulated protocol, packet type (connection establishment or data transfer), or on application-specific preferences passed via meta-information. Packet classification is highly relevant when different categories or types of packets can be recognized, and service quality should be guaranteed for specific categories. During overload, the most important packets can be handled with priority. Packet classification can easily be added to a protocol stack thanks to three abstractions offered in the DiPS+ component platform: meta-information, dispatchers, and concurrency components. Meta-information is used by applications or components to annotate packets. These annotations influence how dispatchers and concurrency components process packets. A dispatcher is associated with a specific classification strategy, which is used to demultiplex the component pipeline in parallel sub-pipelines based on meta-information. A concurrency component for its part encapsulates a packet buffer and a specific scheduler strategy, which decides what packet to process next from the buffer. Either, the dispatcher can delegate packets to different concurrency components, one for each category; in this case, the packet scheduler selects packets from multiple queues. Or, the dispatcher can delegate packets to one ordered buffer that puts high priority packets first; in this case the packet scheduler is associated with one packet buffer and fetches packets in priority order.
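For instance, a dispatcher that classifies packets into gold, silver and bronze categories (as in the RADIUS case study mentioned further on) could be sketched like this; the strategy interface and the "userClass" meta-information key are assumptions made for the example.

    import java.util.*;

    // Classification strategy used by a dispatcher to demultiplex the pipeline.
    interface ClassificationStrategy { String classify(Packet p); }

    class Dispatcher {
        private final ClassificationStrategy strategy;
        private final Map<String, ConcurrencyComponent> subPipes = new HashMap<>();

        Dispatcher(ClassificationStrategy strategy) { this.strategy = strategy; }
        void addSubPipe(String category, ConcurrencyComponent pipe) { subPipes.put(category, pipe); }

        void incomingPacket(Packet p) {
            // Classification may use any meta-information collected upstream, e.g.
            // a "userClass" annotation added when the protocol header was parsed.
            // (Assumes a sub-pipe has been registered for every category.)
            subPipes.get(strategy.classify(p)).incomingPacket(p);
        }
    }

    // Example strategy: differentiate gold, silver and bronze users.
    class UserClassStrategy implements ClassificationStrategy {
        public String classify(Packet p) {
            Object userClass = p.meta.get("userClass"); // hypothetical annotation
            return userClass != null ? userClass.toString() : "bronze";
        }
    }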
Given the flexibility of DiPS+, DMonA support can be limited to allowing system administrators to install specific classification strategies (in the dispatchers) and scheduler strategies (in the concurrency components). Packet classification has been validated in the context of an industrial case study that customized the RADIUS authentication protocol so as to differentiate between gold, silver, and bronze types of users [5, 6].
4.2 Controlling Arrival Rate
From a request control perspective, system load is managed by limiting or shaping the arrival rate of new requests to a sustainable level. Such traffic control may, for instance, selectively drop low-priority packets to preserve processing resources for the most important requests. This is crucial when too many requests arrive to be handled by the available processing resources. Request control is highly relevant to protect the system from packet bursts and to allow it to handle them gracefully by removing incoming packets early in the processing pipeline (e.g. in the protocol stack of the system). In addition, by prioritizing packets based on packet- and application-specific knowledge, the least important packets are removed first.
Traffic control has been effectively employed in networks, for example, to provide applications with quality-of-service guarantees by individually controlling network traffic flows (also known as traffic shaping) [20]. Typically, a leaky bucket algorithm [21] is used to adjust the rate at which incoming packets are forwarded. In addition, a variety of performance metrics have been studied in the context of overload management, including throughput and response-time targets [22–24], CPU utilization [25–27] and differentiated service metrics based on a given performance target [28, 29]. Welsh [22] proposes the 90th percentile response-time as a realistic and intuitive measure of client-perceived system performance. It is defined as follows: if the 90th percentile response-time is t, then 90% of the requests experience a response-time equal to or shorter than t.
When applying DMonA in the context of traffic control, we need to provide information collectors (i.e. sensors) at the entry of a monitored component area, a monitor policy that decides on the actions to be taken, and a component area to be controlled. As a concrete example, we use the 90th percentile approach of Welsh [22]. First of all, a response-time sensor measures the response-times for packets passing through a component area. Such sensors are installed at each concurrency component's packet forwarder and determine how long it takes between a request leaving the concurrency component and the release of the associated thread. A DMonA information collector collects the response-times of all packets that have passed through a component area. Secondly, the 90th percentile algorithm itself is offered as a monitor policy, which processes the collected information at regular intervals. In this case the algorithm checks whether 90% of the packets experience a response-time equal to or shorter than some pre-defined threshold t. Thirdly, the leaky bucket controls the admission rate of packets entering the monitored area. The leaky bucket is installed as a management module, associated with the packet receiver policy of the concurrency component in front of the area under control. This packet receiver is the perfect place for such control, since it represents the entry of a component area.
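A rough sketch of this feedback loop, with the leaky bucket as a management module and the 90th-percentile check as monitor policy, is given below; it builds on the earlier ManagementModule sketch, and the class names, the rate adjustment factors and the percentile computation are all simplifying assumptions.

    import java.util.*;

    // Management module limiting the admission rate at a component area's entry:
    // a packet is forwarded only if a token is available, otherwise it is dropped.
    class LeakyBucketModule implements ManagementModule {
        private volatile double ratePerSecond;
        private double tokens;
        private long last = System.nanoTime();

        LeakyBucketModule(double ratePerSecond) {
            this.ratePerSecond = ratePerSecond;
            this.tokens = ratePerSecond;
        }

        public synchronized Packet process(Packet p) {
            long now = System.nanoTime();
            tokens = Math.min(ratePerSecond, tokens + ratePerSecond * (now - last) / 1e9);
            last = now;
            if (tokens >= 1.0) { tokens -= 1.0; return p; }
            return null; // drop: sustainable rate exceeded
        }

        double getRate() { return ratePerSecond; }
        void setRate(double ratePerSecond) { this.ratePerSecond = ratePerSecond; }
    }

    // Monitor policy: 90th-percentile response-time check over one collection phase.
    // Response times are reported by a sensor installed at the packet forwarder.
    class PercentileMonitor {
        private final List<Long> responseTimesMs = Collections.synchronizedList(new ArrayList<>());

        void record(long millis) { responseTimesMs.add(millis); }

        // Called periodically by DMonA: tighten or relax the admission rate.
        void evaluate(LeakyBucketModule bucket, long targetMs) {
            List<Long> sorted;
            synchronized (responseTimesMs) {
                sorted = new ArrayList<>(responseTimesMs);
                responseTimesMs.clear();
            }
            if (sorted.isEmpty()) return;
            Collections.sort(sorted);
            long p90 = sorted.get((int) Math.floor(0.9 * (sorted.size() - 1)));
            double rate = bucket.getRate();
            bucket.setRate(p90 > targetMs ? rate * 0.9 : rate * 1.1);
        }
    }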
4.3 Concurrency Control While packet classification and request control focus on packets, concurrency control focuses on the tasks to be executed in the protocol stack. From a concurrency perspective, load management distributes the available processing power (i.e. threads) across the system’s component areas (tasks) such that the overall system performance is optimized [30]. This means that the DMonA management plane should be able to detect performance bottlenecks, i.e. component areas where packets arrive faster than they can be processed. In addition, the management plane should solve these bottlenecks by migrating processing resources associated with the concurrency component in front of a component area, from underloaded to overloaded component areas. Concurrency control is an effective technique for load management, since it allows to control how processing threads are applied at any time (e.g. to handle the highest priority tasks first), and compensates for blocking invocations inside the protocol stack. Because our approach allows for concurrency components to be added at arbitrary places in the protocol stack, bottleneck areas can easily be detected by measuring the throughput of each area. In addition, concurrency components allow for handling bottlenecks intelligently by increasing or decreasing the number of associated packet handler threads in certain component areas, which can be highly effective for parallel areas with blocking components. Moreover, as already described in Section 4.1, concurrency components support packet classification via their specific scheduling strategy.
Fig. 5. Illustration of DMonA attached to DiPS+ via two policies, one associated with the packet receiver and one with the packet forwarder. Processing resources are retrieved from a pool of free resources and allocated to a concurrency unit via its scheduler interface
More specifically, DMonA monitors the packet stream by installing throughput sensors, i.e. management modules that count the number of passed packets. Figure 5 shows how sensors are plugged in at both the packet receiver and forwarder of a concurrency component. The DMonA monitor collects on a regular basis the information stored in both sensors and resets them to start the next collecting phase. One possible monitor policy adjusts thread scheduling based on the concurrency component’s progress [4],
comparable to the feedback-driven approach proposed by Steere [31]. Based on this status information, the DMonA monitor decides when and how to adapt local concurrency behavior to improve performance. Proposed monitor decisions can be deployed in two ways. On the one hand, a concurrency component can be linked with or unlinked from a packet handler thread. This is done via the concurrency unit’s scheduler interface (see Figure 5). On the other hand, the buffer overflow and underflow strategies of a concurrency component can be replaced by calling its management interface.
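The throughput sensors and the reallocation decision could be sketched as follows, again on top of the earlier classes; the bottleneck threshold and the method names are illustrative, not taken from DMonA.

    import java.util.concurrent.atomic.AtomicLong;

    // Throughput sensor: a management module that simply counts passing packets.
    class ThroughputSensor implements ManagementModule {
        private final AtomicLong count = new AtomicLong();
        public Packet process(Packet p) { count.incrementAndGet(); return p; }
        long readAndReset() { return count.getAndSet(0); }
    }

    // DMonA-style monitor: compares arrival and departure counts of a component
    // area and assigns an extra packet-handler thread when the area falls behind.
    class AreaMonitor {
        private final ThroughputSensor in = new ThroughputSensor();  // at the packet receiver policy
        private final ThroughputSensor out = new ThroughputSensor(); // at the packet forwarder policy
        private final ConcurrencyComponent area;

        AreaMonitor(ConcurrencyComponent area) { this.area = area; }
        ThroughputSensor inSensor() { return in; }
        ThroughputSensor outSensor() { return out; }

        // Called at regular intervals: detect a bottleneck and react through the
        // scheduler interface of the concurrency component in front of the area.
        void evaluate() {
            long arrived = in.readAndReset();
            long processed = out.readAndReset();
            if (arrived > 2 * processed) {   // area cannot keep up: bottleneck
                area.addPacketHandler();     // allocate an extra processing thread
            }
        }
    }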
5 Management Plane for Transparent Component Hot-Swapping
The need for run-time adaptable protocol stacks (as described in Section 2.2) has resulted in the development of the CuPS (Customizable Protocol Stack) platform, a modular extension to the DiPS+ framework responsible for conducting seamless reconfigurations of a running protocol stack (illustrated in Figure 6). Since we aim for unanticipated adaptations, protocol stack reconfigurations imply changing a stack composition, rather than being limited to parameter tuning. The algorithm employed by the CuPS platform to orchestrate a reconfiguration of a protocol stack composition at run-time involves three stages:

Installation of the new component area. The adaptation process starts with the installation of the new functional components, resulting in the co-existence of the old component area (still in use) and the new version (not yet activated).

Activation of the new component area. Next, the newly installed functional components become activated. This is achieved by stopping and disconnecting the old component area and redirecting packets towards the new version. At this point in the adaptation procedure, the new component area is plugged into the stack composition and will process transmitted packets.

Removal of the old component area. Finally, the old component area is removed. Since it has been stopped during the activation stage, it can safely be removed.

In the remainder of this section, we elaborate on the activation phase to illustrate how an existing DiPS+ composition is extended with CuPS support in a modular manner. The terms reconfiguration and adaptation are used as alternatives for activation.

5.1 Self-contained Components in a Best-Effort Environment

A first category of reconfigurations covers the deployment of component areas strictly composed of functional components that are self-contained, i.e. components that do not depend on cooperation with other components to implement a service. Two examples of such self-contained protocol stack components are a filter component to relieve a congested node and a logging component. In addition, this class of reconfigurations assumes that packet loss or packet scrambling does not compromise the correct functioning of the network. Since performance (throughput) is an important characteristic of a protocol stack, most network protocols (such as IP) offer best-effort services and as such comply with this requirement.

When both conditions are fulfilled, activating such a component area boils down to adapting the current composition. No additional support is needed to control the state
(activity) of the DiPS+ component area that is subject to activation. Consequently, the activation phase is limited to removing the connectors that bind the old component area into the protocol stack, and plugging in the new area. As a result, packets that are being processed by the old component area during the activation stage (depending on the employed concurrency model) will get lost. Note that due to the use of plug-compatible components, the actual recomposition of DiPS+ component areas has been reduced to the trivial problem of adding and removing connectors.

5.2 Self-contained Components Demanding Safe Deployment

Depending on the properties of the network service that is subject to adaptation, packet loss during protocol reconfiguration could compromise the correct functioning of the protocol. As an example, consider the adaptation of a running TCP stack. When packets are lost during the activation process, TCP will interpret these errors as packet loss due to congestion and hence will reduce its congestion window [32]. This will cause a substantial degradation of performance in terms of throughput, even though sufficient bandwidth might be available. As such, this family of protocol stack reconfigurations covers seamless adaptation of self-contained components, and requires a safe state to be imposed on the component area under change. Since no other components depend on self-contained components to complete a service, such a safe state for reconfiguration is obtained when the component area is made passive. This implies that the functional components (1) are currently not processing any packets and (2) have no pending packets to be accepted and processed. In this way, packet loss caused by packets being processed while a component is swapped can be prevented.

1) Packet Blocking. Consequently, CuPS support is needed to block packet flows before they pass through a DiPS+ component area facing a reconfiguration. This is achieved by holding up all outgoing packets of adjacent packet forwarders directed to the component area that is subject to adaptation. When the reconfiguration is completed, the execution of these blocked packets is resumed. To extend the targeted DiPS+ components with such blocking support in a modular and transparent manner, their packet forwarders are equipped with special Policy objects for intercepting packets (conducted by the CuPS platform).

The employed separation between the functionality of DiPS+ components (offered by a programmer) on the one hand and additional CuPS support to deactivate other components on the other hand has a number of advantages. First of all, minimal interference with the rest of the system can be guaranteed. Interrupting interactions in a composition can be restricted to those locations where an actual reconfiguration is needed. Instead of stopping the concurrency components (as proposed in [33]), only the adjacent DiPS+ components that initiate interactions (by forwarding packets) on the component that is subject to adaptation need to be blocked (as illustrated in Fig. 6). With this, conducting a safe reconfiguration does not depend on the employed concurrency model, implemented by the number of concurrency components and their location (controlled by the DMonA platform). This implies that CuPS
and DMonA can operate simultaneously, but independently from each other, sharing the same DiPS+ protocol stack. Secondly, due to separating support to block outgoing packets from the functional behavior of a DiPS+ component, changing the way of holding up packets at the packet forwarder will not interfere with existing component functionality and vice versa. As an example, we demonstrate the possibility to choose between two different blocking strategies. To obtain a safe reconfiguration, one could decide to block the execution thread in which the outgoing packets are initiated, using a ThreadBlockingPolicy. An alternative could be to queue outgoing packets without interrupting the execution thread by selecting the PacketQueueingPolicy. Such a change can be achieved transparently by only adapting the packet forwarders of the DiPS+ components that are involved. Finally, the impact of a blocking operation on a DiPS+ component can be made more fine-grained. Instead of stopping all interactions initiated by a component (e.g. by interrupting the execution thread of that component), only the packet forwarders initiating interactions that engage components that need to become passive should get blocked (illustrated by Component A in Fig. 6). In this way, packets that are sent out using other packet forwarders can still be initiated.
Fig. 6. Illustration of CuPS attached to DiPS+
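The following sketch contrasts the two blocking strategies named above. Only the names ThreadBlockingPolicy and PacketQueueingPolicy come from the text; the ForwarderPolicy interface, the Packet type and the way packets are handed to the policy are assumptions made for this illustration.

import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the two packet-blocking strategies, behind one (assumed) policy
// interface installed at a DiPS+ packet forwarder by CuPS.
interface Packet { }

interface ForwarderPolicy {
    void outgoing(Packet p) throws InterruptedException;  // called for every outgoing packet
    void resume();                                         // called when reconfiguration is done
}

class ThreadBlockingPolicy implements ForwarderPolicy {
    private final Object lock = new Object();
    private volatile boolean blocked = true;
    public void outgoing(Packet p) throws InterruptedException {
        synchronized (lock) {
            while (blocked) lock.wait();   // hold the execution thread until resumed
        }
        deliver(p);
    }
    public void resume() {
        synchronized (lock) { blocked = false; lock.notifyAll(); }
    }
    private void deliver(Packet p) { /* pass the packet on to the connector */ }
}

class PacketQueueingPolicy implements ForwarderPolicy {
    private final Queue<Packet> pending = new ArrayDeque<>();
    private boolean blocked = true;
    public synchronized void outgoing(Packet p) {
        if (blocked) pending.add(p);       // queue the packet, do not block the caller
        else deliver(p);
    }
    public synchronized void resume() {
        blocked = false;
        while (!pending.isEmpty()) deliver(pending.poll());
    }
    private void deliver(Packet p) { /* pass the packet on to the connector */ }
}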
2) Activity Monitoring. In addition to holding up packets to be accepted and processed by the component area subject to adaptation, safe adaptation also requires this component area to be inactive (i.e. currently not processing any packets). Due to the reactive behavior of a functional component, monitoring code to check whether such a DiPS+ component is active or idle can (automatically) be added by simply extending the policy employed by the packet receiver of this component. In case
of concurrent interactions, activity inside a DiPS+ component can be monitored by means of a counter situated at its packet receivers, which is incremented on invocation and decremented upon return [34] (illustrated by means of the ActivityMonitor Policy in Fig. 6). When only sequential interactions are used, a counter can be replaced by a boolean flag. This reduces the monitoring overhead for each interaction.
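A minimal sketch of this counter-based activity monitoring could look as follows. The policy shape is an assumption; only the idea of incrementing a counter on invocation and decrementing it upon return is taken from the text.

import java.util.concurrent.atomic.AtomicInteger;

// Sketch of activity monitoring at a packet receiver: the counter is
// incremented when an invocation enters the component and decremented when
// processing returns, so CuPS can test whether the component is idle.
class ActivityMonitorPolicy {
    private final AtomicInteger active = new AtomicInteger(0);

    void process(Runnable componentWork) {
        active.incrementAndGet();          // invocation enters the component
        try {
            componentWork.run();
        } finally {
            active.decrementAndGet();      // processing of this packet has finished
        }
    }

    boolean isIdle() { return active.get() == 0; }
}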
5.3 Safe Deployment of Tightly-Coupled Components

The last category of reconfigurations covers the activation of component areas containing tightly coupled components, i.e. components that depend on cooperation with other components (locally, or in a distributed fashion) to implement a service. This cooperation is formalized by means of a transaction, consisting of a sequence of one or more asynchronous interactions. Referring to a fragmentation service, a transaction to fragment and reassemble a packet encapsulates a number of interactions, each representing the transfer of one fragment (packet) from a fragmenter to a reassembler. This cooperation implies that, from a reconfiguration point of view, the cooperating components are only consistent after termination of a transaction (i.e. when all fragments have been received by the reassembler and the original packet has been restored). As a consequence, when imposing a safe state for reconfiguration of a tightly coupled component, the dependencies formalized by the transaction must be taken into account.

Kramer and Magee [33] have stated that achieving safe software reconfigurations requires the software modules that are subject to adaptation (in this context the reassembler) to be both consistent and frozen (passive). When software modules are consistent, they do not include results of partially completed services (or transactions). By forcing software modules to be frozen (passive), state changes caused by new transactions are impossible. Kramer and Magee describe this required consistent and frozen state as the quiescence of a component.

As stated in the previous section, forcing a component area to be frozen has been accomplished (in a modular manner) by separating the functional behavior of a module from potential support to block its outgoing interactions. Since there is no knowledge about the state of the tightly coupled component at the moment packets are blocked, reconfiguration may lead to inconsistency (caused by replacing the component when protocol transactions are only partially completed). Referring again to the fragmentation service, replacing the reassembler when it has not yet received all fragments (and thus could not reassemble the original packet) will break the consistency between the fragmenting and reassembling components (and in that way, the correct functioning of the fragmentation service).

Consequently, additional support is required to drive a component area into a consistent state. This has been achieved by extending DiPS+ packet forwarders with special policies allowing "controlled" packet blocking. After blocking the packet forwarders that direct packets to the component that will be replaced, it should be possible for the CuPS platform (which conducts the actual reconfiguration) to check whether safe reconfiguration of the component is achievable. When this is not the case, blocked interactions are resumed one by one until the required safe state for reconfiguration is attained.
Table 1. Source code example to illustrate the layer property description for the DiPS+ IPv4 protocol (XML listing of 26 lines, not reproduced here)
Checking whether safe reconfiguration of a component is achievable requires verification of its execution state. For that purpose, we have extended tightly-coupled DiPS+ components (that are eligible for reconfiguration) with monitoring code that reflects their current execution state. In more detail, verifying the internal state of a DiPS+ component is achieved by checking its internal Unit through introspection via the ManagementInterface. CuPS will only check this state in the face of an actual reconfiguration, when the targeted component is idle.
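Putting the pieces together, the activation of such a component area might be orchestrated roughly as sketched below. The name ManagementInterface comes from the text, but its methods, the controlled-blocking API and the activator class are assumptions; only the loop that resumes blocked packets one by one until a quiescent, consistent state is reached follows the description above.

import java.util.List;

// Hypothetical sketch of the controlled-blocking activation loop conducted by CuPS.
interface ManagementInterface {
    boolean isIdle();                      // no packet currently being processed
    boolean isConsistent();                // no partially completed transactions
}

interface ControlledForwarderPolicy {
    void block();
    boolean resumeOnePacket();             // release a single blocked packet, false if none pending
    void resumeAll();                      // release everything once reconfiguration is complete
}

class CupsActivator {
    boolean activate(ManagementInterface oldComponent,
                     List<ControlledForwarderPolicy> adjacentForwarders,
                     Runnable swapCompositions) {
        for (ControlledForwarderPolicy f : adjacentForwarders) f.block();
        // Resume blocked packets one by one until the old area is idle and consistent
        // (real code would wait for each released packet to be processed before re-checking).
        while (!(oldComponent.isIdle() && oldComponent.isConsistent())) {
            boolean released = false;
            for (ControlledForwarderPolicy f : adjacentForwarders) {
                if (f.resumeOnePacket()) { released = true; break; }
            }
            if (!released) return false;   // safe state not reachable at this time
        }
        swapCompositions.run();            // plug the new component area into the stack
        for (ControlledForwarderPolicy f : adjacentForwarders) f.resumeAll();
        return true;
    }
}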
6 DiPS+ Prototype and Validation
6.1 Prototype

To validate the DiPS+ component platform and its potential for supporting run-time adaptability, we have developed a proof-of-concept prototype in Java, running on standard PC hardware. The protocol stack in Java is integrated in the Linux OS using a virtual Ethernet device (via the ethertap module in the Linux kernel). The DiPS+ prototype allows for building a protocol stack from an architecture specification. The DiPS+ architecture is represented in XML [35]. This representation specifies the core architecture entities, like components and protocol layers, along with how these entities are interconnected. To this end, descriptions for component connectors and layer glues, dispatchers and concurrency components are provided as well.
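To give an impression of what the stack builder discussed below does with such a description, the following Java fragment reads an XML architecture description and instantiates the listed components by reflection. The element and attribute names ("component", "class", "name") are invented for this sketch; the actual DiPS+ vocabulary is the one used in the listing of Table 1, which is not reproduced here.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative stack-builder fragment: parse an architecture description and
// create the declared components reflectively. A real builder would also create
// connectors, dispatchers and concurrency components, and wire them together.
public class StackBuilder {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(args[0]));            // e.g. a hypothetical ip-layer.xml
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            String name = c.getAttribute("name");
            String impl = c.getAttribute("class");
            Object instance = Class.forName(impl)
                    .getDeclaredConstructor().newInstance();
            System.out.println("created component " + name + " -> " + instance);
        }
    }
}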
By way of example of a DiPS+ description, the source code listing in Table 1 zooms in on the IP layer of a protocol stack. It lists all essential elements that layers can be composed of: components (lines 3, 3-6, and 13), connectors (lines 8-11 and 22-24), a dispatcher (lines 17-20), a concurrency component (line 15), and the upper and lower entry points (lines 7 and 21). Each of these items is represented as such in the architecture description, which makes the listing self-explanatory (Table 1).

Having an architecture description separated from the implementation has major advantages. First of all, the internals of the DiPS+ platform are transparent to the protocol stack developer, resulting in a black-box framework. Developing a DiPS+ protocol stack boils down to designing the appropriate components and providing a correct composition description. A stack builder tool is used to automatically transform the architecture descriptions into a running protocol stack. Consequently, a developer can configure different compositions without having to write extra code, or having to change or recompile the source code of individual components. A second advantage is that the use of an architectural description allows specific (optimizing or test) builders to be applied to the same architecture description. Testing a protocol layer in isolation, for instance, reuses the architecture description, but creates the layer in a different context (i.e. a test case instead of a protocol stack). Finally, an architecture description allows for optimization, in the sense that an optimizer can analyze the architecture and change it in order to make it more efficient. When, for instance, a network router is known to be connected to two networks with the same maximum segment size, it can be reconfigured to omit reassembly and refragmentation of forwarded packets, since they are already fragmented in sufficiently small pieces. Only packets for local delivery must be reassembled in this case.

6.2 Validation

We have successfully validated the DiPS+ approach, among others in an industrial case study that compared a DiPS+ and a commercial Java implementation of the RADIUS authentication and authorization protocol [5, 6]. Performance results clearly show the advantage of using application-specific scheduling strategies during overload. Moreover, the DiPS+ RADIUS server is able to gracefully cope with varying (over)load conditions. DiPS+ did not only facilitate the development of the RADIUS protocol, it also allowed us to experiment with different scheduling strategies without having to change any functional code.

In addition, the DiPS+ framework has been validated in the context of on-demand composition of a protocol stack, based on application-specific requirements. We have built a prototype in DiPS+ that allows an application to express high-level service requirements (e.g. reliability of data transfer, encryption, local or networked transfer, etc.) that must be supported by the underlying protocol stack. Based on these requirements, a combination of protocol layers is suggested by a stack composition tool [8, 9], and a protocol stack is built by the DiPS+ builder [36]. This illustrates the flexibility of the DiPS+ platform.

Multiple Master's theses have explored and validated the DiPS+ component platform from various perspectives. First of all, DiPS+ has been used to design and implement particular protocols (e.g. SIP [37], IPv6 [38], a TCP booster [39], dynamic routing
protocols in [40, 41], an IPSec based VPN solution [42] and a stateful firewall in [43]). Secondly, DiPS+ has been applied in various domains to explore its applicability. The work in [44], for instance, describes how network management techniques can be used in combination with DiPS+ protocol stacks. Finally, more research related theses have explored fundamental extensions to the DiPS+ component framework and architecture (e.g. self-adaptability [45], and concurrency control [46]). Two main conclusions may be drawn from our experiences in guiding Master students during their thesis. First of all, the DiPS+ framework and architecture can quickly be assimilated, even by students with limited experience in software architectures and a mainly object-oriented design background. Nevertheless, creating a high-level modularized DiPS+ design of a network protocol was not always trivial, and sometimes required assistance of a DiPS+ team member to put the student on the right track. In our view, the students’ lack of design experience and the often poor documentation of protocol specifications lie at the basis of the complicated modularization process. The main advantage, compared to an object-oriented design, is that packet flows are clearly defined and well-identifiable, which makes a DiPS+ design much more understandable. Once the high-level design becomes clear, development of individual components is straightforward. Secondly, the theses show that DiPS+ allows for a highly incremental software development process. Stated differently, the first running prototype can usually be delivered quickly after implementation has started (i.e. after a few weeks). From then on, the prototype can easily be customized and extended towards the stated requirements. Although the DiPS+ component framework has been proposed in the context of protocol stacks, we are convinced of its applicability in other operating system domains. The first research results in the context of USB device drivers [47, 48] and file systems [30, 45] are very promising. These systems reflect a layered architecture, which perfectly matches the DiPS+ architecture.
7 Related Work
7.1 Protocol Stack Frameworks

Although multiple software design frameworks for protocol stack development have been described in the literature [49–52], we compare the DiPS+ approach to three software architectures, which are tailored to protocol stacks and/or concurrency control: SEDA [22], Click modular router [53], and Scout [54].

SEDA [22] offers an event-based architecture for supporting massively concurrent web servers. A stage in SEDA can be compared with a DiPS+ component area along with its preceding concurrency component. Yet, the SEDA controller and associated stage are tightly coupled, whereas DiPS+ clearly separates a concurrency component from the functional code. As such, SEDA does not provide a clean separation between the functional and the management level. In addition, SEDA does not provide developers with an architecture specification, which makes it difficult for developers to understand the data and control flow through the set of independent stages.

The Click modular router [53] is based on a design very analogous to DiPS+. Although one can recognize a pipe-and-filter architectural style, Click pays much less
attention to software architecture than DiPS+. Click supports two packet transfer mechanisms: push and pull. DiPS+ offers a uniform push packet transfer mechanism and allows for active behavior inside the component graph by means of explicit concurrency components.

The Scout operating system [54] uses a layered software architecture, yet does not offer fine-grained entities such as components for functionality, dispatching, or concurrency. Scout is designed around a communication-oriented abstraction called the path, which represents an I/O channel throughout a multi-layered system and essentially extends a network connection into the operating system.

7.2 Concurrency and Separation of Concerns

A critical element in our research is the separation of concurrency from functional code. Kiczales [55] defines non-functional concerns as aspects that cross-cut functional code. An aspect is written in a specific aspect language and is woven into the functional code by a so-called aspect weaver at pre-processing time. Although this approach clearly separates all aspects from the functional code (at design-time), aspects tend to disappear at run-time, which makes it very difficult (if not impossible) to adapt aspects dynamically. Apertos [16] introduces concurrent objects that separate mechanisms of synchronization, scheduling and interrupt mask handling from the functional code. This makes software more understandable, and reduces the risk of errors.
8 Conclusion
Our contribution represents a successful case study, DiPS+, on the development of component-based software for protocol stacks that are adaptable at run-time. The employed architectural styles and the resulting component abstractions (1) increase the flexibility and adaptability of protocol stack software and (2) facilitate the development process of software that is complex and error-prone by nature, especially when additional concerns (such as the need for run-time adaptability) are imposed.

The combination of the pipe-and-filter and the blackboard architectural style has resulted in the design of plug-compatible DiPS+ components. By consequence, DiPS+ components are unaware of other components they are connected to, directly or indirectly. This is a major advantage in terms of flexibility, as it allows for individual components to be reused in different compositions. In addition, by employing these architectural styles (together with the layered style), the DiPS+ platform offers a number of framework abstractions (such as components, connectors, and packets) to ease the development of adaptable protocol stacks. Finally, separate component types for functionality, concurrency and packet dispatching allow a developer to concentrate on a single concern (e.g. concurrency) without being distracted by other concerns that are scattered across the same functional code.

As stated in the introduction, a second objective of the DiPS+ component platform is to allow for modular integration of non-functional extensions that cross-cut the core protocol stack functionality. In this chapter, we have illustrated (by means of DMonA and CuPS) that the use of explicit communication ports is essential to transparently
extend a DiPS+ protocol stack with support for controlling the packet flow. More precisely, they serve as hooks for connecting the data and the management plane.

We have discussed our experiences with using the DiPS+ component platform in real-life situations. Although a seamless transformation of the DiPS+ platform towards embedded systems is not yet feasible, we argue that the principles behind DiPS+ (i.e. a combination of component-based development and separation of concerns) are crucial for component-based embedded network systems. In our opinion, this combination not only facilitates the implementation of component hot-swapping and concurrency control (as we have demonstrated), but also seems very useful for other concerns such as on-demand and safe software composition [8, 9, 56], transparent data flow inspection [22], performance optimization [53], isolated and incremental unit testing [7], and safe updates of distributed embedded systems [10].

In our opinion, concerns such as data flow monitoring, component hot-swapping, unit testing, performance optimization, and safe composition are crucial for embedded software and will become even more so with the ongoing trend towards mobile and ad-hoc network connectivity of (highly heterogeneous) embedded devices. We hope that this case study can convince embedded system developers of the need for, and the power of, a well-defined software architecture and component platform.
Acknowledgments

Part of the work described in this chapter has been carried out for Alcatel Bell and supported by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT SCAN #010319, IWT PEPiTA #990219). Additional support came from the Fund for Scientific Research – Flanders (Belgium) (F.W.O. RACING #G.0323.01).
References 1. Hubaux, J.P., Gross, T., Boudec, J.Y.L., Vetterli, M.: Towards self-organized mobile ad hoc networks: the Terminodes project. IEEE Communications Magazine 31 (2001) 118–124 2. Shaw, M., Garlan, D.: Software Architecture - Perspectives on an emerging discipline. Prentice-Hall (1996) 3. Schneider, J.G., Nierstrasz, O.: Components, scripts and glue. In L. Barroca, J.H., Hall, P., eds.: Software Architectures – Advances and Applications. Springer-Verlag (1999) 13–25 4. Michiels, S.: Component Framework Technology for Adaptable and Manageable Protocol Stacks. PhD thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2003) 5. Michiels, S., Desmet, L., Joosen, W., Verbaeten, P.: The DiPS+ software architecture for self-healing protocol stacks. In: Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA-4), Oslo, Norway, IEEE/IFIP, IEEE (2004) 6. Michiels, S., Desmet, L., Verbaeten, P.: A DiPS+ Case Study: A Self-healing RADIUS Server. Report CW-378, Dept. of Computer Science, K.U.Leuven, Leuven, Belgium (2004) 7. Michiels, S., Walravens, D., Janssens, N., Verbaeten, P.: DiPS: Filling the Gap between System Software and Testing. In: Proceedings of Workshop on Testing in XP (WiTXP2002), Alghero, Italy (2002) 8. S¸ora, I., Verbaeten, P., Berbers, Y.: A description language for composable components. In: Proceedings of 6th International Conference on Fundamental Approaches to Software Engineering (FASE 2003). Volume 2621., Warsaw, Poland, Springer-Verlag, Lecture Notes in Computer Science (2003) 22–36
9. S¸ora, I., Cretu, V., Verbaeten, P., Berbers, Y.: Automating decisions in component composition based on propagation of requirements. In: Proceedings of 7th International Conference on Fundamental Approaches to Software Engineering (FASE 2004), Barcelona, Spain (2004) 10. Janssens, N., Steegmans, E., Holvoet, T., Verbaeten, P.: An Agent Design Method Promoting Separation Between Computation and Coordination. In: Proceedings of the 2004 ACM Symposium on Applied Computing (SAC 2004), ACM Press (2004) 456–461 11. Joosen, W.: Load Balancing in Distributed and Parallel Systems. PhD thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (1996) 12. Wetherall, D., Legedza, U., Guttag, J.: Introducing new internet services: Why and how. IEEE Network, Special Issue on Active and Programmable Networks 12 (1998) 13. Campbell, A.T., De Meer, H.G., Kounavis, M.E., Miki, K., Vicente, J.B., Villela, D.: A survey of programmable networks. SIGCOMM Comput. Commun. Rev. 29 (1999) 7–23 14. Janssens, N., Michiels, S., Mahieu, T., Verbaeten, P.: Towards Transparent Hot-Swapping Support for Producer-Consumer Components. In: Proceedings of Second International Workshop on Unanticipated Software Evolution (USE 2003), Warsaw, Poland (2003) 15. Janssens, N., Michiels, S., Holvoet, T., Verbaeten, P.: A Modular Approach Enforcing Safe Reconfiguration of Producer-Consumer Applications. In: Proceedings of The 20th IEEE International Conference on Software Maintenance (ICSM 2004), Chicago Illinois, USA (2004) 16. Itoh, J., Yokote, Y., Tokoro, M.: Scone: using concurrent objects for low-level operating system programming. In: Proceedings of the tenth annual conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’95), Austin, TX, USA, ACM Press, New York, NY, USA (1995) 385–398 17. Kiczales, G., Lamping, J., Lopes, C.V., Maeda, C., Mendhekar, A., Murphy, G.C.: Open implementation design guidelines. In: Proceedings of the 19th International Conference on Software Engineering (ICSE’97), Boston, MA, USA, ACM Press, New York, NY, USA (1997) 481–490 18. Kiczales, G., des Rivi`eres, J., Bobrow, D.G.: The Art of the Metaobject Protocol. MIT Press, Cambridge, MA (1991) 19. Szyperski, C.: Component Software: Beyond Object-Oriented Programming. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA (1998) 20. Breslau, L., Jamin, S., Schenker, S.: Comments on the performance of measurement-based admission control algorithms. In: Proceedings of IEEE INFOCOM 2000. (2000) 1233–1242 21. Tanenbaum, A.S.: Computer Networks. Prentice Hall (1996) 22. Welsh, M.F.: An Architecture for Highly Concurrent, Well-Conditioned Internet Services. PhD thesis, University of California at Berkeley, Berkeley, CA, USA (2002) 23. Chen, H., Mohapatra, P.: Session-based overload control in QoS-aware web servers. In: Proceedings of IEEE INFOCOM 2002, New York, NY, USA (2002) 24. Chen, X., Mohapatra, P., Chen, H.: An admission control scheme for predictable server response time for web accesses. In: Proceedings of the tenth international conference on World Wide Web, ACM Press, New York, NY, USA (2001) 545–554 25. Abdelzaher, T.F., Lu, C.: Modeling and performance control of internet servers. Invited paper at 39th IEEE Conference on Decision and Control (2000) 26. Cherkasova, L., Phaal, P.: Session based admission control: a mechanism for improving the performance of an overloaded web server. Technical Report HPL-98-119, HP labs (1998) 27. 
Diao, Y., Gandhi, N., Hellerstein, J.L., Parekh, S., Tilbury, D.: Using mimo feedback control to enforce policies for interrelated metrics with application to the apache web server. In: Proceedings of Network Operations and Management Symposium, Florence, Italy (2002) 28. Kanodia, V., Knightly, E.: Multi-class latency-bounded web services. In: Proceedings of 8th IEEE/IFIP International Workshop on Quality of Service (IWQoS 2000), Pittsburgh, PA, USA (2000)
29. Lu, C., Abdelzaher, T., Stankovic, J., Son, S.: A feedback control approach for guaranteeing relative delays in web servers. In: Proceedings of the 7th IEEE Real-Time Technology and Applications Symposium (RTAS), Taipei, Taiwan (2001) 30. Michiels, S., Desmet, L., Janssens, N., Mahieu, T., Verbaeten, P.: Self-adapting concurrency: The DMonA architecture. In Garlan, D., Kramer, J., Wolf, A., eds.: Proceedings of the First Workshop on Self-Healing Systems (WOSS’02), Charleston, SC, USA, ACM SIGSOFT, ACM press (2002) 43–48 31. Steere, D.C., Goel, A., Gruenberg, J., McNamee, D., Pu, C., Walpole, J.: A feedback-driven proportion allocator for real-rate scheduling. In: Proceedings of the third USENIX Symposium on Operating Systems Design and Implementation (OSDI’99), New Orleans, LA, USA, USENIX Association, Berkeley, CA, USA (1999) 145–158 32. Hoebeke, J., Leeuwen, T.V., Peters, L., Cooreman, K., Moerman, I., Dhoedt, B., Demeester, P.: Development of a TCP protocol booster over a wireless link. In: Proceedings of the 9th Symposium on Communications and Vehicular Technology in the Benelux (SCVT 2002), Louvain la Neuve (2002) 33. Kramer, J., Magee, J.: The evolving philosophers problem: Dynamic change management. IEEE Transactions on Software Engineering 16 (1990) 1293–1306 34. McNamee, D., Walpole, J., Pu, C., Cowan, C., Krasic, C., Goel, A., Wagle, P., Consel, C., Muller, G., Marlet, R.: Specialization tools and techniques for systematic optimization of system software. ACM Transactions on Computer Systems 19 (2001) 217–251 35. Harold, E.R., Means, W.S.: XML in a Nutshell. Second edn. O’Reilly & Associates, Inc. (2002) 36. Michiels, S., Mahieu, T., Matthijs, F., Verbaeten, P.: Dynamic Protocol Stack Composition: Protocol Independent Addressing. In: Proceedings of the 4th ECOOP Workshop on Object-Orientation and Operating Systems (ECOOP-OOOSWS’2001), Budapest, Hungary, SERVITEC (2001) 37. Vandewoestyne, B.: Internet Telephony with the DiPS Framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2003) 38. Janssen, G.: Implementation of IPv6 in DiPS. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2002) 39. Larsen, T.: Implementation of a TCP booster in DiPS. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2004) 40. Buggenhout, B.V.: Study and Implementation of a QoS router. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2001) 41. Elen, B.: A flexible framework for routing protocols in DiPS. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2004) 42. Vandebroek, K.: Development of an IPSec based VPN solution with the DiPS component framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2004) 43. Cornelis, I., Weerdt, D.D.: Development of a stateful firewall with the DiPS component framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2004) 44. Bjerke, S.E.: Support for Network Management in the DiPS Component Framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2002) 45. Desmet, L.: Adaptive System Software with the DiPS Component Framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2002) 46. Michiels, D.: Concurrency Control in the DiPS framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2003) 47. Coster, W.D., Krock, M.D.: CoFraDeD: a Component Framework for Device Drivers. 
Technical report, internal use only, PIMC/K.U.Leuven, Dept. of Computer Science, Leuven, Belgium (2001)
48. Michiels, S., Kenens, P., Matthijs, F., Walravens, D., Berbers, Y., Verbaeten, P.: Component Framework Support for developing Device Drivers. In Rozic, N., Begusic, D., Vrdoljak, M., eds.: International Conference on Software, Telecommunications and Computer Networks (SoftCOM). Volume 1., Split, Croatia, FESB (2000) 117–126 49. Hutchinson, N.C., Peterson, L.L.: The x-kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering 17 (1991) 64–76 50. Bhatti, N.T.: A System for Constructing Configurable High-level Protocols. PhD thesis, Department of Computer Science, University of Arizona, Tucson, AZ, USA (1996) 51. Ballesteros, F.J., Kon, F., Campbell, R.: Off++: The Network in a Box. In: Proceedings of ECOOP Workshop on Object Orientation in Operating Systems (ECOOP-WOOOS 2000), Sophia Antipolis and Cannes, France (2000) 52. H¨uni, H., Johnson, R.E., Engel, R.: A framework for network protocol software. In: Proceedings of the tenth annual conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’95), Austin, TX, USA, ACM Press, New York, NY, USA (1995) 358–369 53. Kohler, E.: The Click Modular Router. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA (2001) 54. Montz, A.B., Mosberger, D., O’Malley, S.W., Peterson, L.L.: Scout: A communicationsoriented operating system. In: Proceedings of Fifth Workshop on Hot Topics in Operating Systems (HotOS-V), Orcas Island, WA, USA, IEEE Computer Society Press (1995) 58–61 55. Kiczales, G., Lamping, J., Menhdhekar, A., Maeda, C., Lopes, C., Loingtier, J.M., Irwin, J.: Aspect-Oriented Programming. In Aks¸it, M., Matsuoka, S., eds.: Proceedings of 11th European Conference on Object-Oriented Programming (ECOOP’97). Volume 1241 of LNCS. Springer-Verlag, Jyv¨askyl¨a, Finland (1997) 220–242 56. Desmet, L., Piessens, F., Joosen, W., Verbaeten, P.: Improving software reliability in datacentered software systems by enforcing composition time constraints. In: Proceedings of Third Workshop on Architecting Dependable Systems (WADS2004), Edinburgh, Scotland (2004) 32–36
CoConES: An Approach for Components and Contracts in Embedded Systems

Yolande Berbers, Peter Rigole, Yves Vandewoude, and Stefan Van Baelen

DistriNet, Department of Computer Science, KULeuven, Celestijnenlaan 200A, B-3001 Heverlee
{Yolande.Berbers,Peter.Rigole,Yves.Vandewoude,Stefan.VanBaelen}@cs.kuleuven.ac.be
Abstract. This chapter presents CoConES (Components and Contracts for Embedded Software), a methodology for the development of embedded software, supported by a tool chain. The methodology is based on the composition of reusable components with the addition of a contract principle for modeling non-functional constraints. Non-functional constraints are an important aspect of embedded systems, and need to be modeled explicitly. The tool chain contains CCOM, a tool used for the design phase of software development, coupled with DRACO, a middleware layer that supports the component-based architecture at run-time.
1 Introduction
Embedded systems are typically characterized by a specific functionality in a specific domain, where the software element is taking an increasingly important role. When developing embedded software, besides a range of software quality and stability aspects, one has to consider non-functional aspects and resource constraints. Embedded systems often have limited processing power, storage capacity and network bandwidth. A developer has to cope with these constraints and make sure that the software will be able to run on the constrained system. Often, embedded systems also have timing constraints on their computations.

Today, embedded software is becoming complex; according to [8] the complexity of embedded-system applications is increasing by 140% a year. It is no longer feasible to build such systems from scratch. Reuse of existing software is becoming vital, especially in the light of today's tight time-to-market demands in industry. Reuse should ensure that one can use validated software; only then will reuse result in shorter development time. To enable reuse, we have chosen a component-based approach for building embedded systems.

Component software is quite common today in traditional applications. A large software system often consists of multiple interacting components. These components can be seen as large objects with a clear and well-defined task. Different definitions of a component exist; some see objects as components, while others define components as large parts of coherent code, intended to be reusable and highly documented. We base our definition on the one given by Szyperski [16], see Section 2.1. However, many definitions focus only on the functional aspect of a component. For embedded software
the non-functional constraints cannot be discarded. Modeling these non-functional constraints explicitly enables one to safely reuse components in a design, while being sure that the non-functional constraints will be met. This is the major motivation for the work presented in this chapter.

In the past few years, we have developed a methodology, CoConES (Components and Contracts for Embedded Software), for developing software for embedded systems using a component-oriented approach. Our approach uses contracts to model the non-functional constraints. The CoConES methodology is backed by a tool chain that spans both the design-time and the runtime phase. CCOM (Component and Contract-Oriented Modeling) is a software design tool, enabling the developer to specify components and their interactions, including contracts for the non-functional constraints. DRACO (DistriNet Reliable and Adaptive COmponents) is a middleware layer that at runtime supports the component-oriented software architecture on which our methodology is based. It allows components to be created and destroyed, organizes the communication between components, and monitors the contracts defined at design time. Currently, CoConES offers support for timing and bandwidth contracts. The CoConES methodology is aimed at computationally powerful systems running complex software. Although we address resource-constrained systems, we do not claim to support hard real-time applications, nor embedded systems with small footprints.

To validate the various elements of our approach, we have applied our methodology using our tools to various smaller examples and to a fully fledged embedded case study. This chapter gives a comprehensive overview of our methodology, our component-oriented software architecture, the supporting tool chain and the case study. Elements of this work have been published at conferences and workshops [10, 11, 17, 18]. The presented work was started during the SEESCOA project (Software Engineering for Embedded Systems, using a Component Oriented Approach), and is continued in the CoDAMoS project (Context Driven Adaptation of Mobile Services).

This chapter is organized as follows: Section 2 gives an overview of our component architecture. Sections 3 and 4 respectively describe the design-time tool and the runtime tool that together support our methodology. Section 5 presents a validation of our methodology, component architecture and tools, through a fully fledged embedded case study. We compare our work with related work in Sec. 6, and conclude in Sec. 7.
2 Core Concepts of the Proposed Component Architecture
This section describes the software architecture used in the Components and Contracts for Embedded Software methodology. Before giving details about CoConES, we list the main strengths and characteristics of CoConES:

1. CoConES components are loosely coupled to facilitate reuse.
2. CoConES components communicate through ports.
3. Connectors are used to connect communicating ports.
4. CoConES defines constructs for composing applications out of components:
   (a) some constructs describe design-time compositions (blueprints);
   (b) other constructs describe run-time compositions (instances).
5. Contracts are used to specify and verify non-functional constraints:
   (a) contracts can be used to specify and verify non-functional aspects of compositions at design-time;
   (b) contracts can be used to verify the correct execution of compositions with regard to their resource use at run-time;
   (c) currently, CoConES supports contracts for timing and for bandwidth requirements.
6. CoConES is a methodology, supported by a CASE tool and by a runtime environment. These tools are described in Sec. 3 and 4.

2.1 CoConES Components
The most common definition of a component was given by Szyperski in [16]: "A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties."

In CoConES, a distinction is made between components and component blueprints. The latter are reusable static entities that only exist at design time and contain a complete description of the type of a component and its implementation (the code). In addition, component blueprints have a unique identifier and a version number, and can be stored in a blueprint catalog. In contrast, the term component is reserved for a runtime component instance containing a certain runtime state.

A CoConES component complies with the definition of Szyperski: it is a reusable, documented software entity, offering a coherent behavior, and is used as a building block in applications. In addition, all inter-component communication is explicit and takes place by sending asynchronous messages through external interfaces. In general, interfaces are an abstraction of the behavior of a component and consist of a (subset of the) interactions of that component, together with a set of constraints on when these interactions may occur. In CoConES, a component interface consists of a group of messages that may be sent to or sent out from the component. These interfaces are formally specified using the port construct.
2.2 CoConES Ports
A CoConES port represents a bidirectional communication access point of a component, consisting of an interface for incoming messages and an interface for outgoing messages. As with components, a distinction is made between port blueprints and ports. In CoConES, a port is specified on three levels:

Syntactic Level: a syntactic description of the messages that can be sent and received.
Semantic Level: pre- and postconditions associated with the messages.
Synchronization Level: a description of the sequence in which the messages have to occur.

At the moment of writing, only the syntactic and the synchronization levels have been formally worked out in detail. Two ports can only be interconnected if their associated interfaces match on all levels. The number of connections that can be made with
a port is specified using the MNOI (Maximum Number of Instances) property of the port. A major advantage of this restriction is that, with this additional knowledge about the usage of the component, the developer can make more accurate QoS statements about the services the component delivers. Evidently, these restrictions are enforced at runtime by our execution environment (see Sec. 4). CoConES supports three types of ports with respect to this MNOI:

Single Port: A single port allows for one-on-one communication. This port is represented by a rectangle in our CCOM design tool (Fig. 1(a)).

Multiport: A multiport of dimension n is conceptually identical to n single ports, as it allows for n connectors to be attached simultaneously. Although messages can be sent to the entire multiport as such (in this case it behaves as a multicast port), the intended behavior of a multiport is to send messages to a specific index. Conceptually, a multiport is analogous to a call center: a connection is granted to a multiport unless it is already involved in its specified maximum number of connections. Once connected, conversation is one-to-one. As depicted in Fig. 1(b), the symbol of a multiport depends on its dimension.

Multicast Port: A multicast port of dimension n is a single port that can have n connectors attached to it. Messages sent to a multicast port are always sent to all connectors attached to it. It is therefore not possible to differentiate between different receivers. Also, a multicast port can never receive messages. The graphical notation of a multicast port is a trapezium (Fig. 1(c)).

The dimension of both multiports and multicast ports may be ∞.
2.3 CoConES Connectors
Ports are connected using the connector construct. Compatibility of the port interfaces is checked both at design time and at runtime by the CoConES tool chain. As such, connectors act as a kind of tunnel during message transmission. Connectors provide a layer of abstraction from component location, since they can cross node boundaries when different components are spread over various nodes in a distributed system: components are unaware whether they are communicating with local or with remote components. At runtime, the underlying middleware system (see Sec. 4) takes care of this transparency.
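To make the port and connector constructs more tangible, the following Java sketch shows one possible shape for them: a connector forwards asynchronous messages to the target port's incoming queue, and a port refuses connections beyond its MNOI. The class shapes are assumptions for illustration only and do not reflect the actual CCOM or DRACO APIs.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch (with assumed class shapes) of ports and connectors: asynchronous
// message delivery through a queue, and MNOI enforcement when attaching connectors.
class Message { final String name; Message(String name) { this.name = name; } }

class Port {
    private final int mnoi;                               // maximum number of connections
    private final List<Connector> connectors = new CopyOnWriteArrayList<>();
    private final BlockingQueue<Message> incoming = new LinkedBlockingQueue<>();
    Port(int mnoi) { this.mnoi = mnoi; }

    synchronized void attach(Connector c) {
        if (connectors.size() >= mnoi)
            throw new IllegalStateException("MNOI exceeded for this port");
        connectors.add(c);
    }
    void send(Message m) {                                // outgoing interface
        for (Connector c : connectors) c.transmit(m);     // to every attached connector
    }
    void deliver(Message m) { incoming.add(m); }          // incoming interface (asynchronous)
    Message take() throws InterruptedException { return incoming.take(); }
}

class Connector {
    private final Port target;
    Connector(Port source, Port target) { this.target = target; source.attach(this); }
    void transmit(Message m) { target.deliver(m); }       // could cross node boundaries
}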
2.4 CoConES Contracts
Contracts are used in CoConES to specify non-functional constraints. They allow a designer to impose constraints on the behavior of components and on the interactions between them. Contracts are attached by the designer when an application is constructed by composing components. They can be attached to all previously described constructs of the CoConES architecture. A CoConES contract is used both for annotation and for verification. It is an important tool for a designer when documenting applications. Furthermore, contracts are used to verify the correctness of a program with regard to its resource use. Some verifications can be done statically and are performed by CCOM, our component composition tool.
Fig. 1. Different ports in CoConES: (a) single port, (b) multiport, (c) multicast port
Other verifications are done dynamically by a contract monitoring module in DRACO (see Sec. 4). Although the CoConES contracts are a general construct, only timing contracts and bandwidth contracts have been worked out at the time of writing. Work is underway to support memory contracts as well.

A CoConES timing contract specifies and imposes the timing constraints to which communicating components have to adhere. Timing contracts can be attached both to connectors and to ports (to specify constraints concerning multiple connections – e.g. 500 ms after the arrival of message m on port p1, a response must be broadcast on port p2). Two types of timing contracts are currently supported: deadline timing contracts and periodicity timing contracts. A deadline timing contract imposes a constraint on the occurrence time of a particular event, given the occurrence time of an event that happened earlier. Possible events include the sending of a message, the receipt of a message, and the termination of the processing of a received message. A periodicity timing contract imposes a constraint on the periodic occurrence of a particular event.
CoConES bandwidth contracts specify constraints concerning the flux density of the information exchanged between two ports. By expressing characteristics of the amount of information exchanged per time unit, we can deduce how suitable a connector is in a distributed component configuration. Therefore, these bandwidth contracts improve the self-containedness of components with regard to their use in a distributed system, making components location transparent. By performing a design-time analysis step that checks the feasibility of the component's connectors over a given connection, the CCOM tool can reject or accept distributed configurations.

In order to make bandwidth contracts easily understandable by application engineers using the CCOM tool, CoConES bandwidth contracts are expressed in terms of concepts that are easy to reason about. Descriptions such as bits per second, available time frames, packets per time unit, etc. make no sense from a component's point of view. Components send out messages at a certain rate, so quantitative aspects of their port's communication behavior should be described in terms of message size (MS) and interval time (IT) between consecutive messages. CoConES bandwidth contracts consist of constraints on several statistical characteristics derived from these message sizes and interval times. Figure 2 gives a conceptual illustration of the relationship between them and the bandwidth they use.

Message Size (MS): the size of a message expressed in bytes.
Interval Time (IT): the time between the beginning of the transmission of two consecutive messages.
Fig. 2. Message timing
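A bandwidth-contract check along these lines could be sketched as follows in Java. The observer API and the feasibility rule (average MS divided by average IT must fit within the available bandwidth) are illustrative assumptions; the actual CoConES contracts constrain several statistical characteristics of MS and IT.

import java.util.ArrayList;
import java.util.List;

// Sketch of a bandwidth check expressed in the two quantities defined above:
// message size (MS, bytes) and interval time (IT, between the start of two
// consecutive transmissions).
class BandwidthObserver {
    private final List<Integer> messageSizes = new ArrayList<>();
    private final List<Long> intervalTimesMs = new ArrayList<>();
    private long lastSendMs = -1;

    void messageSent(int sizeBytes, long nowMs) {
        messageSizes.add(sizeBytes);
        if (lastSendMs >= 0) intervalTimesMs.add(nowMs - lastSendMs);
        lastSendMs = nowMs;
    }

    double averageMessageSize() {
        return messageSizes.stream().mapToInt(Integer::intValue).average().orElse(0);
    }
    double averageIntervalMs() {
        return intervalTimesMs.stream().mapToLong(Long::longValue).average().orElse(Double.MAX_VALUE);
    }

    // A connection is considered feasible if the average flux density implied
    // by MS and IT stays below the bandwidth available on the connection.
    boolean satisfies(double availableBytesPerSecond) {
        double requiredBytesPerSecond =
                averageMessageSize() / (averageIntervalMs() / 1000.0);
        return requiredBytesPerSecond <= availableBytesPerSecond;
    }
}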
2.5 CoConES Compositions
Applications are constructed by creating component compositions. In this process, guided by the CCOM design tool (see Sec. 3), existing component blueprints can be loaded from a component repository and visually connected to each other. Additional components can be created, and key properties (such as the number of ports and their interfaces) can be specified in the tool. The CCOM tool generates the necessary skeleton code that can be filled in by the developer. This process is bidirectional, in that the properties of an existing component can be retrieved from its source code. During the design of the application, the CCOM tool checks architectural consistency, such as the compatibility of the connected ports on the syntactic and synchronization level, and, where possible, the feasibility of contracts.

2.6 Supporting Tools

The entire methodology is supported by a tool chain. The CCOM (Component and Contract-Oriented Modeling) composition tool is a CASE design tool supporting the design and implementation of components and the construction of compositions. CCOM
is capable of generating skeleton code that assists the developer in the implementation process. The code is then converted to standard Java with a preprocessor, and compiled. At runtime, DRACO (DistriNet Reliable And Adaptive Components) is the middleware system responsible for the correct execution of CoConES compositions. We discuss CCOM and its code generation in Sec. 3. DRACO is discussed in Sec. 4.
3 CCOM Case Tool
The CCOM tool supports the development of applications using the architectural CoConES concepts described in the previous section. First, CCOM assists the developer during the creation and development of:

Component Blueprints: component blueprints can be graphically created, specified and stored in a repository for later use.
Compositions: compositions can be constructed using components (either custom made or retrieved from a repository), connectors and contracts.

In addition, the CCOM tool provides three views in order to decompose the structure of an application. These views allow the developer to focus on the issue at hand, and make sure that the relevant constructs are easily accessible:

Blueprint models: all component blueprints from which instances will be used in the composition are grouped in a blueprint model.
Instance models: the instance model gives a structural overview of the application. It consists of connected component instances.
Scenario models: a scenario model represents a specific action in the application. The focus is on non-functional constraints, which are represented by contracts that can be attached to component instances, port instances and/or connectors.

The following paragraphs elaborate on the different features of the CCOM tool, and how these features assist the developer in the construction of an application using the CoConES methodology.

3.1 Developing Component Blueprints

The development of a component blueprint comprises two steps:

1. The specification of the blueprints of both the component and its ports.
2. Providing an appropriate implementation of the messages a component can receive. This is achieved by filling in the skeleton code that was generated by the CCOM tool during the previous step.

Once a component has been specified and implemented, it is transformed into an XML representation and stored in the component repository.

Fig. 3 shows a screen shot of the tool during the development of a blueprint model. In our tool, blueprints are represented with dashed lines. Large rectangles are components, small ones are the ports attached to a component. On the left in the figure is the repository of component blueprints, ordered hierarchically. On the right, a blueprint model
Fig. 3. A blueprint model in the CCOM tool
is shown that groups several component blueprints needed in a car regulator or cruise control application. The cruise control application is used to align the speed of a car with a target speed requested by the driver using a cruise control. The regulator makes use of a speedometer to read the vehicle speed. At a frequency of 2 Hertz, that is every 500 ms, the regulator should calculate the new speed of the car, and pass this on to the engine. The regulator should be stopped, among other things, when the brakes are hit, the driver accelerates, the driver turns down the cruise control, or when the speed drops below a certain limit.

As discussed in Sec. 2.2, port interfaces are specified on multiple levels: the syntactic, the semantic (not yet implemented) and the synchronization level. The specification of these interfaces is shown in Fig. 4. The port in this figure is the SpeedUpdate port, which is part of the SpeedoMeter component. This component measures the speed of the car at a frequency of 2 Hertz. It can output its speed calculation to an unlimited number of other components. Among others, the speed is sent to the Input port of the SpeedDisplay component, which displays the speed on the dashboard of the car. Fig. 4(a) shows how the syntactic elements of the port blueprint can be filled in: every message can be described, including its name, parameters and direction. The synchronization level is shown in Fig. 4(b). Here, extended MSCs (Message Sequence Charts) are used to specify the interaction protocol. From the MSC it becomes clear that the SpeedUpdate port of the SpeedoMeter component first receives the
Start message and that it sends Update messages in a loop. The interaction stops when the port receives a Stop message. For each message interaction, three hook types can be distinguished (see Fig. 4(b)):

Send hooks (boxes with 'S'), representing the sending of messages.
Receive hooks (boxes with 'R'), representing the reception of messages.
End-of-activation hooks (boxes with 'X'), representing the termination of the processing of a received message.
(a) Specification of the Syntactic Interface
(b) Specification of the Synchronization Interface
Fig. 4. Specification of the interfaces of a port blueprint
The interface of a port blueprint is used to verify whether it can be connected to other port blueprints: connecting ports is only possible if their interfaces match. The compatibility of ports can be verified both at the syntactic level and at the synchronization level. Using the specifications of a component and its ports, the CCOM tool generates the necessary skeleton code to be filled in by the developer (see Sec. 3.3). The CCOM tool automatically keeps a component blueprint specification and its implementation synchronized.

3.2 Developing Compositions

A composition can be built by retrieving component blueprints from the repository and loading them into the composition.
Next, instantiations of these component blueprints can be created and put into instance models. Connecting component instances is done by (1) instantiating the port instances that will communicate with each other and (2) creating a connector and attaching it to the created port instances. The scenario model, used in the following step, enables the software developer to impose non-functional constraints on parts of a composition by attaching contracts. A CCOM contract can be attached to one or more participants (component instances, port instances and/or connectors). The actual number and type of participants in a contract depend on the particular type of contract: a contract constraining the memory usage of a component is attached to a component instance, while contracts imposing timing constraints on the interaction between components are attached to the ports involved in the interaction.

To make this more concrete, we elaborate here on the timing constraints. In CCOM, timing constraints are specified by means of templates with properties that have to be filled in by the application designer. Using templates makes it easier for a developer to specify constraints, without the need to learn a particular formal specification notation. In general, a CCOM timing contract specifies and imposes the timing constraints to which communicating components have to adhere. A timing contract is concerned with the communication between components. As such, it is straightforward to attach the timing contract to their ports, since these are the communication gateways between components. Furthermore, the communication between components is fully specified by the MSC of the involved ports, so this MSC plays a key role in the specification of a timing contract. A hook is a point on an MSC that represents a particular communication action: we distinguish a send hook, a receive hook and an end-of-activation (eoa) hook. A timing contract can be specified by means of these hooks. For example, a deadline contract could specify that the maximum duration between the send hook and the eoa hook may not exceed 500 milliseconds. A deadline contract thus has three parameters: a hook that starts the contract, a hook that ends the contract, and the maximum allowed time difference between the occurrences of these hooks. The second type of timing contract that CCOM supports is the periodicity timing contract. Fig. 5 shows how such a contract can be specified in our tool. The window in the tool shows the MSC, with the hooks in each message, and the message names. The four necessary parameters of the periodicity contract can be filled in at the bottom: in this example the contract starts at the sending of the Start message, the periodic event is the reception of the Update message, the contract ends with the sending of the Stop message, and the period is 500 ms.

3.3 From Design to Execution: Code Generation

To facilitate the development of components in CoCoNES, the skeleton code of a component and its ports is generated by the CCOM tool. CCOM also ensures synchronization between a component blueprint specification and its implementation. For the further implementation of the component, a custom language is used. This language is a superset of Java that supports relevant component-based constructs. The code is automatically preprocessed, compiled and packaged into a binary that is ready for deployment in the runtime environment. As such, the design is directly used as input for the implementation.
Fig. 5. The specification of a periodicity timing contract in the CCOM tool
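To make the deadline and periodicity templates described above more tangible, the following Java sketch models their parameters and a simple offline deadline check. It is purely illustrative: the type names (Hook, HookType, DeadlineContract, PeriodicityContract, DeadlineCheck) and the chosen representation are our own assumptions and are not part of the CCOM tool.

    // Hedged sketch: a possible data model for CCOM timing-contract templates.
    // All names are assumptions for illustration, not taken from the CCOM sources.
    enum HookType { SEND, RECEIVE, END_OF_ACTIVATION }

    // A hook identifies one communication action of a named message on an MSC.
    record Hook(String messageName, HookType type) { }

    // Deadline contract: the three parameters described in the text.
    record DeadlineContract(Hook start, Hook end, long maxMillis) { }

    // Periodicity contract: the four parameters, matching the example of Fig. 5.
    record PeriodicityContract(Hook start, Hook periodicEvent, Hook end, long periodMillis) { }

    class DeadlineCheck {
        // Given the observed times (in ms) of the start and end hooks,
        // verify that the deadline contract is respected.
        static boolean satisfies(DeadlineContract c, long startTime, long endTime) {
            return endTime - startTime <= c.maxMillis();
        }
    }

    class ContractExamples {
        // The periodicity contract of Fig. 5: it starts at the sending of Start,
        // the periodic event is the reception of Update, it ends at the sending
        // of Stop, and the period is 500 ms.
        static final PeriodicityContract SPEED_UPDATE_PERIOD =
            new PeriodicityContract(
                new Hook("Start", HookType.SEND),
                new Hook("Update", HookType.RECEIVE),
                new Hook("Stop", HookType.SEND),
                500);
    }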
We briefly illustrate the language with a small example consisting of two components that are used in our cruise control example: a SpeedoMeter and a SpeedDisplay. The first component measures the speed of a vehicle using an existing method (measureSpeed()) and broadcasts this information on its Out port. The second component prints out all values it receives through its Input port. The implementation of these two components is shown in Fig. 6. The implementation of a component starts with the component keyword. It consists of zero or more attributes (e.g. $activated in Fig. 6) and methods (not shown for these trivial components) and the description of its ports. The declaration of a multicast port is straightforward since it cannot accept messages. It suffices to specify its existence using the multicastport keyword. Two parameters are required: the name of the port and the maximum number of simultaneous connections that are allowed. In the example, the multicastport Out specifies an UNLIMITED number of simultaneous connections. As such, connections will never be refused at runtime. A multiport has a similar declaration, but since it can accept messages, these messages must be declared as well using the message keyword.
component SpeedoMeter {
    protected boolean $activated = false;
    multicastport Out UNLIMITED;
    multiport Control 1 {
        message Start { $activated = true; }
        message Stop { $activated = false; }
        message Update {
            if ($activated) {
                message x = Speed;
                x::value = measureSpeed();
                Out..x;
            }
        }
    }
}
component SpeedDisplay {
    multiport Input 1 {
        message Update {
            System.out.println(
                "The speed of " +
                "the vehicle is: " +
                $$inMessage::value);
        }
    }
}
Fig. 6. Two simple components in CoCoNES notation
The definition of a message includes the code to be executed when the message is received on the port. New messages can be created using the statement:

    message varName = messageName;

After its creation, this message can be sent out through any connected port. Message sending is asynchronous and, as such, the sending of a message always succeeds. If the component on the other side of the connector does not accept messageName, the system will return a CannotDeliverMessage message. Finally, two additional operators were added. Fields of a message are accessed using the :: operator:

    x::value = measureSpeed();

The .. operator is used on a port to send out a given message:

    Out..x;

Inside the implementation body for a message, the implicit parameter $$inMessage refers to the received message. In the Update message of the SpeedDisplay component, for instance (see Fig. 6), a parameter is retrieved from the incoming message and displayed on screen.
4 DRACO Runtime System
In addition to the CCOM tool, the tool chain supporting the CoCoNES methodology also contains a runtime environment capable of executing CoCoNES compositions: the DRACO component system. DRACO is a middleware system that provides the underlying infrastructure of an execution environment for component compositions. The runtime system is highly modularized. As such, it can be configured and targeted to specific applications, while guaranteeing a minimal memory footprint. The DRACO system is implemented in Java and targeted towards more powerful embedded systems such as an iPAQ or a robot used in manufacturing. Very small embedded systems or systems with
hard real-time deadlines, such as those often found in the automotive world, are not the focus of the DRACO middleware platform. The CoCoNES component design methodology can, however, be used on such systems as well. The architecture of DRACO is depicted in Fig. 7. It consists of a core system which provides the minimal functionality to execute CoCoNES applications. The most important tasks of the core system are as follows:
1. Management of component instances, connectors and contracts.
2. Support for introspection and naming.
3. Abstraction of the underlying hardware and OS.
4. Routing and scheduling of messages sent between components.
The core system consists of five units and its footprint is less than 65 kB, allowing it to be installed on embedded devices with stringent resource constraints. At startup, the core is dynamically assembled using the builder pattern [7]. Since the builder reads an XML file describing which implementation to use for each of the core units, modifying or replacing one core unit has no impact whatsoever on the rest of the system. The ability to easily customize its core makes DRACO an excellent platform for various assessments (e.g. replacing the scheduler allows us to investigate the influence of the scheduling algorithm on the execution of a component-based application). Furthermore, it allows for further customization depending on the target platform. Once instantiated, the core is considered to be fixed. In order to keep the complexity (and size) of DRACO sufficiently low, no attempt was made to allow for unanticipated modifications of the DRACO core at runtime. The five core modules are:

Component Manager: responsible for loading component blueprints, creating instances and removing them. It also keeps a repository of created component instances, with a basic directory mechanism mapping names onto component instances.
Connector Manager: a repository containing the connectors that exist between component instances in a composition. Each connector refers to the ports to which it is connected. Each port has a send message handler queue and a receive message handler queue associated with it.
Message Manager: responsible for delivering messages sent out by components. By means of the Connector Manager it retrieves the send message handler queue of the sending port and the receive message handler queue of the receiving port. The messages then traverse the send message handler queue of the sending port and arrive at the Scheduler.
Scheduler: accepts messages coming from a send message handler queue and schedules them for delivery to the appropriate message handler queue.
Module Manager: responsible for loading and unloading extension modules, which can be used to extend the functionality of the DRACO component system.

As shown on top of Fig. 7, each of the core modules exports a lightweight component interface. These appear as components in the runtime system and can be used by application components to query or configure the underlying middleware environment.
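To illustrate the builder-based assembly described above, the sketch below reads an XML description of the core and instantiates one implementation class per core unit via reflection. It is a minimal sketch under assumed names (DracoCoreBuilder, the <unit> element format); the actual DRACO builder and its configuration format are not shown in this chapter.

    // Hedged sketch of builder-pattern core assembly driven by an XML file.
    // All names here (DracoCoreBuilder, the unit element attributes) are
    // illustrative assumptions and not taken from the DRACO sources.
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class DracoCoreBuilder {

        // Maps a core-unit name (e.g. "scheduler") to the instantiated object.
        public Map<String, Object> assembleCore(File configFile) throws Exception {
            Map<String, Object> core = new HashMap<>();
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(configFile);
            // Assumed format: <unit name="scheduler" class="..."/>
            NodeList units = doc.getElementsByTagName("unit");
            for (int i = 0; i < units.getLength(); i++) {
                Element unit = (Element) units.item(i);
                String name = unit.getAttribute("name");
                String className = unit.getAttribute("class");
                // Replacing one unit only requires editing the XML file:
                // the rest of the system is unaware of the concrete class.
                Object impl = Class.forName(className)
                        .getDeclaredConstructor().newInstance();
                core.put(name, impl);
            }
            return core;
        }
    }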
Fig. 7. Overview of the DRACO architecture
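Building on the module overview above, the following minimal sketch shows how a module manager might load and unload extension modules at runtime. ExtensionModule and ModuleManager are assumed names for illustration, not the DRACO API.

    // Hedged sketch of runtime loading/unloading of extension modules.
    // ExtensionModule and ModuleManager are illustrative names only.
    import java.util.HashMap;
    import java.util.Map;

    interface ExtensionModule {
        void start();   // called when the module is loaded into the core
        void stop();    // called before the module is unloaded
    }

    class ModuleManager {
        private final Map<String, ExtensionModule> loaded = new HashMap<>();

        void load(String name, ExtensionModule module) {
            module.start();
            loaded.put(name, module);
        }

        void unload(String name) {
            ExtensionModule module = loaded.remove(name);
            if (module != null) {
                module.stop();
            }
        }
    }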
Interaction between DRACO and the user is handled by an external shell, which is provided to DRACO at startup and resides in a different binary. By separating the user interaction from DRACO, it is possible to use different interaction shells depending on the situation. An interactive shell with scripting capabilities is available for use during development on a high-performance desktop machine, while a thin layer with minimal functionality can be used when resource consumption is an issue. Although the functionality of the core system is relatively limited, DRACO offers an infrastructure which allows the addition of functionality that may not always be required: extension modules. These modules can be loaded and unloaded at runtime by the module manager. The following extension modules have been worked out:

The distribution module (DM): this module adds distribution functionality to the core platform in a completely transparent way. It introduces the notion of proxy components, similar to the proxy pattern defined in [7]. These proxy components are lightweight components that represent remote components. As such, they offer the same ports and exactly the same semantic information. The DM is responsible for (1) setting up and tearing down connections between remote DRACO systems, and (2) managing proxy components and generating them based on real components. In DRACO, a connection is an abstract concept that can be implemented by any kind of physical wired or wireless connection. No stubs or other design-time entities need to be generated in advance in order to make components communicate in a distributed way. Instead, proxies are created dynamically on an as-needed basis. This constitutes a considerable advantage over traditional approaches that need additional constructs (e.g. the stubs and skeletons used by Java RMI).

The contract monitor: this module checks whether contracts are violated at runtime. Depending on the type of the contract, the monitoring differs. For the timing contracts, messages are intercepted and time-stamped by an event gathering unit. These time stamps are used by the event processing unit to verify the timing contract. The unit responsible for the verification of the contract can be moved to another node to minimize intrusion on the target platform. Currently, contract violations are reported offline: a developer can analyze the violations that occurred after an application's execution. In the future, a contract violation will be reported to the application, which must then take corrective measures.
The resource manager: this extension module is responsible for the negotiation of contracts with applications when these are started. The resource manager knows what contracts are currently active, and can accept new contracts as a function of the available resources.

The live update module (LUM): allows components to be replaced at runtime, even while they are part of a running application. The LUM achieves this by (1) putting the component to be replaced in an inactive state by temporarily holding back its messages, (2) instantiating the new version of the component, (3) possibly transferring the internal state from the old component to the new component, using routines provided by the developers of the components, (4) rewiring the connectors of the old version to the new component, (5) activating the new component by releasing the messages that were held back in step 1, and (6) removing the old component.

Since the exact tasks, and thus the requirements, of extension modules are unknown in advance, they can make use of reflection mechanisms and may subscribe to one of the many events triggered by the core system. In addition, they can interfere with the message flow and interact with the delivery of messages.

In DRACO, messages are sent asynchronously between components. The path followed by a message traveling from component A to component B consists of three major parts (see Fig. 8(a)): the sending message chain, the scheduler and the receiving message chain. Each extension module can add message handlers to these message chains to implement the features it needs. The sending message chain comprises the journey of a message from the moment it is sent through the port of the originating component until it is scheduled for execution by the scheduler. Its detailed implementation in DRACO is shown in Fig. 8(b). In the first step, the component contacts the port through which the message will be sent. Since ports are implemented as inner classes in DRACO, this is achieved with a local call (arrow 1). The port passes the message on to the message manager (arrow 2), which retrieves the attached connector from the connector manager (arrow 3). Each connector is associated with four message handlers (the first handler of the sending and the receiving chain for each direction: component A to B and vice versa). The message manager retrieves the two handlers associated with the current message direction. The receiving message handler is used for the delivery of the message after it has been scheduled for execution by the scheduler (see further). It is therefore simply passed on with the message to the sending message handler (arrow 4). This sending message handler is the first (and in the most basic scenario also the last) handler in a chain of message handlers. Each handler in the chain has the ability to intercept and modify the message, and then forwards it to the next handler in the chain (arrow(s) 5). The last handler is responsible for the delivery to the scheduler (arrow 6). After receiving both the message and its associated receiving message handler, the scheduler queues the message until it is ready for execution. The exact queuing mechanism depends on the scheduler that is used, but it is the responsibility of the scheduler to preserve the order of messages over a given connector. When the scheduler has selected a message for delivery, it allocates a thread for the execution of this message and passes the message on to its receiving message handler.
(a) Schematic overview of message journey
(b) Details of the sending message chain
Fig. 8. Message Delivery in DRACO
As shown in Fig. 8(a), the principle behind the receiving sequence is identical to that of the sending sequence: there is a chain of message handlers that process the message (e.g. the timing monitor can read out the time stamp added to the message by its peer in the sending message chain) and subsequently pass it on to the next handler in the chain. The last handler delivers the message to the port at the end of the connector. This port then dispatches the message to the actual method associated with the message. After message execution, control is returned to the scheduler.
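The chain-of-handlers mechanism just described can be summarized with the following hedged Java sketch. The interface and class names (Message, MessageHandler, TimestampHandler) are illustrative assumptions rather than the actual DRACO types; the TimestampHandler mirrors the kind of handler the contract monitor adds to the sending message chain.

    // Hedged sketch of a chain of message handlers, as used in the sending
    // and receiving message chains. Names are assumptions, not DRACO APIs.
    interface Message {
        void setProperty(String key, Object value);
        Object getProperty(String key);
    }

    interface MessageHandler {
        // Each handler may inspect or modify the message and then forwards it
        // to the next handler in the chain; the last handler delivers it
        // (to the scheduler on the sending side, to the port on the receiving side).
        void handle(Message msg);
    }

    // Example extension-module handler: time-stamps a message on the sending
    // side so that a peer handler on the receiving side can verify a timing contract.
    class TimestampHandler implements MessageHandler {
        private final MessageHandler next;

        TimestampHandler(MessageHandler next) {
            this.next = next;
        }

        public void handle(Message msg) {
            msg.setProperty("sendTimestampMillis", System.currentTimeMillis());
            next.handle(msg);   // forward to the next handler in the chain
        }
    }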
5 Extensive Case Study
Several embedded applications have been developed using the CoCoNES design methodology and the supporting tool chain. One of these is the car regulator introduced in Sec. 3. This application has been used as a case study to define and assess timing constraints. However, as our implementation was run neither on a car nor on embedded hardware, it was in our eyes still a toy application. We then developed a full-fledged embedded application, with the specific intention to validate our design methodology in a larger application, to verify the specific advantages of our component architecture and to test the reusability of our components and our designs. This led to a camera surveillance system of which several variations were designed and implemented. The surveillance system can be used for security-related purposes such as physical intrusion detection and registration of activity in home and office buildings. A PC/104 embedded computer (holding the operating system, a Java virtual machine, our component runtime and the test case code) was connected to a DFW-VL500 FireWire digital camera. It was linked over a TCP/IP network with a desktop PC serving as a storage and control station. The two bottom boxes in Fig. 9 (generated by the CCOM tool) give an overview of the component compositions in the surveillance case. Device boundaries are indicated by surrounding boxes.
Fig. 9. Overview of the Camera Surveillance Case
The central Camera component on the embedded device (PC 104) continuously grabs images from the camera at a predefined rate and multicasts them towards the MotionDetector and the Switch component. The motion detector analyzes the images and produces an alarm-start output message when motion is detected. The switch, receiving the message from the motion detector, forwards the video stream towards its output port until the alarm-stop message is received, meaning that motion has ceased. The suspicious images are sent to the StorageController component, which is located on the desktop PC. Proxy components are introduced for handling the remote communication transparently. The StorageController Proxy and the Switch Proxy allow for remote communication between StorageController and Switch. The Storage component, which encapsulates database access, eventually stores the images. The core application, as just described, was developed in CCOM and executes on top of DRACO, running a distribution module. In order to demonstrate reuse, several variations and extensions of the core application were developed. In one of the variations, the motion detector component and the switch component were replaced by a component that passes one image out of every 20 to the storage controller. This was a straightforward change, and all the other components could be reused without any modification. Subsequently, we wanted to take reuse one step further and ported our component system to an iPAQ. We then designed BlueGuard, which provides security guards with more information about the safety inside the building. This extension allows the guards to query the recorded images using a handheld device when they are in the neighborhood of an observation station. For this extension a BlueGuardClient component was created that is available on the handheld device.
(a) Viewing events
(b) Viewing an image
(c) Camera Control
Fig. 10. BlueGuard client
Furthermore, the distribution module was extended with Bluetooth [3] connection support to enable short-range wireless access to our observation stations. The BlueGuardClient component, instantiated on the iPAQ handheld devices (see the top box of Fig. 9), provides the user interface used by the security guards. This component can be connected to a StorageController component for querying purposes and to several Camera components for adjusting camera settings such as focus and zoom. As depicted in the example setup (Fig. 9), the BlueGuardClient may be connected to a StorageController component through the embedded PC 104 module, using proxy components for routing their messages. Fig. 10(a) shows the tab view of all recent events and the Bluetooth connection status. The image associated with each event can be requested and is shown in the view tab (Fig. 10(b)). The third tab (Fig. 10(c)) allows for changing the brightness, sharpness and zoom parameters of the camera. An in-depth description of the BlueGuard extension can be found in [11]. Both timing contracts and bandwidth contracts have proven their advantages in the surveillance case. In the distributed setup, the bandwidth contract attached to the connection between the Switch and StorageController components imposed limitations on the rate at which the camera's images could be sent. At a resolution of 320×240 in uncompressed RGB color format, we were limited to sending no more than 4 images per second over a 10 Mb Ethernet connection.
The message size is 230400 bytes (320×240 pixels with 3 bytes per pixel) and the interval time is 250 milliseconds; sending 230400 bytes every 250 ms corresponds to roughly 7.4 Mbit/s, which approaches the capacity of the 10 Mb link. The Bluetooth communication between the handheld device and the PC 104 module was fast enough for transmitting requested images to the handheld device because of the long interval times between consecutive images (only one image at a time is viewed). However, the timing contract describing the time between sending an image and receiving it only just met the timing requirements due to the rather low (0.4 Mbps) throughput. The periodicity contract attached to the image-sending hook of the camera was defined with a period of 250 milliseconds, fitting the requirements imposed by the network connection. Contract monitoring showed that the contracts were always adhered to by the components during the test period. Our case study also allowed us to validate the live update possibilities of our architecture. To do so, we replaced the MotionDetector component by a newer version, using a different algorithm, while the application was running. Both the old and the new version compare every image with the previous one. The state of this component consists of the current image. It was this state that was requested from the old component and fed to the new component. The application as a whole continued to work fine. No images were lost. The core camera surveillance system, its variations and its extensions have proven the soundness of our methodology and our approach, in which components and contracts play a central role; they have shown that our architecture is suitable for embedded platforms, and have validated our tool chain.
6 Relation to the State-of-the-Art
In [2], Beugnard et al. argue that component interfaces should be specified on four levels: basic, behavioral, synchronization and Quality of Service. This is similar to the specification of port interfaces in CoCoNES. The last level of the port specification was deliberately omitted, since Quality-of-Service properties are specified using contracts. CoCoNES contracts are more general and have a broader scope: they can be attached to components or connectors as well (e.g. memory contracts are likely to be attached to components) and are not tied exclusively to a port. The state of the art of component-based development is too large to be presented here; we merely contrast our work with alternative component-based frameworks specifically targeted at embedded systems that have been developed in recent years. In Koala [19], components are implemented in C and specify provides and requires interfaces that cannot be changed. Interfaces can be connected if the provided interface implements at least all methods of the required interface. The binding of these interfaces is made at the product level. All external information (including memory management) must be retrieved through requires interfaces. Other embedded component systems worth mentioning are PECOS [9, 20] (a model for field devices with an emphasis on formal execution models using Petri nets), Port-Based Objects [14, 15] (used in the Chimera RT operating system), VEST [13] (a toolset for constructing and analyzing component-based embedded systems) and DESS [5, 6] (a generic component architecture and notation for embedded software development).
CoCoNES has several aspects in common with the Ptolemy II project [1]. Ptolemy II studies the modeling, simulation and design of concurrent, real-time, embedded systems. It specifically focuses on the assembly of concurrent components and the use of models of computation that regulate the interaction between embedded components. One of its major problem areas is the use of heterogeneous mixtures of models of computation, including discrete-event systems, data flow, process networks, synchronous and reactive systems, and communicating sequential processes. A number of CoCoNES concepts are inspired by ROOM [12] and UML-RT, more specifically components, ports, and connectors. Although initially intended for designing and building telecommunication systems, the ROOM methodology can also be used for the design of other types of embedded systems. ROOM designs primarily contain actors, ports, bindings and state machines. The ROOM methodology has some new and interesting ideas: it introduces thread encapsulation, which hides the internal threading mechanisms; it offers an alternative way of connecting software components by means of bindings; the idea of port protocols is an advantage since it forces a designer to connect only compatible ports; and it offers executable models and the ability to generate code by putting code into the transitions of the state machines. ROOM, however, lacks a consistent way to annotate time in designs. In general, ROOM has no support for the annotation of non-functional constraints, such as memory and bandwidth constraints. The Fractal component model [4] in particular is very interesting because it is in several respects based on the same principles as the CoCoNES component model. Support for extension and adaptation is its prime concern, and it aims at a broad range of host devices, from embedded systems to application servers. It has, however, a less strictly defined component definition than CoCoNES and provides a language-independent interface definition for its components. This interface definition can be used to connect components written in different languages. In addition, there is also support for composite components, and (sub)components may be shared between components. In the Fractal model, connections between two or several components are called bindings. There are two types of bindings: primitive bindings and composite bindings. Primitive bindings are language-level bindings (synchronous or asynchronous), whereas composite bindings are compositions of primitive bindings and components. Using this definition of inter-component bindings, flexible distributed applications can be built. Each component has a component controller that can control all internal behavior of the component, such as affecting operation invocations, influencing the behavior of internal components, creating new components, etc. The Java-based Fractal framework that supports the Fractal component model consists of a core and several extensions, called increments. As in DRACO, these increments can add new functionality to the core. The core offers a basic API for performing actions such as creating components, adding bindings between components and managing the content of components. Some of the increments under development allow for component bootstrapping, component distribution (distributed bindings), mobility and protection (resource management and distribution).
7 Conclusion
This chapter has given an overview of CoCoNES, a methodology and architecture for developing software for embedded systems using a component-oriented approach, in which contracts are used to model the non-functional constraints. CoCoNES is backed by a tool chain that spans both the design-time and the runtime phase. Contracts can be specified at design-time and can be checked both at design-time and at runtime. The runtime environment offers, besides its core system, support for distribution and live updates. A full-fledged embedded case study has proven the soundness of our methodology, the applicability of our software architecture in the domain of embedded systems and the robustness of our runtime environment. In conclusion, we can say that our methodology is original in that it is supported by a tool chain in which both functional and non-functional constraints are checked, both at design-time and at runtime. These checks are generated by the tools, based on the design made by the developer. The key contribution of our work is therefore the integrated tool chain that spans design-time and runtime and covers functional and non-functional constraints. A more detailed description of this work can be found in [10, 17, 18].

CoCoNES is an ongoing project. We are currently pursuing different tracks to extend and improve our methodology and architecture. (1) We are extending the contracting framework to support general resource contracts in an extensible way. This framework will make it easy to add new types of contracts to the tool chain and monitoring mechanisms for new contracts to the DRACO runtime system. One track that is already being explored is the deployment of memory contracts and mechanisms to monitor a component's memory use. In addition, the new contract framework will allow resource contracts to be negotiable at deployment time and even renegotiable when resource availability in the system changes. This will enable flexible runtime deployment of programs on embedded systems without endangering the system's robustness and reliability. (2) In the future, our live updating tool will be able to automatically generate state transfer functions in order to enhance the support for updating components at runtime. (3) With the addition of the notion of context, our future runtime system will be able to discover, retrieve and process context information to improve its applications' capabilities to react to environmental conditions and events. Such context information could include information about the user of the system (such as his preferences), environmental information (such as temperature, location and time), platform information (such as CPU and memory information) and service information (available software services). (4) Furthermore, research is being conducted on runtime software adaptation and reconfiguration mechanisms that respect existing resource contracts. This way, programs can, guided by the runtime system, adjust their configuration and composition at runtime according to new contextual conditions.
Acknowledgment

Both projects introduced, SEESCOA and CoDAMoS, are funded by the Belgian Institute for the Promotion of Innovation by Science and Technology in Flanders.
References

1. Philip Baldwin, Sanjeev Kohli, Edward A. Lee, Xiaojun Liu, and Yang Zhao. Modeling of sensor nets in Ptolemy II. In Proceedings of Information Processing in Sensor Networks (IPSN), Berkeley, CA, USA, April 26-27, 2004.
2. Antoine Beugnard, Jean-Marc Jézéquel, Noël Plouzeau, and Damien Watkins. Making components contract aware. Computer, 32(7):38–45, July 1999.
3. Bluetooth. Bluetooth wireless protocol, 2003. http://www.bluetooth.com/ and http://www.bluetooth.org/.
4. Eric Bruneton, Thierry Coupaye, and Jean-Bernard Stefani. Recursive and dynamic software composition with sharing. In Proceedings of the Seventh International Workshop on Component-Oriented Programming, Malaga, Spain, 2002.
5. DESS Team. Definition of components and notation for components. Technical report, December 2001. http://www.dess-itea.org.
6. DESS Team. Timing, memory and other resource constraints. Technical report, 2001. http://www.dess-itea.org.
7. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.
8. ITRS. International technology roadmap for semiconductors. Internet, 2004. http://public.itrs.net.
9. Oscar Nierstrasz, Gabriela Arévalo, Stéphane Ducasse, Roel Wuyts, Peter Müller, C. Zeidler, Thomas Genssler, and R. van den Born. A component model for field devices. In Proceedings of the IFIP/ACM Working Conference on Component Deployment, Berlin, 2002.
10. Peter Rigole, Yolande Berbers, and Tom Holvoet. Design and run-time bandwidth contracts for pervasive computing middleware. In C. Urarahy, A. Sztajnberg, and R. Cerqueira, editors, Proceedings of the First International Workshop on Middleware for Pervasive and Ad-Hoc Computing (MPAC), pages 5–12, Rio de Janeiro, Brazil, June 2003.
11. Peter Rigole, Yolande Berbers, and Tom Holvoet. Bluetooth enabled interaction in a distributed camera surveillance system. In Proceedings of the Thirty-Seventh Annual Hawaii International Conference on System Sciences, pages 1–10. IEEE Computer Society, 2004.
12. B. Selic, G. Gullekson, and P. Ward. Real-Time Object Oriented Modeling. Wiley, 1994. ISBN 0471599174.
13. John A. Stankovic. VEST – A toolset for constructing and analyzing component based embedded systems. Lecture Notes in Computer Science, 2211:390–402, 2001.
14. David B. Stewart and Pradeep K. Khosla. The Chimera methodology: Designing dynamically reconfigurable and reusable real-time software using port-based objects. International Journal of Software Engineering and Knowledge Engineering, 6(2):249–277, June 1996.
15. David B. Stewart, Richard A. Volpe, and Pradeep K. Khosla. Design of dynamically reconfigurable real-time software using port-based objects. Software Engineering, 23(12):759–776, 1997.
16. Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley, November 2002.
17. David Urting, Stefan Van Baelen, Tom Holvoet, Peter Rigole, Yves Vandewoude, and Yolande Berbers. A tool for component based design of embedded software. In J. Noble and J. Potter, editors, Proceedings of the 40th International Conference on Technology of Object-Oriented Languages and Systems (TOOLS Pacific 2002), volume 10, pages 159–168, Sydney, Australia, February 2002. Australian Computer Society Inc.
18. David Urting, Tom Holvoet, and Yolande Berbers. Embedded software development: Components and contracts. In T. Gonzalez, editor, Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems, pages 685–690, 2001.
19. Rob van Ommering. Building Reliable Component-Based Software Systems, chapter The Koala Component Model. Artech House Publishers, July 2002.
20. Michael Winter, Thomas Genssler, Alexander Christoph, Oscar Nierstrasz, Stéphane Ducasse, Roel Wuyts, Gabriela Arévalo, Peter Müller, Chris Stich, and Bastiaan Schönhage. Components for embedded software – the PECOS approach. In Proceedings of the Second International Workshop on Composition Languages, Malaga, Spain, June 2002.
Adopting a Component-Based Software Architecture for an Industrial Control System – A Case Study

Frank Lüders¹, Ivica Crnkovic¹, and Per Runeson²

¹ Department of Computer Science and Engineering, Mälardalen University, Box 883, SE-721 23 Västerås, Sweden
{frank.luders,ivica.crnkovic}@mdh.se
² Department of Communication Systems, Lund University, Box 118, SE-221 00 Lund, Sweden
[email protected]
Abstract. This chapter presents a case study from a global company developing a new generation of programmable controllers to replace several existing products. The system needs to incorporate support for a large number of I/O systems, network types, and communication protocols. To leverage its global development resources and the competency of different development centers, the company decided to adopt a component-based software architecture that allows I/O and communication functions to be realized by independently developed components. The architecture incorporates a subset of a standard component model. The process of redesigning the software architecture is presented, along with the experiences made during and after the project. An analysis of these experiences shows that the component-based architecture effectively supports distributed development and that the effort required to implement certain functionality has been substantially reduced while, at the same time, the system's performance and other run-time quality attributes have been kept at a satisfactory level.
1 Introduction
Component-based software engineering (CBSE) denotes the disciplined practice of building software from pre-existing smaller products, generally called software components, in particular when this is done using standard or de-facto standard component models [7, 16]. The popularity of such models has increased greatly in the last decade, particularly in the development of desktop and server-side software, where the main expected benefits of CBSE are increased productivity and timeliness of software development projects. The last decade has also seen an unprecedented interest in the topic of software architecture [2, 15], in the research community as well as among software practitioners. CBSE has notable implications for a system's software architecture, and an architecture that supports CBSE, e.g. by mandating the use of a component model, is often called a component-based software architecture.

This chapter presents an industrial case study from the global company ABB, which is a major supplier of industrial automation systems, including programmable controllers. The company's new family of controllers is intended to replace several existing
products originally developed by different organizational units around the world, many of which were previously separate companies, targeting different, though partly overlapping, markets and industries. As a consequence, the new controller products must incorporate support for a large number of I/O systems, network types, and communication protocols. To leverage its global development resources and the competency of different development centers, ABB decided to adopt a component-based software architecture that allows I/O and communication functions to be realized by independently developed components.

This chapter is organized as follows. The remainder of this section describes the questions addressed by the case study and motivates the choice of method. Section 2 presents the context of the case study, including a description of the programmable controller and its I/O and communication functions as well as the organizational and business context. The process of componentizing the system's software architecture is presented in Section 3. Section 4 analyzes the results of the project and identifies some experiences of general interest. A brief overview of related work is provided in Section 5. Section 6 presents conclusions and some ideas for further work.

1.1 Questions Addressed by the Case Study

The general question addressed by the case study is what advantages and liabilities the use of a component-based software architecture entails for the development of an industrial control system. Due to the challenges of the industrial project studied, the potential benefit that a component-based architecture makes it easier to extend the functionality of the software has been singled out for investigation. More specifically, the project allows the two following situations to be compared:

– The system has a monolithic software architecture and all functionality is implemented at a single development center.
– The system has a component-based software architecture and pre-specified functional extensions can be made by different development centers.

By pre-specified functional extensions we mean extensions in the form of components adhering to interfaces already specified as part of the architecture. This fact is presumed to be significant, while the fact that the functionality in question happens to be related to I/O and communication is not. In addition to the question of whether the component-based architecture reduces the effort required to make such functional extensions, the study also addresses the questions of whether any such reduction is sufficient to justify the effort invested in redesigning the architecture and after how many extensions the saved effort surpasses the invested effort. Since the system in question is subject to hard real-time requirements, the potential effect of the architecture on the possibility of satisfying such requirements is also studied. Finally, the architecture's possible effect on performance is analyzed.

1.2 Case Study Method

The research methodology used is a flexible design study, conducted as a participant observation case study [14]. The overall goal of the study is to observe the process of
componentization, and to evaluate the gains of a component-based architecture. It is not possible to demarcate such a complex study object in a fixed design study. Neither is there an option to isolate and thereby study alternative options. Instead, we address the problem using a case study approach, where one study object is observed in detail and conclusions are drawn from this case. In order to enable the best possible access to information on the events in the case, the observations are performed by an active participant: the main researcher is also an active practitioner during the study. As a complement, interviews were conducted after the case study to collect data on the costs and gains of the component approach, thus providing data triangulation. Participatory research always includes a threat with respect to researcher bias. In order to increase the validity of the observations, a researcher was introduced late in the research process as a "critical friend". The long researcher involvement in this case study, on the other hand, reduces the threat with respect to respondent bias. Case studies are by definition weak with respect to generalization, in particular when only a single case is observed. However, to enable learning across organizational contexts, we present the context of the case study in some detail. Hence, the reader may find similarities and differences compared to their own environment, and thus judge the transferability of the research.
2 Context of the Case Study
Following a series of mergers and acquisitions, ABB became the supplier of several independently developed programmable controllers for the process and manufacturing industries. The company subsequently decided to continue development of only a single family of controllers for these and related industries, and to base all individual controller products on a common software platform. To be able to replace all the different existing products used in different regional areas and industry sectors, these controllers needed to incorporate support for a large number of communication protocols, network types, and I/O systems, including legacy systems from each of the previously existing controllers as well as current and emerging industry standards. A major challenge in the development of the new controller platform was to leverage the software development resources at different development centers around the world and their expertise in different areas. In particular, it was desirable to enable different development centers to implement different types of I/O and communication support. Additional challenges were to make the new platform sufficiently general, flexible, and extensible to replace existing controllers as well as to capture new markets. The solution chosen to meet these challenges was to base the new platform on one of the existing systems while adopting a component-based software architecture with well-defined interfaces for interaction between the main part of the software and I/O and communication components developed throughout the distributed organization.

As the starting point of the common controller software platform, one of the existing product lines was selected. This system is based on the IEC 61131-3 industry standard for programmable controllers [8]. The software has two main parts: 1) the ABB Control
Builder, which is a Windows application running on a standard PC, and 2) the system software of the ABB controller family, running on top of a real-time operating system (RTOS) on special-purpose hardware. The latter is also available as a Windows application, and is then called the ABB Soft Controller. A representative member of the ABB controller family is the AC 800M modular controller. This controller has two built-in serial communication ports as well as redundant Ethernet ports. In addition, the controller has two expansion buses. One of these is used to connect different types of input and output modules through which the controller can be attached to sensors and actuators. The other expansion bus is used to connect communication interfaces for different types of networks and protocols. The picture in Fig. 1 shows an AC 800M controller equipped with two communication interfaces (on the left) and one I/O module (on the right).
Fig. 1. An AC 800M programmable controller
The Control Builder is used to specify the hardware configuration of a control system, comprising one or more controllers, and to write the programs that will execute on the controllers. The configuration and the control programs together constitute a control project. When a control project is downloaded to the control system, the system software of the controllers is responsible for interpreting the configuration information and for scheduling and executing the control programs. Fig. 2 shows the Control Builder with a control project opened. The project consists of three structures, showing the libraries used by the control programs, the control programs themselves, and the hardware configuration, respectively. The latter structure is expanded to show a configuration of a single AC 800M controller, equipped with an analogue input module (AI810), a digital output module (DO810), and a communication interface (CI851) for the PROFIBUS-DP protocol [10]. To be attractive in all parts of the world and in a wide range of industry sectors, the common controller must incorporate support for a large number of I/O systems, communication interfaces, and communication protocols.
Fig. 2. The ABB Control Builder
During the normal operation of a controller, i.e. while the control programs are not being updated, there are two principal ways for it to communicate with its environment, denoted I/O (Input/Output) and variable communication, respectively. To use I/O, variables of the control programs are connected to channels of input and output modules using the program editor of the Control Builder. For instance, a Boolean variable may be connected to a channel of a digital output module. When the program executes, the value of the variable is transferred to the output channel at the end of every execution cycle. Variables connected to input channels are set at the beginning of every execution cycle. Real-valued variables may be attached to analogue I/O modules. Fig. 3 shows the program editor with a small program, declaring one input variable and one output variable. Notice that the I/O addresses specified for the two variables correspond to the positions of the two I/O modules (AI810 and DO810, respectively) in Fig. 2. Variable communication is a form of client/server communication and is not synchronized with the cyclic program execution in the way that I/O is. A server supports one of several possible protocols and has a set of named variables that may be read or written by clients that implement the same protocol. An ABB controller can be made a server by connecting program variables to so-called access variables in a special section of the Control Builder (see Fig. 2). Servers may also be other devices, such as field-bus devices [10]. A controller can act as a variable communication client by using special routines for connecting to a server and reading and writing variables via the connection. Such routines for a collection of protocols are available in the Communication Library, which is delivered with the Control Builder. The communication between a client and a server can take place over different physical media, which, in the case of the AC 800M, are accessed either via external communication interfaces or via the built-in Ethernet or serial ports.
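As a summary of the cyclic I/O semantics described above (inputs copied to program variables at the start of a cycle, outputs written at the end), the following sketch shows one scan cycle. It is written in Java purely for illustration; the class and method names are our own assumptions and do not correspond to the ABB controller software.

    // Hedged sketch of the cyclic execution model described in the text:
    // inputs are copied to program variables at the start of each cycle and
    // program variables are copied to output channels at the end of each cycle.
    // Method names are illustrative assumptions, not ABB APIs.
    public class ScanCycleSketch {

        public void runOneCycle() {
            readInputChannels();    // copy input channel values into connected variables
            executePrograms();      // run the control programs for this cycle
            writeOutputChannels();  // copy connected variable values to output channels
        }

        private void readInputChannels()   { /* e.g. copy AI810 channel -> input variable */ }
        private void executePrograms()     { /* evaluate the control programs */ }
        private void writeOutputChannels() { /* e.g. copy output variable -> DO810 channel */ }
    }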
Fig. 3. The program editor of the ABB Control Builder
Control projects are usually downloaded to the controllers via a TCP/IP/Ethernet-based control network, which may optionally be redundant. A control project may also be downloaded to a single controller via a serial link. In both cases, downloading is based on the Manufacturing Message Specification (MMS) protocol [5], which also supports run-time monitoring of hardware status and program execution. The system software of a controller, including the RTOS, can be updated from a PC via a serial link. Fig. 4 shows an example of a control system configuration.
Fig. 4. Example control system configuration
3 Componentization
3.1 Reverse Engineering of the Existing Software Architecture
Fig. 5. The original software architecture
The first step in the componentization of the architecture of the Control Builder and the controller system software was to get an overview of the existing architecture of the software, which was not explicitly described in any document. The software consists of a large number of source code modules, each of which is used to build the Control Builder or the controller system software or both, with an even larger number of interdependencies. An analysis of the software modules, with particular focus on I/O and communication functions, yielded the coarse-grained architecture depicted in Fig. 5. The boxes in the figure represent logical components of related functionality. Each box is implemented by a number of modules and is not readily visible in the source code. Many modules are also used as parts of other products, which are not discussed further here. This architecture is thus a product-line architecture [3], although the company has not yet adopted a systematic product-line approach. On the controller side, which is the focus of this chapter, the architecture has two distinct layers [15]. The lower layer (the box at the bottom of the figure) provides an interface to the upper layer (the rest of the boxes) that allows the source code of the upper layer to be used on different hardware platforms and operating systems. The complete set of interdependencies between modules within each layer was not captured by the analysis. To illustrate how some modules are used to build both the Control Builder and the controller system software, we consider the handling of hardware configurations. The hardware configuration is specified in the Controllers structure of the Control Builder. For each controller in the system, it is specified what additional hardware, such as I/O modules and communication interfaces, it is equipped with. Further configuration information can be supplied for each piece of hardware, leading to a hierarchic organization of information called the hardware configuration tree. The code that builds this tree in the Control Builder is also used in the controller system software to build the same tree there when the project is downloaded. If the configuration is modified in the Control Builder and downloaded again, only a description of what has changed in the tree is sent to the controller. The main problem with this software architecture is related to the work required to add support for new I/O modules, communication interfaces, and protocols. For instance, adding support for a new I/O system could require source code updates in all the components except the User Interface and the Communication Server, while a new communication interface and protocol could require all components except I/O Access to be updated.
As an example of what type of modifications may have been needed to the software, we consider the incorporation of a new type of I/O module. To be able to include a device (I/O module or communication device) in a configuration, a hardware definition file for that type of device must be present on the computer running the Control Builder. For an I/O module, this file defines the number and types of input and output channels. The Control Builder uses this information to allow the module and its channels to be configured using a generic configuration editor. This explains why the user interface did not need to be updated to support a new I/O module. The hardware definition file also defines the memory layout of the module, so that the transmission of data between program variables and I/O channels can be implemented in a generic way. For most I/O modules, however, the system is required to perform certain tasks, for instance when the configuration is compiled in the Control Builder or during start-up and shutdown in the controller. In the architecture described above, routines to handle such tasks had to be hard-coded for every type of I/O module supported. This required software developers with a thorough knowledge of the source code. The situation was similar when adding support for communication interfaces and protocols. The limited number of such developers therefore constituted a bottleneck in the effort to keep the system open to the many I/O and communication systems found in industry.

3.2 Component-Based Software Architecture

To make it much easier to add support for new types of I/O and communication, it was decided to split the logical components mentioned above into their generic and specific parts. The generic parts, commonly called the generic I/O and communication framework, contain code that is shared by all hardware and protocols implementing certain functionality. Routines that are specific to a particular type of hardware or protocol are implemented in separate components, called protocol handlers, installed on the PC running the Control Builder and on the controllers. This component-based architecture is illustrated in Fig. 6. Focusing again on the controller side, and comparing this architecture with the previous one, the protocol handlers can be seen as an additional half-layer between the framework and the bottom layer. To add support for a new I/O module, communication interface, or protocol in this architecture, it is only necessary to add protocol handlers for the PC and the controller along with a hardware definition file and possibly a device driver. The format of hardware definition files is extended to include the identities of the protocol handlers, as described below. Essential to the success of the approach is that the dependencies between the framework and the protocol handlers are fairly limited and, even more importantly, well specified. One common way of dealing with such dependencies is to specify the interfaces provided and required by each component [9]. The new control system uses the Component Object Model (COM) [4] to specify these interfaces, since COM provides suitable formats both for writing interface specifications, using the COM Interface Description Language (IDL), and for run-time interoperability between components. For each of the generic components, two interfaces are specified: one that is provided by the framework and one that may be provided by protocol handlers.
In addition, interfaces are defined to give protocol handlers access to device drivers and system functions.
Fig. 6. Component-based software architecture
The identities of protocol handlers are provided in the hardware definition files as the Globally Unique Identifiers (GUIDs) of the COM classes that implement them. COM allows several instances of the same protocol handler to be created. This is useful, for instance, when a controller is connected to two separate networks of the same type. Also, it is useful to have one object, implementing an interface provided by the framework, for each protocol handler that requires the interface. An additional reason that COM has been chosen is that commercial COM implementations are expected to be available on all operating systems that the software will be released on in the future. The Control Builder is only released on Windows, and it is expected that most future control products will be based on VxWorks, although some products are based on pSOS, for which a commercial COM implementation does not exist. In the first release of the component-based system the protocol handlers were implemented as C++ classes, which are linked statically with the framework. This works well because of the close correspondence between COM and C++, where every COM interface has an equivalent abstract C++ class. An important constraint on the design of the architecture is that hard real-time requirements, related to scheduling and execution of control programs, must not be affected by interaction with protocol handlers. Thus, all code in the framework responsible for instantiation and execution of protocol handlers always executes at a lower priority than code with hard deadlines.

3.3 Interaction Between Components

When a control system is configured to use a particular device or protocol, the Control Builder uses the information in the hardware definition file to load the protocol handler on the PC and execute the protocol-specific routines it implements. During download, the identity of the protocol handler on the controller is sent along with the other
configuration information. The controller system software then tries to load this protocol handler. If this fails, the download is aborted and an error message is displayed by the Control Builder. This is very similar to what happens if one tries to download a configuration that includes a device that is not physically present. If the protocol handler is available, an object is created and the required interface pointers obtained. Objects are then created in the framework and interface pointers to these passed to the protocol handler. After the connections between the framework and the protocol handler have been set up through the exchange of interface pointers, a method will usually be called on the protocol handler object that causes it to continue executing in a thread of its own. Since the interface pointers held by the protocol handler reference objects in the framework, which are not used by anyone else, all synchronization between concurrently active protocol handlers can be done inside the framework.
Fig. 7. Interfaces for communication servers
To make this more concrete, we now present a simplified description of the interaction between the framework and a protocol handler implementing the server side of a communication protocol on the controller. This relies mainly on the two interfaces IGenServer and IPhServer. The former is provided by the framework and the latter by protocol handlers implementing server-side functionality. Fig. 7 is a UML structure diagram showing the relationships between interfaces and classes involved in the interaction between the framework and such a protocol handler. The class CMyProtocol represents the protocol handler. The interface IGenDriver gives the protocol handler access to the device driver for a communication interface. A simplified definition of the IPhServer interface is shown below. The first two operations are used to pass interface pointers to objects implemented by the framework to the protocol handler. The other two operations are used to start and stop the execution of the protocol handler in a separate thread.
interface IPhServer : IUnknown
{
    HRESULT SetServerCallback([in] IGenServer *pGenSrv);
    HRESULT SetServerDriver([in] IGenDriver *pGenDrv);
    HRESULT ExecuteServer();
    HRESULT StopServer();
};
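Since, as noted above, every COM interface has an equivalent abstract C++ class, a statically linked protocol handler can implement IPhServer as an ordinary C++ class. The following sketch shows roughly what this mapping looks like; it is illustrative only: COM calling-convention macros and the inherited IUnknown methods (QueryInterface, AddRef, Release) are omitted, and the member variables of the CMyProtocol skeleton are invented for the example.

// Abstract C++ class corresponding to the IPhServer COM interface above.
class IPhServer : public IUnknown {
public:
    virtual HRESULT SetServerCallback(IGenServer *pGenSrv) = 0;
    virtual HRESULT SetServerDriver(IGenDriver *pGenDrv) = 0;
    virtual HRESULT ExecuteServer() = 0;
    virtual HRESULT StopServer() = 0;
};

// Skeleton of a statically linked protocol handler (cf. CMyProtocol in Fig. 7).
class CMyProtocol : public IPhServer {
public:
    HRESULT SetServerCallback(IGenServer *pGenSrv) { m_pGenSrv = pGenSrv; return S_OK; }
    HRESULT SetServerDriver(IGenDriver *pGenDrv)   { m_pGenDrv = pGenDrv; return S_OK; }
    HRESULT ExecuteServer();   // starts the server thread (Fig. 8)
    HRESULT StopServer();      // tears the connections down (Fig. 10)
private:
    IGenServer *m_pGenSrv = nullptr;   // framework callback object
    IGenDriver *m_pGenDrv = nullptr;   // access to the device driver
};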
The UML sequence diagram in Fig. 8 shows an example of what might happen when a configuration is downloaded to a controller, specifying that the controller should provide server-side functionality. The system software first invokes the COM operation CoCreateInstance to create a protocol handler object and obtain an IPhServer interface pointer. Next, an instance of CGenServer is created and a pointer to it passed to the protocol handler using SetServerCallback. Similarly, a pointer to a CGenDriver object is passed using SetServerDriver. Finally, ExecuteServer is invoked, causing the protocol handler to start running in a new thread.
Fig. 8. Call sequence to set up connections
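In code, the set-up shown in Fig. 8 amounts to something like the following sketch. It is illustrative only: error handling and reference counting are omitted, CLSID_MyProtocol and IID_IPhServer are hypothetical identifiers (the class identity actually comes from the hardware definition file), and in the first release the handler is linked statically rather than created through a full COM runtime.

IPhServer *pPh = nullptr;
HRESULT hr = CoCreateInstance(CLSID_MyProtocol,      // GUID taken from the hardware definition file
                              nullptr, CLSCTX_INPROC_SERVER,
                              IID_IPhServer, reinterpret_cast<void **>(&pPh));
if (SUCCEEDED(hr)) {
    CGenServer *pSrv = new CGenServer();   // framework object behind IGenServer
    CGenDriver *pDrv = new CGenDriver();   // framework object wrapping the device driver
    pPh->SetServerCallback(pSrv);          // pass the IGenServer pointer to the handler
    pPh->SetServerDriver(pDrv);            // pass the IGenDriver pointer to the handler
    pPh->ExecuteServer();                  // the handler continues in a thread of its own
}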
To see how the execution of the protocol handler proceeds, we first look at a simplified definition of IGenServer. The first two operations are used to inform the framework about incoming requests from clients to establish a connection and to take down an existing connection. The last two operations are used to handle requests to read and write named variables, respectively. The index parameter is used with variables that hold
structured data, such as records or arrays. All the methods have an output parameter that is used to return a status word.

interface IGenServer : IUnknown
{
    HRESULT Connect([out] short *stat);
    HRESULT Disconnect([out] short *stat);
    HRESULT ReadVariable([in] BSTR *name, [in] short index,
                         [out] tVal *pVal, [out] short *status);
    HRESULT WriteVariable([in] BSTR *name, [in] short index,
                          [in] tVal *pVal, [out] short *status);
};

Running in a thread of its own, the protocol handler uses the IGenDriver interface pointer to poll the driver for incoming requests from clients. When a request is encountered, the appropriate operation is invoked via the IGenServer interface pointer, and the result of the operation, specified by the status parameter, is reported back to the driver and ultimately to the communication client via the network. As an example, Fig. 9 shows how a read request is handled by calling ReadVariable. The definition of the IGenDriver interface is not included in this discussion for simplicity, so the names of the methods invoked on this interface are left unspecified in the diagram. Write and connection-oriented requests are handled in a very similar manner to read requests. The last scenario to be considered here is the one where configuration information is downloaded, specifying that a protocol handler that was used in the previous configuration should no longer be used. In this case, the connections between the objects in the framework and the protocol handler must be taken down and the resources allocated to them released. Fig. 10 shows how this is accomplished by the framework first invoking StopServer and then Release on the IPhServer interface pointer.
Fig. 9. Call sequence to handle variable read
Fig. 10. Call sequence to take down connections
This causes the protocol handler to decrement its reference count and to invoke Release on the interface pointers that have previously been passed to it. This, in turn, causes the objects behind these interface pointers in the framework to release themselves, since their reference count reaches zero. Assuming that its reference count is also zero, the protocol handler object also releases itself. If the same communication interface, and thus the protocol handler object, had also been used for different purposes, the reference count would have remained greater than zero and the object would not have been released.
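A hedged sketch of the protocol handler side of this interaction is shown below. Only the IGenServer and IPhServer calls correspond to the interfaces defined in this chapter; the Request record and the IGenDriver methods (PollRequest, SendReply) are purely hypothetical, since the chapter deliberately leaves the driver interface unspecified.

struct Request {                                     // hypothetical request record from the driver
    enum Kind { Read, Write, Connect, Disconnect } kind;
    BSTR  name;                                      // variable name (Read/Write)
    short index;
    tVal  value;                                     // value to write (Write)
};

// Server thread: poll the driver, forward requests to the framework, report the status back.
void CMyProtocol::ServerLoop() {
    while (m_running) {                              // m_running: flag cleared by StopServer()
        Request req;
        if (m_pGenDrv->PollRequest(&req) != S_OK)    // hypothetical IGenDriver call
            continue;
        short status = 0;
        tVal  result;
        switch (req.kind) {
        case Request::Read:       m_pGenSrv->ReadVariable(&req.name, req.index, &result, &status); break;
        case Request::Write:      m_pGenSrv->WriteVariable(&req.name, req.index, &req.value, &status); break;
        case Request::Connect:    m_pGenSrv->Connect(&status); break;
        case Request::Disconnect: m_pGenSrv->Disconnect(&status); break;
        }
        m_pGenDrv->SendReply(req, result, status);   // hypothetical IGenDriver call
    }
}

// Tear-down as in Fig. 10: stop the thread and release the framework objects.
HRESULT CMyProtocol::StopServer() {
    m_running = false;
    if (m_pGenSrv) { m_pGenSrv->Release(); m_pGenSrv = nullptr; }
    if (m_pGenDrv) { m_pGenDrv->Release(); m_pGenDrv = nullptr; }
    return S_OK;
}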
4 Experiences
The definitive measure of the success of the project described in this chapter is how large the effort required to redesign the software architecture has been compared to the effort saved by the new way of adding I/O and communication support. It is important to remember, however, that in addition to this cost balance, the business benefits gained by shortening the time to market must be taken into account. Also important, although harder to assess, are the long-term advantages of the increased flexibility that the component-based software architecture is hoped to provide. At the time of writing, the parts of the generic I/O and communication framework needed to support communication protocols have been completed, requiring an estimated effort of 15–20 person-years. A number of protocols have been implemented using the new architecture. The total effort required to implement a protocol (including the protocol handler, a device driver, firmware for the communication interface, and possibly IEC 61131-3 function blocks) is estimated to be 3–6 person-years. The reduction in effort compared to that required with the previous architecture is estimated to vary from one third to one half, i.e. 1–3 person-years per protocol.
Assuming an average saving of 2 person-years per protocol handler, the savings surpass the investment after the implementation of 8–10 protocols. Table 1 summarizes these effort estimations, which were made by technical management at ABB and are primarily based on reported working hours. System tests have shown that the adoption of the chosen subset of COM has resulted in acceptable system performance. The ability to meet hard real-time requirements has not been affected by the component-based architecture, since all such requirements are handled by threads that cannot be interrupted by the protocol handlers.

Table 1. Summary of effort estimation for the two software architectures
                                        Investment in framework   Cost per protocol   Saving per protocol   Return on investment
Original software architecture          0                         4–9 person-years    0                     –
Component-based software architecture   15–20 person-years        3–6 person-years    1–3 person-years      8–10 protocols
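The return-on-investment column follows from simple arithmetic on the other two estimates: with an investment of 15–20 person-years and an average saving of roughly 2 person-years per protocol, the break-even point lies at about 15 / 2 ≈ 8 to 20 / 2 = 10 protocols, which is the 8–10 protocols quoted above.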
An interesting experience from the project is that the componentization is believed to have resulted in a more modularized and better documented system, two characteristics generally believed to enhance quality. This experience concurs with the view of Szyperski [16] that adopting a component-based approach may be used to achieve modularization, and may therefore be effective even in the absence of externally developed components. The reduction in the effort required to implement communication protocols is partly due to the fact that the framework now provides some functionality that was previously provided by individual protocol implementations. This is also believed to have increased quality, since the risk of each protocol implementation introducing new errors in this functionality has been removed. Another interesting experience is that techniques that were originally developed to deal with dynamic hardware configurations have been successfully extended to cover dynamic configuration of software components. In the ABB control system, hardware definition files are used to specify what hardware components a controller may be equipped with and how the system software should interact with different types of components. In the redesigned system, the format of these files has been extended to specify which software components may be used in the system. The true power of this commonality is that existing mechanisms for handling hardware configurations, such as manipulating configuration trees in the Control Builder, downloading configuration information to a control system, and dealing with invalid configurations, can be reused largely as is. The idea that component-based software systems can benefit by learning from hardware design is also aired in [16]. Another lesson of general value is that it seems that a component technology, such as COM, can very well be used on embedded platforms and even platforms where runtime support for the technology is not available. Firstly, we have seen that the space and computation overhead that follows from using COM is not larger than what can be afforded in many embedded systems. In fact, used with some care, COM does not
introduce much more overhead than do virtual methods in C++. Secondly, in systems where no such overhead can be allowed, or systems that run on platforms without support for COM, IDL can still be used to define interfaces between components, thus making a future transition to COM straightforward. This takes advantage of the fact that the Microsoft IDL compiler generates C and C++ code corresponding to the interfaces defined in an IDL file as well as COM type libraries. Thus, the same interface definitions can be used with systems of separately linked COM components and statically linked systems where each component is realized as a C++ class or C module. Among the problems encountered with the componentization, the most noticeable was the difficulty of splitting functionality between independent components, i.e. between the framework and the protocol handlers, and thus determining the interfaces between these components. In all probability, this was in large part due to the lack of any prior experience with similar efforts within the development organization. Initially, the task of specifying interfaces was given to the development center responsible for developing the framework. This changed during the course of the project, however, and the interfaces ultimately used were in reality defined in an iterative way in cooperation between the organizational unit developing the framework and those developing protocol handlers. Other problems are of a non-technical nature. An example is the potential problem of what business processes to use if protocol handlers are to be deployed as stand-alone products. So far, protocol handlers have only been deployed as parts of complete controller products, comprising both hardware and software.
5 Related Work
A widely published case study with a focus on software architecture is that of the US Navy's A-7E avionics system [13]. Among other things, this study demonstrated the use of information hiding to enhance modifiability while preserving real-time performance. Although the architecture of the A-7E system is not component-based in the modern sense, an important step was taken in this direction by decomposing the software into loosely coupled modules with well-defined interfaces. A more recent study, describing the componentization of a system with the aim of making it easier to add new functionality, has been conducted in the telecommunications domain [1]. In this case study, the monolithic architecture of Ericsson's Billing Gateway Systems is redesigned into one based on distributed components, and a component-based prototype system is implemented. In contrast to our case, the system does not have hard real-time requirements, although performance is a major concern. The study shows that componentization of the architecture can improve the maintainability of the system while still satisfying the performance requirements. There is a substantial body of work on component-based software for control systems and other embedded real-time systems, which, unlike this chapter, focuses on the development of new component models to address the specific requirements that a system or application domain has with respect to performance, resource utilization, reliability, etc. One of the best-known examples is the Koala component model [12] for consumer electronics, which is internally developed and used by Philips. Two other examples with particular relation to the work presented in this chapter are the PECOS
component model [6], which was developed with the participation of ABB for use in industrial field-devices, and the DiPS+ component framework [11], which targets the development of flexible protocol stacks in embedded devices. The primary advantage of such models over more general-purpose models is their effective support for optimization with respect to the most important aspects for the particular application domain. A typical disadvantage is the lack of efficient and inexpensive tools on the market. For instance, building proprietary development tools in parallel with the actual product development may incur significant additional costs.
6 Conclusions and Future Work
The experiences described above show that the effort required to add support for communication protocols in the controller product has been considerably reduced since the adoption of the new architecture. Comparing the invested effort of 15–20 person-years with the saving of 1–3 person-years per protocol handler, it is concluded that the effort required to design the component-based software architecture is justified by the reduction in the effort required to make pre-specified functional extensions to the software, and that the savings surpass the investment after 8–10 such extensions. Based on current plans for protocol handlers to be implemented, it is expected that the savings will exceed the investment within 3 years from the start of the project. In addition to these effort savings and the perceived quality improvements, the component-based architecture has resulted in the removal of the bottleneck at the single development center and the possibility of developing the framework and several protocol handlers concurrently. This could potentially lead to business benefits such as reduced time to market. Concerning the overhead introduced by the component model, which is small in the current system but may be larger if and when more COM support is incorporated, we believe that the business climate in which industrial control systems are developed justifies a modest increase in hardware resource requirements in exchange for a noticeable reduction in development time. The experiences with the use of a component-based software architecture in ABB's control system could be further evaluated. For instance, as more protocol handlers are completed, the confidence in the estimated reduction of effort can be increased. Another opportunity is to study the effect on other system properties, such as performance or reliability. A challenge is that this would require that meaningful measures of such properties could be defined and that measures could be obtained from one or more versions of the system before the componentization. Since a number of protocol handlers have been implemented and even more are planned, there is probably a good opportunity to study the experiences of protocol implementers, which may shed additional light on the qualities of the adopted architecture and component model. One possibility would be to conduct a survey, which might include several development centers. Further opportunities to study the use of a software component model in a real-time system might be offered by a future version of the controller that adopts more of COM and possibly uses a commercial COM implementation. An issue that may be addressed in future development at ABB is the inclusion of a COM runtime system with support for dynamic linking between components.
Commercially available COM implementations will probably be used for systems based on Windows and VxWorks. Dynamic linking will simplify the process of developing and testing protocol handlers. A potentially substantial effect of dynamic linking is the possibility of adding and upgrading protocol handlers at runtime. This might allow costly production stops to be avoided while, for instance, a controller is updated with a new communication protocol. Another possible continuation of the work presented here would be to extend the component approach beyond I/O and communication. An architecture where general functionality can be easily integrated by adding independently developed components would be a great benefit to this type of system, which is intended for a large range of control applications.
References

1. Algestam, H., Offesson, M., Lundberg, L.: Using Components to Increase Maintainability in a Large Telecommunication System. In: Proceedings of the Ninth Asia-Pacific Software Engineering Conference (2002) 65–73.
2. Bass, L., Clements, P., Kazman, R.: Software Architecture in Practice. 2nd edition. Addison-Wesley, Reading, MA (2003).
3. Bosch, J.: Design & Use of Software Architectures – Adopting and Evolving a Product-Line Approach. Addison-Wesley, Reading, MA (2000).
4. Box, D.: Essential COM. Addison-Wesley, Reading, MA (1997).
5. ESPRIT Consortium CCE-CNMA (eds.): MMS: A Communication Language for Manufacturing. Springer-Verlag, Berlin Heidelberg New York (1995).
6. Genßler, T., Stich, C., Christoph, A., Winter, M., Nierstrasz, O., Ducasse, S., Wuyts, R., Arévalo, G., Schönhage, B., Müller, P.: Components for Embedded Software – The PECOS Approach. In: Proceedings of the 2002 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (2002) 19–26.
7. Heineman, G., Councill, W. (eds.): Component-Based Software Engineering – Putting the Pieces Together. Addison-Wesley, Reading, MA (2001).
8. International Electrotechnical Commission: Programmable Controllers – Part 3: Programming Languages. 2nd edition. IEC Std. 61131-3 (2003).
9. Lüders, F., Lau, K., Ho, S.: Specification of Software Components. In: Crnkovic, I., Larsson, M. (eds.): Building Reliable Component-Based Software Systems. Artech House Books, Boston London (2000) 23–38.
10. Mahalik, N.: Fieldbus Technology. Springer-Verlag, Berlin Heidelberg New York (2003).
11. Michiels, S.: Component Framework Technology for Adaptable and Manageable Protocol Stacks. PhD thesis, K.U.Leuven, Leuven, Belgium (2004).
12. van Ommering, R., van der Linden, F., Kramer, J., Magee, J.: The Koala Component Model for Consumer Electronics Software. Computer, Vol. 33, Issue 3 (2000) 78–85.
13. Parnas, D., Clements, P., Weiss, D.: The Modular Structure of Complex Systems. IEEE Transactions on Software Engineering, Vol. 11, Issue 3 (1985) 259–266.
14. Robson, C.: Real World Research. 2nd edition. Blackwell Publishers, Oxford (2002).
15. Shaw, M., Garlan, D.: Software Architecture – Perspectives on an Emerging Discipline. Prentice-Hall, Upper Saddle River, NJ (1996).
16. Szyperski, C.: Component Software – Beyond Object-Oriented Programming. 2nd edition. Addison-Wesley, Reading, MA (2002).
Specification and Evaluation of Safety Properties in a Component-Based Software Engineering Process

Lars Grunske 1, Bernhard Kaiser 2, and Ralf H. Reussner 3

1 School of ITEE, The University of Queensland, St Lucia, Brisbane 4072, Australia, [email protected]
2 Fraunhofer Institute for Experimental Software Engineering, Sauerwiesen 6, 67661 Kaiserslautern, Germany, [email protected]
3 Software Engineering Group, University of Oldenburg, OFFIS, Escherweg 2, 26121 Oldenburg, Germany, [email protected]
Abstract. Over the past years, component-based software engineering has become an established paradigm in the area of complex software intensive systems. However, many techniques for analyzing these systems for critical properties currently do not make use of the component orientation. In particular, safety analysis of component-based systems is an open field of research. In this chapter we investigate the problems arising and define a set of requirements that apply when adapting the analysis of safety properties to a component-based software engineering process. Based on these requirements some important component-oriented safety evaluation approaches are examined and compared.
1 Introduction
Over the past years, the paradigm of component-based software engineering has been established in the construction of complex software-intensive systems [1], mainly in the context of large business software projects. Models and procedures have been developed that help in designing component-based systems and in assessing many relevant quality properties. Design by components is also a promising approach in the domain of embedded systems, where cost reduction, time-to-market and quality demands impose special constraints. In the context of embedded systems, in particular safety-critical systems, some new issues arise that are still the subject of current research. Some of these issues are:

– How to specify the failure behavior of a component when its usage and environment are unknown?
– How to evaluate the safety properties of a system built with components?
– How to adapt accepted safety assessment techniques to the special context of embedded and component-based systems?
– How to construct safety cases for a system built from components?

In this chapter we will discuss these problems in detail and give an overview of research work covering this problem domain.
The remainder of the chapter is organized as follows: In Sect. 2 an introduction to the general safety concepts is given and the relevant safety terms are defined. Sect. 3 then provides an overview of state-of-the-art safety analysis techniques. In the main part of this chapter, Sect. 4, we investigate the problems that arise and propose some requirements for safety analysis techniques when applying them to the construction and evaluation of component-based safety-critical systems. Furthermore, we summarize some important state-of-the-art techniques for the component-based analysis of safety properties. In Sect. 5 we compare these techniques and show how each of them fulfills the stated requirements. This provides support for the selection of a suitable analysis technique. Finally, Sect. 6 contains concluding remarks and points out directions for future work.
2 Basic Safety Concepts
To introduce the matter of safety analysis, we first define the relevant terms and concepts used in this chapter.

Definition 1 (Component). A component is an identifiable entity with a well-defined and specified behavior. In computer science and engineering it designates a self-contained, i.e. separately deployable, piece of hardware or software.

Definition 2 (System). A system is a set of components which act together as a whole and that is delimited by a system boundary.

This chapter deals with purely technical systems (while safety analysis in general considers non-technical components such as user interaction as well). Due to recursive decomposition the subsystems (components) of a system can be viewed as systems on their own, so the terms component and system are often used interchangeably.

Definition 3 (Failure). A failure is any behavior of a component or system which deviates from the specified behavior, although the environment conditions do not violate their specification.

Based on this definition a failure is basically a deviation from the specified behavior. However, from the practical viewpoint it is useful to introduce a failure classification of finer granularity by distinguishing different ways in which the provided behavior can deviate from the expectation. For dependable systems there is an accepted categorization which groups the failures into the following failure types or failure modes [2, 3]:

– tl: timing failure of a service (expected event or service is delivered after the defined deadline has expired – reaction too late)
– te: timing failure of a service (event or service is delivered before it was expected – reaction too early)
– v: incorrect result of a requested service (wrong data or service result – value)
– c: accomplishment of an unexpected service (unexpected event or service – commission)
– o: unavailable service (no event or service is delivered when it is expected – omission)
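Purely as an illustration of how this taxonomy might be carried into component annotations or analysis tools (the type and enumerator names below are not part of the cited classification), the five failure modes map naturally onto a simple enumeration:

enum class FailureMode {
    TimingTooLate,   // tl: delivered after the deadline expired
    TimingTooEarly,  // te: delivered before it was expected
    Value,           // v:  wrong data or service result
    Commission,      // c:  unexpected event or service delivered
    Omission         // o:  expected event or service not delivered
};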
Definition 4 (Fault). A fault is a state or constitution of a component that deviates from the specification and that can potentially lead to a failure.

Definition 5 (Accident). An accident is an undesired event that causes loss or impairment of human life or health, material, environment or other goods (similar to [4]).

To reduce the probability of an accident, the preconditions under the control of the system must be distinguished from uncontrollable ones, because the system designer can only take counter-measures for the controllable ones. These controllable preconditions are called hazards and can be defined as follows:

Definition 6 (Hazard). A hazard is a state of a system and its environment in which the occurrence of an accident only depends on factors which are not under control of the system.

An example of a hazard is a defective car air-bag, since the accident "driver is injured" occurs only if the car crashes. It depends on the environment whether a hazard leads to an accident, and thus the term hazard is always defined with respect to a given system environment and depends on the actual definition of the system boundary. To quantify safety it is important to consider how probable a hazard is and what the severity of the correlated accident or damage is. This is captured in the definition of risk.

Definition 7 (Risk). Risk is the severity combined with the probability of a hazard.

It is not practical to claim that risk be the product of hazard level and probability, since there are no universally accepted measures for the hazard level and the estimations of the probability are often very coarse. A practical way is to group both severity and probability into a few categories (negligible consequences . . . catastrophic, very rare . . . sure), as in [5, 6]. Both dimensions are independent of each other. A release of radioactivity in a nuclear power plant, for instance, can cost the lives of many people. Therefore, such a kind of accident is not acceptable, even with a very low likelihood.

Definition 8 (Acceptable Risk). Acceptable risk is the level of risk that has deliberately been defined to be supportable by the society, usually based on an agreed acceptance criterion.

The risk acceptance depends on social factors such as applicable laws or public opinion. According to standards (e.g. [5]) the acceptable risk can be identified based on various risk acceptance principles, depending on local legislation. Some known risk acceptance principles are ALARP (the residual risk shall be As Low As Reasonably Practicable), GAMAB (globalement au moins aussi bon, "globally at least as good": a French principle that assumes that there is already an acceptable system and that the risk of a new system shall be equivalent or lower) and MEM (Minimum Endogenous Mortality, where the individual risk due to a particular technical system must not exceed 1/20th of the minimum endogenous mortality).
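A minimal sketch of such a category-based risk assessment is shown below. The category names and, in particular, the concrete matrix entries are illustrative only; the actual assignment is defined by the applicable standard and acceptance principle. The catastrophic row reflects the nuclear-plant argument above (never acceptable, regardless of likelihood).

enum class Severity    { Negligible, Marginal, Critical, Catastrophic };
enum class Probability { Rare, Occasional, Probable, Frequent };
enum class RiskClass   { Acceptable, Tolerable, Unacceptable };

// Illustrative risk matrix: rows = severity, columns = probability.
RiskClass classifyRisk(Severity s, Probability p) {
    static const RiskClass matrix[4][4] = {
        // Rare                     Occasional                Probable                  Frequent
        { RiskClass::Acceptable,   RiskClass::Acceptable,   RiskClass::Tolerable,    RiskClass::Tolerable    },  // Negligible
        { RiskClass::Acceptable,   RiskClass::Tolerable,    RiskClass::Tolerable,    RiskClass::Unacceptable },  // Marginal
        { RiskClass::Tolerable,    RiskClass::Tolerable,    RiskClass::Unacceptable, RiskClass::Unacceptable },  // Critical
        { RiskClass::Unacceptable, RiskClass::Unacceptable, RiskClass::Unacceptable, RiskClass::Unacceptable }   // Catastrophic
    };
    return matrix[static_cast<int>(s)][static_cast<int>(p)];
}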
This definition of risk enables the definition of the terms SAFETY and SAFETY REQUIREMENTS.

Definition 9 (Safety). Safety is freedom from unacceptable risks [5].

In other words, safety is the situation where the risk is below the accepted risk level. Literally, safety is the situation where no hazard is present. Since this is not a practicable definition, the widely agreed definition refers to the risk level instead, incorporating the probability of a hazard.

Definition 10 (Safety Requirements). A safety requirement is a (more or less formal) description of a hazard combined with the tolerable probability of this hazard.

The tolerable hazard probability must be determined so that the combined risk for all hazards of the system is acceptable. This is the task of risk analysis. In summary, the aim of safety-critical systems construction is to build a system so that it fulfills all of its safety requirements. This comprises the following steps:

– Identification of all system-level hazards
– Determination of the acceptable hazard probabilities (safety requirements)
– Taking constructive measures in order to avoid or reduce anticipated hazards
– Proof that all of these safety requirements are fulfilled (safety cases)
If the proof fails on the first attempt, the last two steps have to be repeated iteratively.
3 Established Safety Analysis Techniques
There is an established set of safety analysis techniques for different purposes. Most of them were developed at a time when safety-critical tasks were exclusively performed by purely mechanical or electrical systems, and they do not especially consider the new aspects introduced by software control. The different techniques can be classified by different categories: they are used in different process phases, they use different formalisms, and they also differ in the kinds of qualitative and quantitative analyses that they provide. In the context of component-based system development, the techniques can also be divided into techniques that ignore the internal structure of the systems (as these are not concerned by the fact that a system is developed by components) and techniques that refer to a structural model of the systems (as these potentially need some adaptation when applied to component-based systems).

3.1 Safety Analysis Techniques on System Level

Techniques belong to the first category, for example, because they look at the system on a coarse and abstract level, focusing on black-box properties or the effects of system-level failures on the environment. In these cases it is irrelevant whether a system is monolithic or component-based and which of the components are implemented in software or hardware. These techniques are typically applied in early process phases. In the following we give an overview of some techniques belonging to this category. An example of an early safety analysis technique is Preliminary Hazard Analysis (PHA) [7], a technique that is applied during requirements analysis and early system
design. Its purpose is to identify potentially dangerous sources, to give an early assessment of the severity and probability of each hazard and to suggest constructive measures to avoid or reduce risks. PHA is an inductive technique that searches for the effects of identified hazards and the conditions in which they can arise. It is a manual and semi-formal technique that is applied on the system level. A similar technique is Functional Hazard Assessment (FHA) [6], which is increasingly used in the aerospace industries. It assesses system functions without reference to the (later) technical realization. Like PHA it is used to obtain a first safety study in early process phases. Based on the potential hazards that have been identified, all functions are categorized according to criticality levels. For each function and each of its failure modes the correlated effects, countermeasures and analysis or validation techniques are listed in a table. Although an FHA can be carried out on a subsystem level as well, it is a manual and rather coarse technique that does not require detailed information about the component structure of the system. Another example is Event Tree Analysis (ETA) [8], a graphical technique that uses a tree diagram to find and depict all potential effects of a given system-level hazard. The root of the tree is the hazard being analyzed. The branches are potential scenarios that lead to different consequences. Each branching point is associated with a condition which influences the further development of the scenario. For example, if the hazard is "fire in engine", the first branching point could be "automatic extinguishing system is working properly". The TRUE branch leads to a mitigation scenario (no accident), the FALSE branch to another branching point: "fire is immediately detected by operator". Again the two branches lead to a different continuation of the story, and finally each scenario either leads to an accident / damage or not. The technique can yield quantitative results if for each branching point the probabilities of taking the TRUE or the FALSE path are known. ETA is applied manually with computer support. Since all effects considered in an ETA happen in the system environment, the internal structure of the system is not of concern. As these techniques either regard the system as a black box or are applied in a stage where the actual implementation is yet unknown, they do not refer to components. Consequently, the aforementioned techniques can be applied to component-based systems without modification. In the following subsections we introduce some safety analysis techniques that refer to the internal structure of the system. Thus, we will have to discuss afterwards how far and with which modifications they can be applied in the context of component-based system design.

3.2 Failure Modes and Effects Analysis (FMEA)

Failure Modes and Effects Analysis (FMEA; extended variant: Failure Modes, Effects and Criticality Analysis, FMECA) is a table-based, semi-formal technique to identify possible safety or reliability issues with their effects in a systematic and roughly quantitative way. FMEA can be applied both to products (system or component level) and to processes (e.g. the software development process). FMEA has been standardized in IEC 60812 [9] and is today widely applied in industry, in particular in the automotive branch. The steps to be performed are:

1. Analysis of the system structure and identification of structure elements (hierarchically arranged in a structure tree diagram)
2. Identification of the functions of each identified structure element. The functional decomposition follows the structural decomposition, i.e. functions of sub-components contribute to the functions of their respective super-components.
3. Investigation and listing of all failure possibilities of each function, generating an FMEA table (see below) containing one row for each failure mode found.
4. Estimation of (a) the failure probability of each failure mode, (b) the criticality of the failure mode and (c) the probability that the failure is not discovered early enough to prevent its consequences. For each of these three dimensions a measure in the range 1 (most favorable case) to 10 (fatal case) is assigned. For this step the use of guiding words and predefined categories is recommended. Multiplication of the three numbers renders a Risk Priority Number (RPN) between 1 and 1000. The most critical failure modes are marked by the highest RPN.
5. Redesign or improvement of the system. The ameliorations begin with the failure modes that have the highest RPN. The main goal is to reduce the occurrence frequency of failures, followed by measures to improve the detection of failures (e.g. by alarm facilities). The RPN can be used to prioritize the amelioration efforts and to decide whether corrective actions are mandatory or not. After the changes, a re-assessment of the system quantifies the effect of the measures; the RPN must now be significantly lower than before.

The central document of an FMEA is the table, containing the columns (Structure Element, Failure Mode, Effect on System, Possible Hazards, Risk Priority Number, Detection Means, Applicable Controls / Countermeasures). This table helps to carry out the FMEA systematically and makes it a semi-formal method.
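The RPN bookkeeping in steps 4 and 5 is simple enough to sketch in a few lines of code; the struct layout and field names below are illustrative and are not prescribed by IEC 60812.

#include <algorithm>
#include <string>
#include <vector>

// One row of the FMEA table (simplified).
struct FmeaEntry {
    std::string structureElement;
    std::string failureMode;
    int occurrence;  // 1 (very unlikely)   .. 10 (almost certain)
    int severity;    // 1 (negligible)      .. 10 (fatal)
    int detection;   // 1 (surely detected) .. 10 (practically undetectable)
    int rpn() const { return occurrence * severity * detection; }  // 1 .. 1000
};

// Order the table so that redesign (step 5) starts with the highest RPN.
void prioritize(std::vector<FmeaEntry> &table) {
    std::sort(table.begin(), table.end(),
              [](const FmeaEntry &a, const FmeaEntry &b) { return a.rpn() > b.rpn(); });
}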
3.3 Hazard and Operability Studies (HAZOP)

Hazard and Operability Studies (HAZOP) [10, 11] is a criticality analysis technique that was developed in the 1970s in the context of the chemical process industry and has been transferred to other industry branches, including software engineering. It focuses on abnormal deviations of process parameters from their expected values. The key element is a set of keywords that qualify the kind of deviation (e.g. no, less, more, reverse, also, other, fluctuation, early, late) for each information or material flow. The use of predefined keywords assures the completeness and consistency of the whole study. The list can be adapted or extended as appropriate. HAZOP is a session technique, conducted by a team of domain experts as soon as a first material or data flow model for the system is available. The goal is to predict potential hazards that result from these deviations. The results are usually presented in a table, and in the end report the system design is either accepted or changes to improve safety are requested.

3.4 Fault Tree Analysis

Fault Tree Analysis (FTA) [12–14] is a graphical safety and reliability analysis technique which has been used in different industry branches for over 40 years. It is a deductive top-down analysis technique and a combinatorial technique. FTA allows tracing back influences to a given system failure, accident or hazard. Fault Trees (FTs) provide logical connectives (called gates) that allow decomposing the system-level hazard recursively. The AND gate indicates that all influence factors must apply together to cause the hazard, and the OR gate indicates that any of the influences causes the hazard alone. The logical structure is usually depicted as an upside-down tree with the hazard to be examined (called top-event) at its root and the lowest-level influence factors (called basic events) as the leaves. In the context of FTA the term "event" is applied in its probability theory meaning: an event is not necessarily some sudden phenomenon, but can be any proposition that is true with a certain probability. Based on an FT, several qualitative or quantitative analyses are possible. A qualitative analysis lists, for instance, all combinations of failures that must occur together to cause the top-level failure. Quantitative analysis calculates the probability of the top-event from the given probabilities of the basic events. Combinatorial formulas indicate for each type of gate how to calculate the output probability of a gate from the given input probabilities. The probabilities taken into account are the probabilities that an event occurs at least once over a given mission time, or they are probabilities of a failed state with respect to a given point in time. The evolution of a system over time or any dependencies between the present system behavior and the history cannot be modelled. An important assumption to obtain correct results is the stochastic independence of the basic events, which is hard to achieve in complex networked systems where often common cause failures occur [15]. Figure 1 shows a simple FT example.

[Fig. 1 depicts a fault tree with the top-event "Laptop Unavailable" (P = 0.314), an OR gate combining the basic event "CPU Defective" (P = 0.3) with an AND gate over the basic events "Battery Empty" (P = 0.2) and "Mains Power Down" (P = 0.1).]

Fig. 1. Fault Tree Example

The starting point of
the model construction is a hazard state or a failure event that has been identified before (e.g. by means of an FMEA). In the example, the unavailability of a laptop computer is analyzed. A creative process is carried out to investigate all factors that contribute to the occurrence of this top-event. The search is performed along the system structure and examines all system functions, environment conditions (e.g. ambient temperature) and auxiliaries (such as power supply). The decomposition is stopped at a granularity level where the individual influence factors cannot or need not be refined any further. These lowest-level factors are called basic events and are the leaves of the tree, depicted by circles. For a quantitative analysis a probability or probability distribution must be known or estimated for all basic events. This is usually achieved by probabilistic failure models, most of them empirical models.
In the example, we restrict ourselves to AND and OR gates, although many other gate types are provided by standards and FTA tools. In the figure, the graphical representation according to [13] has been chosen; in the United States different symbols are used. The AND gate is depicted by a & symbol and the OR gate by a ≥ 1 symbol (because at least one input must be true for the gate output to become true). The quantitative result shown in the figure has been obtained by application of the formulas associated with the AND and the OR gate. In the case of the AND gate, the input failure probabilities F_i are multiplied with each other: F_output = ∏_i F_input,i. In the case of the OR gate, De Morgan's theorem indicates that all input probabilities have to be inverted (subtracted from one), then multiplied, and finally the result has to be inverted again: F_output = 1 − ∏_i (1 − F_input,i).
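Applying these two formulas to the fault tree of Fig. 1 reproduces the probability annotated at the top-event:

F_AND = 0.2 · 0.1 = 0.02 (Battery Empty AND Mains Power Down)
F_top = 1 − (1 − 0.3) · (1 − 0.02) = 1 − 0.7 · 0.98 = 0.314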
3.5 State-Based Approaches

To explain the behavior of components with respect to safety it is often not sufficient to restrict oneself to a two-state (working and failed) abstraction, as FTA does. This accounts for a different class of analysis techniques, the state-based techniques. In the context of software engineering and systems safety, usually discrete-state approaches are applied. The practically relevant subclasses are:

– Statecharts, ROOMcharts, UML State Diagrams or similar notations
– Petri Nets
– Markov Chains

The use of Statecharts or similar notations can enhance safety during system construction by providing an intuitive notation with automatic consistency checking and by partially allowing for automatic code generation. Moreover, in safety-critical areas they are exploited for safety analysis as well: formal state-based models that describe the (intended) system behavior can serve as a base for model checking. Model checking is a qualitative technique that decides if a certain undesired state (hazard state) is definitely impossible to reach. If this cannot be proved, a counterexample is produced, which in turn helps the analyst to formulate countermeasures for how to avoid that hazard. Probabilistic variants of model checking algorithms are currently a major subject in formal methods research. Petri Nets exist in deterministic and in probabilistic variants. They are a good means to model concurrent or collaborating systems. They also allow for different qualitative or quantitative analyses that can be useful in safety validation. However, Petri Nets are mainly applied for performance evaluation. Markov Chains (MCs) are a probabilistic state-based modelling technique. A MC is a finite state machine where the transitions occur stochastically according to defined probability distributions. MC analysis plays an important role in reliability analysis and can be used to judge the reliability or availability of safety-relevant components within a system [16]. It is a discrete-state approach, and there exists a continuous-time variant and a discrete-time variant. A MC is mainly a state diagram that explicitly considers working states and failed states. In contrast to the combinatorial approaches, MCs allow more than two states for each component, so multiple failure modes or degrading failure (e.g. working – restricted service – completely failed) can be modelled. The states are usually depicted as circles and the state transitions as directed edges. The transition rates are annotated at the edges. A transition rate is the conditional probability that the state will change from Si to Sj in the next short time interval, under the condition that it is in state Si at time t. MC analysis is performed by formulating and solving differential equations (there are several transient and steady-state analysis or simulation techniques, and quite a number of tools are available). These equations can be imagined to describe the "probability flow" between different states.
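As a small, purely illustrative example of this "probability flow" view (the failure and repair rates below are arbitrary and not taken from any system discussed here), the differential equations of a two-state working/failed model can be integrated numerically:

#include <cstdio>

int main() {
    // Two-state continuous-time Markov chain: state 0 = working, state 1 = failed.
    const double lambda = 1e-3;   // failure rate per hour (illustrative)
    const double mu     = 1e-1;   // repair rate per hour (illustrative)
    double p0 = 1.0, p1 = 0.0;    // start in the working state

    // Euler integration of dp0/dt = -lambda*p0 + mu*p1 and dp1/dt = lambda*p0 - mu*p1.
    const double dt = 0.01;
    for (double t = 0.0; t < 1000.0; t += dt) {
        const double d0 = -lambda * p0 + mu * p1;
        const double d1 =  lambda * p0 - mu * p1;
        p0 += d0 * dt;
        p1 += d1 * dt;
    }
    // The availability converges to mu / (lambda + mu), here about 0.9901.
    std::printf("availability after 1000 h: %.4f\n", p0);
    return 0;
}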
4 Safety Analysis Techniques for Component-Based Systems
4.1 Problems

Since safety means freedom from unacceptable risks, the primary goal of safety analysis techniques is to identify all failures on the system level that cause hazardous situations and to demonstrate that their probabilities are sufficiently low. In the context of component-based systems this involves some additional problems that do not occur in the same way in monolithic systems. A principal question to be addressed is the composition of the property "safety". Is it permissible to say that a system is as safe as its components together (analogously to the way combinatorial reliability models judge system reliability from component reliability)? A small part of the system, in particular a piece of software, cannot do harm to the environment and thus cannot be unsafe. We find that safety as a property is not defined on an arbitrarily low granularity level and thus fine-grained components do not possess a quality attribute "safety" [4]. However, the influence of component behavior, in particular software behavior, on the safety of the whole system cannot be denied. In particular, component failures can compromise the safety of the system. In real-time systems this applies to timing failures as well as to value failures. More exactly, safety violations result from failures that propagate to the system boundary. Thus, component-based safety analysis means to conclude system safety from component behavioral models. On a higher abstraction level, the conclusion is drawn from various quality properties of the components (e.g. correctness, availability, reliability) to system safety. For instance, the availability of a protective device such as a car airbag or a fire detector directly influences the safety of the containing system. Consequently, the techniques applied at the component level do not necessarily need to be proper safety analysis techniques; analyzing the reliability or correctness of components can be a part of the overall safety argument, and the according techniques can be applied on the component level. The question is which techniques to choose and how to integrate the results into a system safety case. Another finding is that components are usually not isolated but require services from other components to provide their service correctly. Therefore, not only internal failures, but also failures that are propagated from a foreign component can cause a component to produce failures. The next issue concerns the development process: safety analysis techniques must integrate into the overall development process of the embedded system. In the case
of component-based design this means in particular that concurrent development at different places and design for later reuse in an unforeseen environment have to be considered. The different modelling techniques used within the same project should be compatible, which can be achieved e.g. by integrated tool-chains or model export and import facilities. Another big challenge is complexity. Component-based engineering is often applied to systems that are too complex to be understood in one piece. For example, a system composed of 10 components with only 2 states each has a state space of 1024 states; one can easily imagine the consequences for real-world systems with lots of states for each component. This problem is referred to as state explosion. However, not only state-based approaches but also other techniques suffer from complexity problems, e.g. from an excessive number of causal chains that hampers the readability of the model. This leads us to the related question of scope and granularity: it is impossible to consider all states and all behavioral aspects of a system. The challenge is to find the right abstraction level that makes a model expressive enough and yet analyzable. We found that, on the one hand, informal or even combinatorial models are sometimes not sufficient, but on the other hand, composing an integrated behavioral model and analyzing all possible sequences of actions, including failures, is far beyond feasibility. Techniques on a practical granularity level and with a limited scope (i.e. expressing just the facts of interest) are necessary.

4.2 Requirements

Being aware of these particular problems, we now map out some requirements that will help us to classify the safety analysis techniques that we will present in the subsequent sections.

Requirement 1 (Appropriate Component-Level Models). Each component must be annotated with an appropriate evaluation model.

Component-based safety analysis should decompose the system according to its architecture and then annotate each component with an appropriate model. The system-level analysis technique must finally integrate the results from all component annotations into a sound safety case for the whole system. Different components may be implemented by different technologies, and ideally it should be possible to choose for each component the most appropriate modelling technique. Due to the embedded nature of the systems, this includes techniques that are suitable for software and hardware aspects. The techniques should be able to describe the correct behavior and the failure behavior by appropriate means. Further, we saw that the property safety on the system level is influenced by different aspects of the component failure behavior, for instance by quality properties such as reliability, availability, timeliness and correctness on the component level. Accordingly, attaching models for different quality properties to different components in order to validate each of these properties by an appropriate technique is a suitable approach.

Requirement 2 (Encapsulation and Interfaces). The notation for the evaluation models should allow encapsulation and composition by interfaces similar to component-based design notations.
Many current component-based design notations (such as ROOM [17] or UML 2.0) offer mechanisms to define components as closed capsules and ports that serve as points of information exchange between components. These ports define the externally visible interface of a component. Their semantics varies with the different modelling techniques; examples are

– incoming and outgoing messages or signals
– incoming and outgoing continuous data flows
– provided and required services

In these design frameworks, it is usually possible to refine components recursively and to integrate components into new components. Every component can be exchanged for another with the same interface. The internal implementation, i.e. everything that does not belong to the interface, is hidden from the environment. An appropriate syntax and type system for interfaces allows one to check all component interconnections automatically for consistency. If even a formal semantics is associated with the interface notation (as is the case in interface automata, for instance), it is possible to derive the system semantics from the component semantics and the topology. A similar construction principle is also desirable for component safety evaluation models. The interfaces of the component safety evaluation models should correspond as closely as possible to the interfaces used in the functional models from the systems design phase.

Requirement 3 (Dependencies on External Components). Safety analysis techniques must be able to express the dependencies of failures regarding provided services on failures regarding required services and on internal failures of the component.

Due to the fact that most components are not self-contained and require external components to operate, the failure modes of the provided services depend on the failure modes of the services provided by other components, i.e. on the component's required services. In consequence, the failure probabilities of the provided services of a component are a function of (a) the probabilities of internal failure generation and (b) the probabilities of failures of the external environment the component interacts with.
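One very simple way to make this dependency concrete, assuming (purely for illustration) that internal failure generation and failure propagation from required services are stochastically independent for a given failure mode, is

P(failure of a provided service) = 1 − (1 − p_internal) · (1 − p_propagated),

where p_internal is the probability that the component itself generates the failure and p_propagated is the probability that a failure of a required service propagates to the provided service. The component-oriented techniques surveyed later in this chapter use richer models of this dependency, so the formula should be read as a sketch rather than as part of any particular approach.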
modelling techniques should be automated to a high degree, as manual copying or translation between different formats is error-prone and compromises the integrity of a safety analysis.

Requirement 5 (Practicable Granularity) The techniques applied should be, on the one hand, rich enough in detail to express how different kinds of component behavior can influence system safety, but on the other hand coarse enough to allow affordable analysis on the system level.

Regarding granularity and scope, a compromise between expressive power and analysis effort must be found. The approach of exhaustively modelling all possible behavior traces to explain how a system-level hazard can occur is infeasible. A plain parts-count approach (the system works correctly if all components work correctly), which is sometimes used in reliability analysis, is not sufficient to show how components interact and how, for instance, a safety subsystem mitigates failures of other components. Often the two-state abstraction (working versus failed) in combinatorial techniques is too coarse, but a full state-based approach is not manageable due to the state-explosion problem. A compromise could be to classify failures according to a few categories, which still allows one to formulate simple causal relations between failures of different classes at different interfaces. In the case of state-based approaches it is often not feasible to examine the whole state space as determined by the functional model of the system. Instead, it is preferable to take a coarser approach by only modelling the states that are involved in a safety-relevant scenario.

Requirement 6 (Tool Support) The safety analysis technique should be supported by appropriate and ergonomic tools.

Some of the safety analysis techniques (FMEA, for instance) were originally designed as paper-and-pencil methods. In the present context, manual application of the techniques is not practicable. First, systems that are designed from components are usually complex systems, so only computer-based tools allow humans to handle systems of high complexity without making errors. Important aspects are project browsing and history tracking facilities, model design assistants and consistency checking, an ergonomic user interface and a structured graphical representation. Second, one main purpose of components is to design them at different places (division of labor) or at different times (reuse). Traditionally, when one team at one place created a model, intuitive knowledge and implicitly agreed assumptions helped to overcome ambiguities. In the component-based process, when working at different places or when reusing a component that has been created years before, the lack of this direct communication will likely lead to misinterpretation. By enforcing a well-defined model syntax (and ideally also semantics) and by capturing all aspects of the model in a file or database, computer-based tools help to create reusable and exchangeable component analyses.

4.3 Running Example

To explain how safety evaluation of a system built from components works in practice, we present a steam boiler system as a running example. The left part of figure 2 shows a
schematic, similar to those process engineers use to describe the hardware of the plant. The process plant incorporates the pressure tank, a triple-redundant pressure sensor and a double-redundant safety valve. Further, the system contains a software controller that implements a two-out-of-three voter for the sensors and gives the command to open both valves if a pressure above the allowable level is detected. The voter pattern ensures that if at least two of the three sensors indicate the right value, the controller takes the correct decision. Furthermore, each of the valves is sufficient as a pressure relief, so if one fails, the system is still safe. In subsequent sections we will also discuss a variant of the example where it is possible to select either voter mode (three sensors) or single-sensor mode. The right part of the figure shows a structure diagram, as an
[Figure omitted: left, plant schematic with pressure tank, Sensor 1–3, Controller, Valve 1 and Valve 2; right, structure diagram :System containing S1:Sensor, S2:Sensor, S3:Sensor, C:Controller, V1:Valve and V2:Valve]
Fig. 2. Steam Boiler schematics and structure diagram
embedded systems engineer would use it to describe the system. The structure diagram describes the static architecture of a system, consisting of components and the interconnections between them. During the design phase, models for the behavior are attached to the components, for example state machine models that describe the reaction of components to trigger signals received via their ports [17]. During the construction phase, only the intended behavior is of relevance. Safety analysis, in contrast, focuses on possible deviations from the intended behavior that lead to hazardous situations. By definition, the ports of the components are the only spots where information is exchanged, and the interconnections in the structure diagram are the paths of information flow. Consequently, these are also the spots where failures are propagated between components. The idea behind the component-based safety analysis techniques discussed in the following subsections is to exploit the system architecture for safety analysis by attaching models for failure generation to the components.
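To make these propagation paths explicit, the structure diagram can be captured as plain data: components with named ports and the connections between them. The following sketch is our own illustration (component and port names follow the figure, but the representation is not part of any of the cited tools); it merely shows that the topology needed for component-based safety analysis is small and machine-readable.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str                      # instance name, e.g. "S1"
    ctype: str                     # component class, e.g. "Sensor"
    ports: list = field(default_factory=list)

# Components of the steam boiler structure diagram
components = [
    Component("S1", "Sensor", ["Pressure"]),
    Component("S2", "Sensor", ["Pressure"]),
    Component("S3", "Sensor", ["Pressure"]),
    Component("C",  "Controller", ["P1", "P2", "P3", "Cmd"]),
    Component("V1", "Valve", ["Cmd", "Open"]),
    Component("V2", "Valve", ["Cmd", "Open"]),
]

# Connections are the paths along which information, and hence failures, can propagate
connections = [
    (("S1", "Pressure"), ("C", "P1")),
    (("S2", "Pressure"), ("C", "P2")),
    (("S3", "Pressure"), ("C", "P3")),
    (("C", "Cmd"), ("V1", "Cmd")),
    (("C", "Cmd"), ("V2", "Cmd")),
]

def propagation_targets(source):
    """Ports that a failure at the given (component, port) pair can reach directly."""
    return [sink for src, sink in connections if src == source]

print(propagation_targets(("C", "Cmd")))   # [('V1', 'Cmd'), ('V2', 'Cmd')]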
4.4 Failure Propagation and Transformation Notation (FPTN)

The Failure Propagation and Transformation Notation (FPTN) described in [3, 18] is one of the first approaches that introduce modular concepts for the specification of the failure behavior of components. The basic entity of the FPTN is the FPTN-module. An FPTN-module contains a set of standardized sections. In the first section (the header section), an identifier (ID), a name and a criticality level (SIL) are given for each FPTN-module. The second section specifies the propagation of failures, the transformation of failures, the generation of internal failures and the detection of failures in the component. To this end, this section enumerates all failures in the environment that can affect the component and all failures of the component that can affect the environment. These failures are denoted as incoming and outgoing failures and are classified by the failure categorization presented above (reaction too late (tl), reaction too early (te), value failure (v), commission (c) and omission (o)). In the example given in figure 3 the incoming failures are A:tl, A:te, A:v and B:v, and the outgoing failures are C:tl, C:v, C:c and C:o. The propagation and transformation of failures is specified inside the module with a set of equations or predicates (e.g. for propagation C:tl=A:tl, and for transformation C:c=A:te&&A:v and C:v=A:tl||B:v). Furthermore, a component can also generate a failure (e.g. C:o) or handle an existing failure (e.g. B:v). For this it is necessary to specify a failure cause or a failure handling mechanism and a probability. FPTN-modules can also be nested
Fig. 3. Abstract FPTN-Module
hierarchically. Thus, FPTN is a hierarchical notation, which allows the decomposition of the evaluation model based upon the system architecture. If an FPTN-module contains embedded FPTN-modules, the incoming failures of one module can be connected with the outgoing failures of another module. Such a connection can be semantically interpreted as a failure propagation between these two modules. For the evaluation of an FPTN-module a fault tree is constructed for each outgoing failure, based on the predicates specified inside the FPTN-module. As a result of this interpretation, an FPTN-module can be seen as a forest of fault trees, where the leaf
nodes and their probabilities are extracted from the failure generation and the failure handling sections inside the FPTN-module. To show the applicability of FPTN, the failure behavior of the steam boiler system (cf. section 4.3) is modelled in figure 4. To keep the considerations simple, we assume only a few failure modes: A sensor fails with a value failure (wrong pressure indicated) if a mechanical or an electrical failure occurs. A valve can fail to open (omission) for electrical or mechanical reasons, but also as a result of a missing command (omission at the input Cmd). The controller fails to give the open commands (omission) either if at least two of the connected sensors give wrong signals (value failures corresponding to P1, P2 or P3) or if there is an internal hardware defect. Based on these assumptions, an FPTN-module describing the failure behavior is created for each component used. These FPTN-modules are embedded into the FPTN-module "Steam Boiler System" and connected with respect to the possible failure propagation. For the evaluation of the safety properties, the failure probabilities of both outgoing failures Open:o need to be calculated. As described earlier, this can be performed by an analysis of the corresponding fault trees.
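Since the FPTN predicates are plain Boolean equations over classified failures, a module can be prototyped in a few lines. The following sketch is our own illustration of the idea, not the FPTN tooling itself; it encodes the controller module just described and evaluates its outgoing failure for a given set of incoming and internal failures.

from itertools import combinations

# Failure categories used by FPTN: tl, te, v (value), c (commission), o (omission).
# Controller FPTN-module of the steam boiler example:
#   incoming failures: P1:v, P2:v, P3:v
#   internal failure:  Intern1 (hardware defect)
#   outgoing failure:  Cmd:o = Intern1 || (P1:v&&P2:v || P1:v&&P3:v || P2:v&&P3:v)
def controller_cmd_omission(present):
    """present: set of failure tokens that occur, e.g. {'P1:v', 'Intern1'}."""
    two_of_three = any(
        a in present and b in present
        for a, b in combinations(["P1:v", "P2:v", "P3:v"], 2)
    )
    return "Intern1" in present or two_of_three

print(controller_cmd_omission({"P1:v"}))          # False: the voter masks one wrong sensor
print(controller_cmd_omission({"P1:v", "P3:v"}))  # True: two wrong sensors defeat the voter
print(controller_cmd_omission({"Intern1"}))       # True: internal hardware defect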
[Figure omitted: FPTN-module "Steam Boiler System" (ID=System) containing modules S1, S2, S3 (Sensor), C (Controller), V1 and V2 (Valve), all SIL=4. Sensors: Pressure:v = Intern1 || Intern2, with Intern1 generated by an electrical defect and Intern2 by a mechanical defect, probability 0.1 each. Controller: Cmd:o = Intern1 || (P1:v&&P2:v || P1:v&&P3:v || P2:v&&P3:v), with Intern1 generated by a hardware defect, probability 0.1. Valves: Open:o = Cmd:o || Intern1 || Intern2, with Intern1 (electrical defect) and Intern2 (mechanical defect), probability 0.1 each.]
Fig. 4. Steam Boiler Example (FPTN)
4.5 CFT

Fault Tree Analysis is one of the most popular safety analysis techniques. Unfortunately, fault trees provide only a restricted decomposition mechanism: the decomposition into independent subtrees, called modules. To be compatible with the architecture model that shall serve for automatic construction of the safety case, the models for the failure behavior must be attachable to the components and account for the assignment of incoming and outgoing failures to the ports. They must take into account that the components are in general not independent from each other, because the ports are access points for possible influences from other components. FTs are compositional in the sense that independent subtrees can be cut off and handled separately. Technical components, however, are typically influenced by other components, and thus this assumption does not hold. To allow for a modularization that corresponds to the component and port concept, an extension of FTs has recently been proposed [19]. It is called Component Fault Trees (CFTs) and allows defining partial fault trees that reflect the actual technical components. These CFTs can be modelled and archived independently from each other. Input and output failure ports glue these parts together. While traditionally independent subtrees were regarded as compound events, CFTs are treated as a set of propositional formulas describing the truth-values of each output failure port as a function of the input failure ports and the internal events. CFTs can be acyclic graphs with one or more output failure ports. Each component constitutes a name space and hides all internal failure events from the environment. Components can be instantiated in different projects. Thus all necessary preconditions for an application of FTA to component-based systems are fulfilled. To model potential failures, a CFT is generated for each component class. This is a manual task that is conveniently performed with a graphical CFT editor. Each CFT has input failure ports and output failure ports that must be associated with failure categories with respect to the messages or services at the ports of the corresponding component classes. Between input failure ports and output failure ports, the failure propagation or transformation and the internal failure generation of the component classes are modelled. For the components in the steam boiler example (cf. section 4.3) this leads to the CFTs presented in figure 5, if the same failure modes and internal faults are assumed as in the FPTN section (cf. figure 4). The CFTs given so far, in conjunction with the structure diagram, allow the system-level CFT to be integrated. However,
[Figure omitted: CFTs of the three component classes. Valve: Open.Omission = Command.Omission OR Electrical Defect OR Mechanical Defect. Sensor: Pressure.Value = Electrical Defect OR Mechanical Defect. Controller: Command.Omission = Hardware Fault OR 2-out-of-3 of P1.Value, P2.Value, P3.Value.]
Fig. 5. Controller, Valve and Sensor CFTs
before starting the analysis, another manual step is necessary: the user must complete the system-level fault tree by specifying which system hazard is to be examined. This can be performed using the graphical editor of the CFT analyzer. The resulting fault tree is shown in figure 6, which is a screenshot taken from our analysis tool UWG3 that will be introduced in the following section. The lower part of the structure has been generated automatically; the top event and the AND gate have been added manually by the user. The AND gate attached to the failure output ports V1.Open.Omission and V2.Open.Omission specifies that if both valves fail to open when expected, the hazard to be examined is present. Assuming that all basic events have a constant failure probability of 0.1, we calculated the hazard probability to be 0.214 using the tool UWG3.
Fig. 6. System CFT
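The 0.214 can be reproduced by hand. Because both valves share the controller's Command.Omission as a common cause, the two valve failures are not independent, so the Boolean structure has to be evaluated over the independent basic events rather than by multiplying the two valve failure probabilities. The following sketch is our own check by exhaustive enumeration (UWG3 itself uses BDDs); the basic event names are ours.

from itertools import product

# Independent basic events of the system CFT, each assumed to have probability 0.1
basics = [
    "S1.el", "S1.mech", "S2.el", "S2.mech", "S3.el", "S3.mech",  # sensor defects
    "C.hw",                                                       # controller hardware fault
    "V1.el", "V1.mech", "V2.el", "V2.mech",                       # valve defects
]
P = 0.1

def hazard(on):
    s1 = on["S1.el"] or on["S1.mech"]
    s2 = on["S2.el"] or on["S2.mech"]
    s3 = on["S3.el"] or on["S3.mech"]
    cmd_omission = on["C.hw"] or (s1 and s2) or (s1 and s3) or (s2 and s3)
    v1_fails = cmd_omission or on["V1.el"] or on["V1.mech"]
    v2_fails = cmd_omission or on["V2.el"] or on["V2.mech"]
    return v1_fails and v2_fails          # top event: both valves fail to open

prob = 0.0
for values in product([False, True], repeat=len(basics)):
    on = dict(zip(basics, values))
    if hazard(on):
        weight = 1.0
        for v in values:
            weight *= P if v else (1.0 - P)
        prob += weight

print(round(prob, 4))   # 0.2145, i.e. the 0.214 quoted above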
4.6 Safety Analysis with Parametric Contracts

In the following, we describe how to model safety properties in the interface of a component, using the "Quality of Service Modeling Language" (QML) [20]. This language allows one to define arbitrary quality dimensions. We define failure class definitions as quality dimensions. As the QML assumes that quality attributes are fixed values for a component and neglects the context-dependency of quality attributes, we then couple that notation with an analysis technique called "Parametric Contracts". Parametric contracts allow one to model the context-dependencies of a component's safety attributes and thus enable the analysis of component-based systems. Parametric contracts have been used for general
reliability modelling before [21, 22], but are specialized for safety analysis here. The following section provides original research results.

As current interfaces ("signature-list based interfaces") specify the well-behavior of a component service (i.e., the behavior exposed without failures), these interfaces are unsuitable for specifying or analyzing failure propagation through a system. Therefore, signature-list interfaces have to be extended by two dimensions: (a) a specification of failure classes and (b) a specification of the dependency of a service's failure behavior on the failure behavior of its context.

The inclusion of a failure class specification into a service signature can be done with the QML. The QML allows the specification of quality dimensions as well as the specification of Quality of Service contracts (QoS contracts, for short), which specify the actual provided or required service quality for the dimensions defined before. In the following, for each of the failure types introduced in section 2 a "contract type" (i.e., quality dimension) is defined.

type TooEarly       = contract {numberOfFailures : <<decreasing>> no / year;}
type TooLate        = contract {numberOfFailures : <<decreasing>> no / year;}
type IncorrectValue = contract {numberOfFailures : <<decreasing>> no / year;}
type Commission     = contract {numberOfFailures : <<decreasing>> no / year;}
type Omission       = contract {numberOfFailures : <<decreasing>> no / year;}
The above list assigns to each failure type the dimension numberOfFailures, which is measured in occurrences per year (no / year). The keyword decreasing denotes that lower values relate to a higher "quality of service". This is important to know when matching component interfaces: in case two values are not the same, one has to know whether a higher or a lower value is acceptable.

The second extension of signature-list interfaces is modelling the context dependency of failures. Basically, for any failure of the above failure types, there are three causes:

Internal service error: a bug in the service's code causes a failure.
External call error: a call to an external service causes a failure. External calls can go to services of other domain components or to services provided by the run-time environment (operating system, middleware, etc.).
External interruption error: the run-time environment stops or interrupts the execution of the service pre-emptively and causes a failure.

In the following, a failure type is denoted by ft ::= tl | te | v | c | o, standing for the failure types TooEarly, TooLate, etc. The term P_X(Y) is used to denote the probability that the event specified by the subscript X occurs for the parameter Y. The subscripts is, es and ei signify internal service error, external call error and external interruption error, respectively. One can assume that on each execution trace of the software
at most one failure occurs. This assumption is justified by the fact that failure probabilities are very low. It allows us to simply sum up the failure probabilities over all possible failure causes, restricting the analysis effort to linear equations. Therefore, the probability that service A fails with a failure of failure type ft is

P^{ft}(A) = P^{ft}_{is}(A) + P^{ft}_{es}(A) + P^{ft}_{ei}(A)    (1)
Here, P^{ft}_{is}(A) is the probability that A fails (with a failure of failure type ft) because of an internal service error, P^{ft}_{es}(A) because of an external service error and P^{ft}_{ei}(A) because of an external interruption error. If 5 failure types have been defined, then 5 equations of this style are required. It is assumed that failures of one failure type cause only consecutive failures of the same type; e.g. if some service is provided too late, it may cause other services to be provided too late as well, but not too early or with a wrong value. This assumption holds in many practical cases. In principle, however, each initial failure can result in a failure of any of the above types. To capture this, the linear equations could be extended, which in turn increases the analysis effort.

When modelling the dependency of the failure probability (for each failure type) on the component environment, the latter two terms of the above equation are important (as the first one is purely internal). Hence, the following considerations deal with P^{ft}_{es} and P^{ft}_{ei}. In both cases, we need information on what happens if service A is called. For determining the probability of an external service error, one needs to know which external services are called (and how often they are called). For the external interruption error probability, one needs information on the length of the execution and assumes that the chance of an interruption is proportional to the execution time.

Both kinds of information are given by a so-called service effect automaton [23]. A service effect automaton (SEA) is a finite state machine describing, for each service implemented by a component, the set of possible sequences of calls to services of the context. A service effect automaton is therefore a control-flow abstraction: control statements (if, while, etc.) are neglected, unless they concern calls to the component's context. As the SEA is an automaton, it accepts a language. As the input symbols of the SEA are the names of the external services called, a word of the language is a trace of service calls. By traces(SEA) the set of traces of the SEA is denoted (which is the language accepted by the SEA).

Figure 7 presents the SEA of the control process of the steam boiler controller. We refer to the variant of our example where the user can select between voter mode and single-sensor mode. The automaton in the figure presents an abstraction of the software control flow of the boiler control process. It first reads the value of pressure sensor 1 (Read:P1), then it calls the user interface to determine whether the user selected 2-out-of-3 voting mode or single-sensor mode. According to this selection (let us assume a probability u for voter mode), either the other sensors are read and then the valve commands are issued, or the valve commands are issued directly after the first pressure sensor reading. Hence the set of all traces is traces(SEA_{ControlProcess}) = {(Read:P1, Read:UI, ((Read:P2, Read:P3, Cmd:Valve1) | (Cmd:Valve1)), Cmd:Valve2)^n | n ∈ N}. For our purpose the SEA is extended to a Markov model, i.e., each transition is annotated with a transition probability (while the constraint holds that for any state the
Fig. 7. Service Effect Automaton of the Steam Boiler Controller
sum of the probabilities of outgoing transitions never exceeds one). As a result, one has for each tr ∈ traces(SEA) a function P(tr) giving the probability of tr occurring in the SEA. Since execution traces of the main function in real-time systems are usually loops (repeating themselves over and over again until the device is switched off), let us first regard just the individual runs and consider repetition later. Services that are called by the main function have one start and one end point. In our example SEA, showing the main loop, one finds two branches and thus two possible traces per run, and one gets the probability P(tr) = u for the trace tr = Read:P1, Read:UI, Read:P2, Read:P3, Cmd:Valve1, Cmd:Valve2 and P(tr) = (1 − u) for the trace tr = Read:P1, Read:UI, Cmd:Valve1, Cmd:Valve2. On each trace, services are called and these services can cause a failure of one of the known failure types (we still assume that failures are so improbable that there is at most one failure per run). Now one has to add up the failure probabilities of all externally called services e in each trace tr to get the failure probability related to this trace, under the condition that this trace is taken:

P^{ft}_{es}(tr) := \sum_{e \in tr} P^{ft}(e)    (2)
To get the total probability for one run of the main function A, we refer to the definition of conditional probability. This allows one to specify the probability P^{ft}_{es}(A) that a failure of type ft occurs in an arbitrary trace tr as follows:

P^{ft}_{es}(A) := \sum_{tr \in traces(SEA(A))} P(tr) \cdot P^{ft}_{es}(tr)    (3)
This means that P^{ft}_{es}(A) sums, over all possible traces, the product of the probability that tr is executed and the probability that during one execution of tr a failure of type ft occurs. Regarding the main loop that runs continuously, one finds that the probability that after n runs no failure has occurred is the product of the probabilities that there is no failure in the first run, no failure in the second run, and so on until n. Consequently, denoting n runs of A as A^n, the probability P^{ft}_{es}(A^n) is defined as follows:

P^{ft}_{es}(A^n) := 1 - (1 - P^{ft}_{es}(A))^n    (4)
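To illustrate equations (2)–(4) numerically, the following sketch (our own; the per-call failure probabilities and the voter-mode probability u are invented for illustration) evaluates the two traces of the steam boiler controller's SEA and then applies the n-run formula.

# Hypothetical per-call failure probabilities for one failure type ft (e.g. omission)
p_ft = {
    "Read:P1": 1e-6, "Read:P2": 1e-6, "Read:P3": 1e-6,
    "Read:UI": 5e-7, "Cmd:Valve1": 2e-6, "Cmd:Valve2": 2e-6,
}

u = 0.8   # assumed probability of selecting 2-out-of-3 voter mode

traces = {
    ("Read:P1", "Read:UI", "Read:P2", "Read:P3", "Cmd:Valve1", "Cmd:Valve2"): u,  # voter mode
    ("Read:P1", "Read:UI", "Cmd:Valve1", "Cmd:Valve2"): 1 - u,                    # single-sensor mode
}

def p_es_trace(tr):        # equation (2): sum over the external calls of one trace
    return sum(p_ft[e] for e in tr)

def p_es_run():            # equation (3): weight each trace by its occurrence probability
    return sum(p * p_es_trace(tr) for tr, p in traces.items())

def p_es_n_runs(n):        # equation (4): at least one failure in n runs of the main loop
    return 1 - (1 - p_es_run()) ** n

print(p_es_run())                     # failure probability per run
print(p_es_n_runs(50 * 3600))         # e.g. one hour of runs at a 20 ms cycle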
The remaining step in order to obtain the failure probability per time unit is to estimate the number n, i.e. the number of main loop runs during one time unit. This task is feasible because the main loop is usually scheduled on a known and regular time basis, e.g. every 20 ms. If the main loop immediately starts again after completion, the number of runs per time unit can be estimated from the execution time per loop. As a result one obtains, for each failure type, the probability per unit of execution time that a failure caused by a call to a foreign component occurs. For practical application the method can be refined by correcting terms, e.g. the probability that the failure from the called component causes harm to the caller. These terms have to be specified by the component implementer, while the occurrence probabilities depend entirely on the usage context of the component.

The probability of an external interruption error P^{ft}_{ei} is modelled in linear dependency on the length of the service's execution code trace. In principle, the length of the code execution trace depends on the actual path the control flow takes through the code. The probability for a specific path taken is given by the transition probabilities of the service effect specification. The only missing information for specifying the control-flow path length (in number of instructions) is the number of instructions associated with each transition and with each state of the service effect specification. If the service effect specification is derived from existing component code, this data is available and simply needs to be attached to the service effect specification. However, without having the service implementation at hand for analyzing its code, these figures might be hard to estimate in advance. Note that this dependency of component specifications on the actual implementation means that we are talking about component implementation instances rather than component types. Mathematically, one models the influence of external interruption errors as a linear function mapping each implemented service A to the probability that a failure of failure type ft occurs. Again, we refer to the definition of conditional probability:

P^{ft}_{ei}(A) := \sum_{tr \in traces(SEA(A))} P(tr) \cdot L(tr) \cdot P^{ft}(tr)    (5)
Formally, it sums over all possible traces the product of the probability that the trace is executed (P(tr)), a measure for the length of the trace (L(tr)) and the probability P^{ft}(tr) that the occurrence of an external interruption error results in a failure of failure type ft.

After these definitions, it is time to step back and consider practical issues. First, let us summarize what our model needs as inputs:

1. The service effect automaton (SEA) as a Markov model (i.e., having for all traces tr the value P(tr)). See [21] for a detailed discussion of how to obtain these data by a combination of code analysis, monitoring or simply educated guessing. Even if the component vendor does not provide the SEA, it can be generated a posteriori from an existing component. In addition one needs L(tr), the length of a trace, but this is also given by the code of a component.
2. The failure probability P^{ft}(e) for each external service e and each failure type ft. These data have to be provided by the component deployer, as they are part of the component context. They can be measured (for basic operations) or predicted by using the presented model itself.

3. The probability P^{ft} that a failure of type ft is caused by an external interrupt.

The second question of practical concern is how to evaluate the above formulas. The main problem is that the number of traces can be infinite, hence the sums given above cannot simply be evaluated within a loop. (Even worse, one has to show their convergence.) Therefore, we refer to the Markov chain analysis for service effect automata extended to Markov models, as described in [21, 22].
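Because the trace set is infinite, the usual way around summing trace by trace is to work on the Markov model directly: for an absorbing Markov chain, the expected number of visits to each transient state is given by the fundamental matrix (I − Q)^{-1}, and with rare failures the per-run failure probability is approximately the visit-count-weighted sum of the per-call failure probabilities. The following sketch shows this standard construction on a toy chain; it is our own illustration and not necessarily the exact algorithm of [21, 22].

import numpy as np

# Toy absorbing Markov chain: states 0..2 are transient (each corresponds to an
# external call), absorption means "run finished".
# Q[i][j] = probability of moving from transient state i to transient state j.
Q = np.array([
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],   # from state 2 the chain is absorbed with probability 1
])

# Assumed failure probability of the external call made in each transient state
p_fail = np.array([1e-6, 5e-7, 2e-6])

# Fundamental matrix: N[i][j] = expected number of visits to state j starting from i
N = np.linalg.inv(np.eye(3) - Q)
visits = N[0]                    # expected visits per run when the run starts in state 0

p_run = float(visits @ p_fail)   # approximate per-run failure probability
print(p_run)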
5 Evaluation of Safety Analysis Techniques
In the following we classify and evaluate the techniques presented above according to the requirements for safety analysis methods introduced in section 4.2.

5.1 FPTN

Requirement 1: Appropriate Component Level Models: As presented in [3], the Failure Propagation and Transformation Notation provides a simple but comprehensive annotation of the failure behavior of a component. These annotations are easy to understand and to analyze. However, failures are only differentiated according to the given five categories.

Requirement 2 + 3: Encapsulation and Interfaces and Dependencies on External Components: An FPTN-module is encapsulated and, with its incoming and outgoing failures, provides a well-defined interface to its environment. To specify the relation between these incoming and outgoing failures, the failure transformation and propagation predicates are used. Based on these predicates, the dependency of the failure behavior of the modelled component on its environment is defined.

Requirement 4: Integration of Analysis Results: For a hierarchical composition of FPTN-modules it is necessary to specify which failures are propagated between components. Currently, no systematic procedure for identifying this information is specified in the literature.

Requirement 5: Practicable Granularity: FPTN utilizes the five relevant failure types [2] (reaction too late, reaction too early, value failure, failure of commission and failure of omission). However, the architect of a component can decide which failure types and which relations between these failure types are really needed. Because of this, the granularity is defined by the user of the notation, and thus even for a complex system the safety properties are still analyzable.

Requirement 6: Tool Support: Up to now there is no commercial tool that supports the specification and evaluation of FPTN-modules.
5.2 CFT

Requirement 1: Appropriate Component Level Models: Similar to FPTN, CFTs provide a simple but comprehensive annotation of the failure behavior of a component. The expressive power is restricted to combinatorial logic.

Requirement 2: Encapsulation and Interfaces: Each CFT is encapsulated, and failure ports are used as interfaces to the capsules. These failure ports are separated into input and output failure ports. Components are reusable entities, which makes the technique appropriate for component-based development processes.

Requirement 3: Dependencies on External Components: To describe the dependencies on external components, the input failure ports are used. If they are connected with an output failure port of another component, the associated failures are propagated between these two components.

Requirement 4: Integration of Analysis Results: Due to their structure, component fault trees are hierarchically decomposable. That means the CFT of a component can contain the CFTs of the embedded components. Furthermore, the embedded CFTs can be automatically connected, based on the interface specifications of the embedded components and a construction algorithm, which is presented in [24]. The quantitative analysis is usually performed with Binary Decision Diagrams (BDDs) [25], and the BDD fragments for each component can be automatically flattened into one analyzable BDD.

Requirement 5: Practicable Granularity: Similar to FPTN, component fault trees utilize the five relevant failure types, and the architect can decide which failure types and which relations between these failure types are modelled within the CFT. Because of this, the granularity is defined by the user, and thus even for a complex system the safety properties are still analyzable.

Requirement 6: Tool Support: The specification and evaluation of CFTs is supported by a commercial tool called UWG [19]. It has been developed in co-operation between the Hasso-Plattner-Institute and the companies Siemens and DaimlerChrysler over the last two years and has been used in several industrial projects, where it proved its intuitive handling. It incorporates all previously mentioned features of the Component Fault Tree concept. UWG3 provides an efficient analysis algorithm that makes use of BDDs to efficiently represent even large CFTs.

5.3 Parametric Contracts

Requirement 1: Appropriate Component Level Models: As parametric contracts are specified by service effect automata, the composition of the notation is given by the recursive composition of service effect automata. For that composition, a transition marked by a call to an external method (read access or command) is replaced by the service effect automaton of that call. This construction of substituting transitions in a reversible way by service effect automata is shown in detail in [23]. However, parametric contracts are tailored only to a certain class of measurable quality properties.

Requirement 2: Encapsulation and Interfaces: The service effect automata are used to describe the interface of a component.

Requirement 3: Dependencies on External Components: This requirement is fulfilled, as the service effect automata explicitly model calls to external components. Their
failure probabilities are explicitly considered in the analysis. Therefore, this requirement is fulfilled.

Requirement 4: Integration of Analysis Results: As composed service effect automata are again service effect automata (see above), one can apply the same analysis techniques. In fact, for given service effect automata, previously computed failure probabilities of their traces can be used directly for the analysis of the composed service effect automaton (even without explicitly constructing the composition).

Requirement 5: Practicable Granularity: Service effect automata abstract from internal computations and from the influence of parameters on the failure probability of calls. This is only valid if the parameters have no influence on the failure probabilities, which is the case e.g. if parameters are fixed or not existent (as in our example). The validity of this abstraction is not always given and has to be checked. However, current research is concerned with more detailed usage profile models, taking parameters into account.

Requirement 6: Tool Support: Tool support for the specification of parametric contracts is currently being developed by the Palladio research group in Oldenburg. Currently, the analysis is not supported by dedicated programs. Commercial tools for safety analysis are currently not available.

5.4 Comparison of the Three Evaluation Notations

Concluding the evaluation, we present in Table 1 a comparison of the three component-based analysis techniques for safety properties. In this table we assign, to the best of our knowledge, a quality mark ranging from −− (requirement not fulfilled) to ++ (requirement completely fulfilled) for each analysis technique and each requirement.

Table 1. A Comparative Evaluation

Requirement                              FPTN   CFT   Param. Contracts
Appropriate Component Level Models        +      +     +
Encapsulation and Interfaces              ++     ++    +
Dependencies on External Components       ++     ++    ++
Integration of Analysis Results           −      ++    +
Practicable Granularity                   +      +     +
Tool Support                              −      +     −

6 Conclusions

In this chapter, we have investigated the applicability of the component-based software engineering paradigm to the domain of safety-critical systems. For that reason, we have discussed the relevant problems in detail and given an overview of current approaches and research covering this problem domain. As a result, we have identified a set of requirements that are needed to evaluate safety properties for a system built with components. These requirements are used to compare the state of the art specification
techniques that allow for the evaluation of the probability of hazards or safety-critical failures. These specification techniques are Component Fault Trees (CFTs), Parametric Contracts and Failure Propagation and Transformation Notation modules (FPTN modules), which have partly been developed by the authors of this chapter and partly by other researchers. Each of these three evaluation notations has its own strengths and limitations. To build on the strengths and to reduce the limitations, we aim to unite the features of the three evaluation notations, which will ideally lead to a unified notation that completely fulfills all the requirements given in this chapter.
References

1. Szyperski, C.: Component Software: Beyond Object-Oriented Programming. ACM Press, Addison-Wesley, Reading, MA, USA (1998)
2. Bondavalli, A., Simoncini, L.: Failure Classification with respect to Detection. Esprit Project Nr 3092 (PDCS: Predictably Dependable Computing Systems) (1990)
3. Fenelon, P., McDermid, J., Nicholson, M., Pumfrey, D.J.: Towards integrated safety analysis and design. ACM Computing Reviews, 2 (1994) 21–32
4. Leveson, N.G.: SAFEWARE: System Safety and Computers. Addison-Wesley Publishing Company (1995)
5. CENELEC (European Committee for Electro-technical Standardisation): CENELEC EN 50126: Railway Applications – the specification and demonstration of Reliability, Availability, Maintainability and Safety. CENELEC EN 50128: Railway Applications: Software for Railway Control and Protection Systems. CENELEC, Brussels (2000)
6. SAE ARP 4754 (Society of Automotive Engineers Aerospace Recommended Practice): Certification Considerations for Highly Integrated or Complex Aircraft Systems (1996)
7. Department of Defense, United States of America: Military Standard 882C. System Safety Program Requirements (1999)
8. Deutsches Institut für Normung e.V.: DIN 25419: Ereignisablaufanalyse, Verfahren, graphische Symbole und Auswertung (German Standard) (1985)
9. IEC 60812 (International Electrotechnical Commission): Functional safety of electrical/electronic/programmable electronic safety-related systems, Analysis Techniques for System Reliability – Procedure for Failure Mode and Effect Analysis (FMEA) (1991)
10. IEC (International Electrotechnical Commission): Hazard and operability studies (HAZOP studies) – Application guide (2000)
11. UK Defence Standardization Organisation: Defence Standard 00-58, HAZOP Studies on Systems Containing Programmable Electronics, Part 1 and 2 (2000)
12. DIN 25424 (Deutsches Institut für Normung e.V.): Fault Tree Analysis: Part 1 (Method and graphical symbols) and Part 2 (Manual: calculation procedures for the evaluation of a fault tree) (1981/1990)
13. IEC 61025 (International Electrotechnical Commission): Fault-Tree-Analysis (FTA) (1990)
14. Vesely, W.E., Goldberg, F.F., Roberts, N.H., Haasl, D.F.: Fault Tree Handbook. U.S. Nuclear Regulatory Commission (1996)
15. Mauri, G.: Integrating Safety Analysis Techniques, Supporting Identification of Common Cause Failures. PhD thesis, Department of Computer Science, University of York (2001)
16. IEC (International Electrotechnical Commission): IEC 61165: Application of Markov techniques (1995–2003)
17. Selic, B., Gullekson, G., Ward, P.: Real-Time Object Oriented Modeling. John Wiley & Sons (1994)
18. Fenelon, P., McDermid, J.A.: An integrated toolset for software safety analysis. Journal of Systems and Software 21 (1993) 279–290
19. Kaiser, B., Liggesmeyer, P., Mäckel, O.: A new component concept for fault trees. In: Proceedings of the 8th Australian Workshop on Safety Critical Systems and Software (SCS'03), Adelaide (2003)
20. Frolund, S., Koistinen, J.: Quality-of-service specification in distributed object systems. Technical Report HPL-98-159, Hewlett Packard, Software Technology Laboratory (1998)
21. Reussner, R.H., Poernomo, I.H., Schmidt, H.W.: Reasoning on software architectures with contractually specified components. In Cechich, A., Piattini, M., Vallecillo, A., eds.: Component-Based Software Quality: Methods and Techniques. Number 2693 in LNCS. Springer-Verlag, Berlin, Germany (2003) 287–325
22. Reussner, R.H., Schmidt, H.W., Poernomo, I.: Reliability prediction for component-based software architectures. Journal of Systems and Software – Special Issue on Software Architecture – Engineering Quality Attributes 66 (2003) 241–252
23. Reussner, R.H.: Automatic Component Protocol Adaptation with the CoCoNut Tool Suite. Future Generation Computer Systems 19 (2003) 627–639
24. Grunske, L.: Annotation of component specifications with modular analysis models for safety properties. In: Proceedings of the 1st International Workshop on Component Engineering Methodology (WCEM), Erfurt (2003) 737–738
25. Bryant, R.: Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers 35 (1986) 677–691
Performance Evaluation Approaches for Software Architects

Anu Purhonen

VTT Technical Research Centre of Finland, P.O. Box 1100, FI-90571 Oulu, Finland
[email protected]
Abstract. Performance analysis techniques have already been developed for decades. As software architecture research has matured, performance analysis techniques have also been adapted to the evaluation of software architectures. However, the performance evaluation of software architectures is not yet systematically used in the industry. One of the reasons may be that it is difficult to select what method to use. The contribution of this work is to define a comparison framework for performance evaluation approaches. In addition, the framework is applied in comparing existing performance evaluation approaches. The framework can be used to select methods for evaluating architectures, to increase understanding of the methods, and to point out needs for future work.
1 Introduction
In this work, performance evaluation means evaluation of both the time and the resource behavior of the system. A number of techniques are available for evaluating hardware and software performance [1]. However, performance evaluation of software architectures has only been researched for a few years, because software architecture research itself is relatively young. Architecture is the fundamental organization of a software system embodied in its components, their relationships to each other and to the environment [2]. Because software architecture has a fundamental role in the quality of the final product, software architecture evaluation can reveal critical problems in the design at an early stage of the development, when modifications are still easy to make [3, 4].

Performance evaluation techniques are usually better known to developers of safety-critical systems such as avionics or medical systems. In non-safety-critical systems, missing a deadline does not have serious consequences such as loss of human lives; however, it may be harmful to the manufacturer's business. In addition to just meeting deadlines, performance in non-safety-critical products has become more and more important in terms of how good a service the product gives to the users, for example, what applications can be run concurrently on a mobile phone.

In embedded systems, hardware has often been designed first and has therefore had a strong influence on software design. Nowadays, a hardware platform can be just one component in the system that is utilized in several products. On the other hand, the same software components can be used in various hardware platforms. The hardware and software are no longer designed separately; the system has to be designed as a whole. Another challenge in contemporary product development is
that more and more components, either software or hardware, are purchased from third parties. Thus, it may be difficult for the integrator to know why a component behaves in a certain way during runtime. A third property of current systems that increases the complexity of estimating the performance of the system is that both software and hardware resources are reserved dynamically, based on the acute needs of the user. It may not be feasible to define beforehand what exactly is happening in the system at a specific moment in time, because there are too many variables.

The concurrent and separate development of components means that several assumptions have to be made about interfacing components until more information is available. In addition, there may be uncertainty regarding requirements; for example, the standards that should be supported may be unfinished when the development starts. Software architecture is a natural way of supporting this kind of development. Architectural diagrams allow easy analysis of design alternatives before the implementation is fixed.

Several of the existing performance evaluation approaches are based on Queuing Network Models (QNM) [5–8], Petri nets [9–11], or process algebras [12, 13]. QNMs are constructed from service centers and queues: service centers provide services and each service center has a queue attached to it; a minimal numerical sketch of such a model is given below. Compared to QNM, Petri nets and process algebras are more formal techniques. Petri nets describe the system using places, transitions, and tokens; timing information has been added by a number of extensions to traditional Petri nets. Process algebras are algebraic languages that are used for the description and formal verification of the functional properties of concurrent and distributed systems. Stochastic process algebras (SPA) are extensions to process algebras allowing the analysis of performance properties. In addition to the above approaches, real-time system scheduling theory [14] has been used to address the issues of priority-based scheduling of concurrent tasks with hard deadlines. Moreover, the software architecture community provides the Architecture Trade-off Analysis Method (ATAM) [4]. ATAM is a scenario-based method for evaluating architecture-level designs. It considers multiple quality attributes, including performance. However, if detailed analysis is needed, the method proposes that specialized performance evaluation techniques are used.

Although performance is one of the main quality attributes in many application areas, at the moment performance evaluation is often not systematically used for supporting architectural design decisions in the industry [11, 15]. However, this does not mean that performance evaluation is not used at all. On the contrary, support for design decisions is gathered, for example, using benchmarks, simulation, prototyping, and analysis of worst-case situations based on experience. Some specific problems of the current practices are that it is difficult to estimate the reliability of the results, the analysis cannot be easily repeated, and the results are not comparable with other, similar evaluations. Furthermore, each team or expert usually has their own practices, and the results are not stored so that they could be utilized in other evaluations. The main reason behind the lack of acceptance of performance evaluation techniques is that they are considered to be difficult and time-consuming [9, 11, 15].
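As referenced above, the queueing-network style of reasoning can be illustrated with a single service center treated as an M/M/1 queue: given an arrival rate and a service demand, it yields utilization and mean response time. The numbers below are invented; real QNM tools solve networks of such centers rather than one in isolation.

def mm1(arrival_rate, service_time):
    """Open M/M/1 service center: returns (utilization, mean response time)."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("unstable: the service center is saturated")
    response_time = service_time / (1.0 - utilization)
    return utilization, response_time

# Hypothetical load: 40 requests/s arriving at a component with a 20 ms service demand
u, r = mm1(arrival_rate=40.0, service_time=0.020)
print(f"utilization={u:.0%}, mean response time={r * 1000:.1f} ms")
# utilization=80%, mean response time=100.0 ms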
In order to make the methods easier to use and accept, developers are trying to connect the performance models directly to familiar architectural models and automate the whole
process from the software architecture description to the evaluation results [16]. However, an additional problem from the non-performance-specialist's point of view is that it is difficult to select a method to use, because the capabilities and differences of the current methods are difficult to sort out from the existing publications. The contribution of this work is the definition of a framework that can be used to compare different approaches to performance evaluation. The definition is based on the requirements of the stakeholders of the evaluation. The framework is applied in comparing different types of performance evaluation approaches that have published examples of how they have been used in software architecture evaluation. The following performance evaluation approaches were selected for comparison:

– Rate-Monotonic Analysis (RMA) is a collection of quantitative techniques that are used to understand, analyze, and predict the timing behavior of systems [14]. It is not one method developed by a certain group; there are, for example, several tools that are based on RMA. (A minimal sketch of the classical utilization-bound test is given after this overview.)
– PASA is a method for the Performance Assessment of Software Architectures [17]. PASA uses the principles and techniques of Software Performance Engineering (SPE) [3, 6].
– Layered Queuing Network modeling (LQN) [16, 18–20] is an extension of QNM. This approach is mainly developed by one research group.
– The Colored Petri Nets (CPN) approach evaluated here is based on just one case study [11], where CPN was used for evaluating alternative mechanisms and policies in an execution architecture. No further publications have been made on this approach so far, but because CPN in general is widely accepted, this is an interesting case study.
– Because ATAM is probably the best-known software architecture evaluation method, it is included even though it is not necessarily a performance evaluation method.
– Another approach, based on just one publication, is called the "metrics" approach in this work. Metrics were used for analyzing the performance of the high-level software architecture of a telecommunication system [21]. In this approach, the analysis can be made without making any assumption about the implementation of the components.

The chapter is organized as follows. Related work is presented in Section 2. Section 3 reviews the needs of performance evaluation in software architecture development and the expectations of the stakeholders. Section 4 introduces the elements of the comparison framework. An overview of the approaches selected for comparison is presented in Section 5. Section 6 discusses the results of the comparison and Section 7 concludes with some proposals for future work.
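As announced in the RMA item above, the classical Liu–Layland utilization-bound test is the simplest RMA-style schedulability check: a set of independent periodic tasks scheduled with rate-monotonic priorities is guaranteed schedulable if the total utilization stays below n(2^{1/n} − 1). The task set below is invented for illustration; practical RMA adds response-time analysis, blocking and jitter.

def rma_utilization_test(tasks):
    """tasks: list of (execution_time, period) pairs; returns (U, bound, passes)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)       # Liu & Layland bound for n tasks
    return utilization, bound, utilization <= bound

# Hypothetical task set: (worst-case execution time, period), both in milliseconds
tasks = [(1, 10), (4, 40), (10, 100)]
u, bound, ok = rma_utilization_test(tasks)
print(f"U={u:.3f}, bound={bound:.3f}, schedulable={ok}")
# U=0.300, bound=0.780, schedulable=True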
2 Related Work
Balsamo and Simeoni [22] examine approaches for deriving performance models from software architecture specifications. Although they include many methods in their analysis, the comparison is quite narrow: they only compare the approaches based on the notation that is used to describe the architecture, the architectural constraints, and the performance model used.
Balsamo et al. [1] introduce several notations that can be used to describe the behavior of a software system and the main classes of stochastic performance models. Based on these introductions, they present integrated methods for software performance engineering. A summary of the characteristics is given using nine elements, three of which analyze the quality of the methods. They also classify the methods in a space of three dimensions: the integration level of performance analysis in the software life cycle, the degree of automation, and the integration level of the software model with the performance model. This survey is a good presentation of the available techniques for performance evaluation and in that way complements our survey. However, the survey appears to have started from the methods and their capabilities, whereas our goal has been to start from the needs of the users and the system.

A third survey from Balsamo and her colleagues [23] goes even further into the characteristics of queuing network models. Although it is very informative, it appears to be directed at performance specialists and people who are familiar with queuing theories.

Herzog and Rolia [13] compare the characteristics of layered queuing modeling and SPA in eight features that are close to some of the features we have used. However, they have not specifically concentrated on the needs of software architecture evaluation.
3 Performance Evaluation
The requirements for the performance evaluation techniques are derived from the features of the system and the needs and constraints of the development organization. Stakeholders are people who are interested in the results of the evaluation and in the way the evaluation is performed. Stakeholders include, for example, hardware and software architects, software developers, managers, and marketing people. The hardware architect needs to know the amount of resources the planned software architecture requires. The software architect is interested in whether the architecture meets its requirements; in particular, they can utilize direct improvement proposals and clarifications of the dependencies between performance and other quality attributes. Marketing can use the information about the capabilities of the planned system and the cost of those capabilities. Managers follow the costs and benefits of the evaluation itself; the possible costs include the time and work spent and the cost of tools. The component designers need performance budgets for their components in order to be able to tune them accordingly. The person who makes the evaluation prefers the evaluation to be as easy to perform as possible. In addition, all of the stakeholders are interested in getting the results as fast as possible, and again whenever the parameters or the architecture change.

In addition to the needs of the stakeholders, the constraints of the organization can also affect the selection of the methods used in the development. The history and size of the organization affect what methods the company acquires. Moreover, the background and experience of the people affect how well they can utilize the methods. Performance evaluation at the architectural level requires expertise in three subjects. First, the needs of the application domain from the performance point of view have to be understood. Secondly, the design decisions and concepts that are important in the software architecture should be considered. Finally, an understanding of the theory and concepts behind performance is needed.
The stakeholders set the goals for the evaluation. On the other hand, the goals depend on the stage of the system's life cycle at which the evaluation is made. One of the first situations is when the software architecture is being developed. The person making the evaluation is then the architect or a member of the architecture team. When comparing design decisions, the accuracy of the analysis result is unimportant as long as the analysis points out which of the candidates is the best solution. Moreover, in the early stages of development the requirements may still be changing; therefore it is not useful to derive accurate results from inaccurate input values.

Although some sort of estimate of the hardware architecture is usually needed in the evaluation, the hardware architecture is not yet fixed in the early phases of the development. The results of the performance evaluation can be used to support the definition of the hardware/software partition and to determine the correct amount of hardware resources for the platform. This is especially important when designing a platform for a product line. In order to be able to analyze the resource usage of a future architecture, estimates from the old products are utilized. Therefore, the estimates of a software component in different types of hardware platforms should be comparable.

The development organization should be interested in the quality of the architecture as well as the quality of the code. Fortunately, the advantages of architecture reviews are starting to be recognized [24]. Software architecture reviews may be internal reviews based on checklists or evaluations made by an external evaluation team. Usually the goal of the reviews is to identify possible points of improvement and problem areas, that is, to perform risk analysis. If the review is based on checklists, the architect should be able to answer the questions of the reviewers using the results of the evaluations made during the software architecture design. However, when an external evaluation team makes the evaluation, they can derive new quantitative performance models from the architecture to support their analysis of the system.

Finally, evaluation methods need a description of the system under evaluation. These descriptions include, for example, the runtime architecture of the system, performance objectives, and estimates of the resource usage of individual components. However, the architecture specifications often do not describe the runtime operation of the system, or the information is incomplete. Furthermore, the architecture can be evolving, so that modifications are made frequently. Software architecture is not the only thing that affects performance; the implementation of the components also affects the overall performance. Thus, at architecture evaluation time, the effect of the implementation has to be estimated until the real values are available.
4
Comparison Framework
The elements of the comparison framework are introduced in this section.

4.1 Context

This section presents the elements that are used for defining what types of purposes the method has been applied to and in what type of systems.
Evaluation goal – The evaluation may be performed differently depending on its purpose. For example, there can be different needs in different stages of the product's life cycle. Therefore, it is important to know what types of evaluations have already been performed with the approach or for what types of evaluations it was originally intended.

Application field – The application field describes what kinds of applications the approach is intended for or what applications it has already been applied to. The application field can affect, for example, the required accuracy of the results.

Product type – Products differ, for example, in size and complexity. The approaches can differ in terms of what support they give in evaluating different types of products.

4.2 Architecture

The elements in this section examine the evaluation approaches from the point of view of what information they require from the system under study.

Views – Software architecture descriptions are usually divided into views. The evaluation approach should define the architectural structures that need to be described in order to be able to make the evaluation.

Language – In order to allow easy integration with the other development activities, the language in which the architecture is assumed to be described is important. Architectures are depicted with architecture description languages (ADLs). For example, the Unified Modeling Language (UML) [25] is commonly used as an ADL, but there are also several languages that have been developed specifically for describing architectures [26].

Parameters – In addition to architectural structures, some other information may be needed. For example, in order to analyze the response time of requests, execution time estimates are required for the individual components that serve the request. The analysis technique may expect information about resource allocation policies such as the scheduling policy. Furthermore, the objectives for the performance evaluation, such as timing requirements and resource constraints, are needed.

4.3 Evaluation

The elements in this section describe how the evaluation is actually performed using the method.

Process – Guidance is needed for understanding what tasks belong to the evaluation and how the tasks should be performed. The theory behind the evaluation methods is often difficult for non-performance experts to understand, and thus an ambiguous process description can be a reason for not taking the method into use.

Performance Model – The performance model is the model on which the actual evaluation is performed. The performance model may be part of the architecture description, or the architectural diagrams may be transformed into a performance model.
Solution Technique – An approach may support the use of one or more solution techniques. A solution technique may be based on some mathematical theory, or it may be formed from rules and guidelines. Different techniques may be applied in different stages of product development.

Results – The performance-related design decisions that are made during software architecture design include the selection of architectural styles and patterns, task partitioning and deployment to the platform, task scheduling and communication, and decisions concerning the use of different types of resources. All these design decisions can be sources of improvement. In addition to software architecture changes, improvements to the hardware architecture or changes to the requirements, both functional and non-functional, can be proposed.

Tools – Tools are needed to support the different tasks in the evaluation. In addition to helping in the evaluation, transformation tools can be used between architecture models and performance models. Furthermore, the reuse of results is facilitated with tools.

4.4 Costs and Benefits

This section handles the elements that are used to examine how useful the method is to an organization and to the actual user.

Collaboration – Trade-off studies link the evaluation to the other needs of the stakeholders. Software architects have to know the possible dependencies between different quality attributes. Furthermore, there are several domains in product development that need to work together in order to produce a final product that meets all the requirements. Because expertise and information tend to be distributed among the experts in these domains, easy integration with other development activities is needed.

Effort – In addition to benefits, there are also costs associated with the evaluation. Furthermore, time-to-market requirements may lead to systematic architecture evaluation being omitted if it is too laborious.

Flexibility – The approach should be flexible with respect to the abstraction level of the architecture description and to the size of the product. For example, in the early stages of development the speed of the analysis is often more important than the accuracy of the results. In addition, when analyzing large systems, unnecessary details may be discarded in order to get an understanding of the efficiency of the whole system. On the other hand, sometimes a detailed analysis of the most critical areas is needed.

Reusability – In the early stages of product development there is a lot of uncertainty that has to be handled. The requirements change, and therefore the architecture also has to be changed. As development proceeds, the estimates become more accurate, and therefore it should be easy to modify the performance models and re-evaluate the architecture. In addition, one of the problems with current practices is that the results of the evaluations are not reused. Thus, the approach should support reuse between projects.
Maturity – Maturity is assessed by examining the amount of support that is available for the use of the method. Moreover, it is easier to convince the organization of the usefulness of the method if it has already been used widely. In that case the method is probably practical and stable, which means fewer problems while using it and that support is available if problems do occur.
5
Overview of the Selected Approaches
This section describes the approaches selected for the evaluation. All these approaches have published examples of their usage in software architecture evaluation.

5.1 RMA

Rate-monotonic analysis (RMA) [14] is perhaps the best-known performance analysis approach among hard real-time software developers. The basic groundwork for RMA can be traced back to rate-monotonic scheduling (RMS) theory [27]. Consequently, it has been used especially for analyzing schedulability. RMA has been widely used by several organizations in development efforts to mathematically guarantee that critical deadlines will always be met, even in worst-case situations [28]. Consequently, it has also been applied to evaluating software architectures, for example, for discovering the ratio of concurrently available functionality to the cost of required hardware resources [29] and for comparing architecture candidates [30]. Definitions of critical use cases and scenarios specify the scope of the analysis. In order to be able to use RMA, a task model of the system has to be defined. The model should specify the period, response time, and deadline of each task. RMA does not have a separate performance model, but the mathematical analysis can be made directly from the architectural diagrams. There are commercial tools based on RMA [31, 32], and the tools can be linked directly to UML modeling tools. There is also a special-purpose ADL and a tool-set for safety-critical systems [33].
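To make the task-model input concrete, the following sketch shows the classical utilization-bound test from RMS theory [27], on which RMA builds. It is a minimal illustration only, not taken from any of the tools cited above, and the task set with its periods and execution times is hypothetical.

```c
#include <math.h>
#include <stdio.h>

/* A periodic task as required by the RMA task model:
   worst-case execution time C and period T (deadline assumed equal to T). */
typedef struct {
    double wcet;    /* worst-case execution time */
    double period;  /* activation period */
} task_t;

int main(void) {
    /* Hypothetical task set of a small controller. */
    task_t tasks[] = { {1.0, 10.0}, {2.0, 40.0}, {5.0, 100.0} };
    const int n = sizeof(tasks) / sizeof(tasks[0]);

    double utilization = 0.0;
    for (int i = 0; i < n; i++)
        utilization += tasks[i].wcet / tasks[i].period;

    /* Liu-Layland bound: U <= n * (2^(1/n) - 1) guarantees schedulability
       under rate-monotonic priorities; exceeding it is inconclusive and
       calls for exact response-time analysis. */
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);

    printf("U = %.3f, bound = %.3f -> %s\n", utilization, bound,
           utilization <= bound ? "schedulable" : "needs exact analysis");
    return 0;
}
```

For the invented task set the utilization is 0.20, well below the bound of roughly 0.78 for three tasks, so the set is schedulable; the commercial RMA tools perform far more detailed response-time analyses on the same kind of input.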
5.2 PASA

PASA [17] can be used as a framework when SPE [3, 6] techniques are applied to software architecture evaluation. PASA identifies deviations from an architectural style and proposes alternative interactions between components as well as refactorings to remove anti-patterns. PASA is intended for uncovering potential problems in new development or for deciding whether to continue to commit resources to the current architecture or to migrate to a new one. PASA was developed from experiences in the performance assessment of web-based systems, financial applications, and real-time systems.

The evaluation starts from critical use cases that are further specified as key performance scenarios. The scenarios are documented using augmented UML sequence diagrams. PASA uses three different approaches to the impact analysis: identification of architectural styles, identification of performance anti-patterns, and performance modeling and analysis. Performance analysis is made in two phases. Initially, a simple analysis of performance bounds may be sufficient. If the analysis of performance bounds indicates the need for more detailed modeling, this is done in the second phase. Detailed performance analysis is based on two models: the software execution model and the system execution model. The models are derived from the sequence diagrams. While the software execution models provide optimistic performance metrics, the system execution models are used for studying the effects of resource contention on the execution behavior. The results of solving the system execution model include, for example, metrics for resource contention, the sensitivity of performance metrics to variation in workload composition, and the identification of bottleneck resources. There is a commercial tool available for the performance analysis [34].
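The difference between the two kinds of models can be illustrated with a deliberately simplified calculation. The sketch below sums the resource demands of a hypothetical scenario (the optimistic, contention-free view of a software execution model) and then inflates each demand with a basic open single-server queueing formula to approximate contention; real system execution models in SPE and PASA are considerably richer, and all numbers here are invented.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical per-request CPU and disk demands (seconds) for one scenario. */
    double cpu_demand  = 0.020;
    double disk_demand = 0.050;
    double arrival_rate = 12.0;   /* requests per second, assumed */

    /* Software execution model view: no contention, response time is the
       plain sum of demands along the scenario. */
    double no_contention = cpu_demand + disk_demand;

    /* Crude system execution model view: treat each resource as an open
       M/M/1-style server, so residence time = demand / (1 - utilization). */
    double cpu_util  = arrival_rate * cpu_demand;    /* 0.24 */
    double disk_util = arrival_rate * disk_demand;   /* 0.60 -> bottleneck */
    double with_contention = cpu_demand  / (1.0 - cpu_util)
                           + disk_demand / (1.0 - disk_util);

    printf("optimistic: %.3f s, with contention: %.3f s\n",
           no_contention, with_contention);
    return 0;
}
```

Even this toy model shows why the optimistic software execution model alone can be misleading: the disk, at 60 percent utilization, more than doubles the scenario's response time once queueing is taken into account.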
5.3 LQN

Layered queuing network (LQN) modeling [16, 18–20] is an extension of the QNM approach. The main difference is that a server, to which customer requests arrive and queue for service, may itself become a client to other servers from which it requires nested services while serving its own clients. If tasks have no resource limits, then LQN gives the same predictions as a queuing network [18]. LQN was originally created for modeling client/server systems [16]. It has since been applied to database applications [20] and web servers [35]. Moreover, simulation of LQN models has been used to incrementally optimize the software configuration of electronic data interchange converter systems in the financial domain [36] and to determine the highest achievable throughput in different hardware and software configurations of a telecommunication system [16].

LQN starts with the identification of the critical scenarios with the most stringent performance constraints. Then the LQN models are prepared. An LQN model is represented as an acyclic graph whose nodes are software entities and hardware devices and whose arcs denote service requests. The LQN model is transformed from UML class, deployment, and interaction diagrams [37] or from Use Case Maps [38]. There are also transformation rules for creating LQN models from architectural patterns [16]. The LQN model produces results such as response time, throughput, utilization of servers on behalf of different types of requests, and queuing delays. The parameters of an LQN model are the average service time for each entry and the average number of visits for each request. LQN offers several analytical solvers. In case the scheduling policy or something else prevents an analytical solver from being used, simulation is available. A confidence interval can be given for the results. A research tool is available for solving LQN models [39].

5.4 CPN

In the Colored Petri Nets (CPN) case study, CPN has been used for evaluating alternative mechanisms and policies in the execution architecture [11]. The mechanisms include the task control and communication mechanisms, and the policies concern task division and allocation. In addition, CPN is used for setting timing requirements for component design and implementation when the available resources are already fixed. The authors estimate the message buffer usage and message delays based on the simulation of a large number of use cases. The application field is mobile phone software. The evaluation handles industrial-scale products and product families. The "4+1" views of the architecture design approach [40] and UML are used to describe the architecture. The parameters are the message delays on different message links, the task-switching time of the operating system, and the event processing time. Different probability distributions are used for the streams of user requests, events, and network signals. The module architecture in UML is mapped to the execution architecture as a CPN model. The analysis is made using a simulation tool maintained by the University of Aarhus [41]. The simulation gives the following results: the worst-case message buffer usage; the minimum, average, and maximum message delays; and the number of task switches needed for each transaction.
5.5 ATAM

The Architecture Trade-off Analysis Method (ATAM) is used to learn what the critical architectural design decisions are in the context of selected attributes [4, 42]. Those design decisions can then be modeled in subsequent analyses. One of the attributes supported in ATAM is performance. ATAM has been used for risk analysis, for example, in real-time systems and in aeronautical systems.

ATAM is a review type of activity that takes a few days to perform. It is carried out by an evaluation team that has several members with well-defined responsibilities. In addition, the development organization has to take part in the evaluation; in particular, the architect has a role in providing information on the architectural decisions. The quality goals in ATAM are characterized with scenarios. Three types of scenarios are used: usage scenarios, growth scenarios, and exploratory scenarios. The scenarios are created by studying the architecture and interviewing the stakeholders. A standard characterization of each attribute facilitates the elicitation of scenarios. The analysis is based on finding sensitivity points and trade-off points. A sensitivity point is a property that is critical for achieving a particular quality, and a trade-off point is a property that affects more than one attribute and is a sensitivity point for at least one of them.

An attribute-based architectural style (ABAS) adds to an architectural style the ability to reason based on quality attribute-specific models [43]. The analysis of an ABAS is based on a quality attribute-specific model that provides a way of reasoning about the behavior of component types that interact in the defined pattern. For example, the definition of a performance ABAS can include a queuing model and the rules for solving the model under varying sets of assumptions. The qualitative analysis that ATAM uses is based on asking questions regarding what kinds of quantitative evaluations have been performed on the system and how else the performance characteristics have been ensured. The definitions of ABASs help in eliciting these questions. In case the screening questions reveal potential problems, a more comprehensive model of the quality attribute aspect under scrutiny is built. ATAM has a clear process description and a definition of roles for the members of the evaluation team.
5.6 Metrics

The goal in the metrics case study has been to analyze the effect of architectural decisions so that the possible implementation of the components does not need to be taken into account [21]. In the case study, the metrics are applied to the high-level software architecture of a telecommunication system. The evaluation is scenario-based. A performance scenario describes a particular use of the system performed by its users, and its weight is derived from how frequently the scenario occurs.

The Architecture Description Language for Telecommunication (ADLT) is used for describing the architecture. It has been designed with an easy translation from UML diagrams in mind. ADLT uses four diagrams: the activity diagram, the architectural configuration, the sequence diagram, and the protocol diagram. Activity diagrams and sequence diagrams are used in the performance analysis. An activity diagram shows the dynamic component and connector behavior using simple finite state machines enriched with activities. A sequence diagram is similar to UML's sequence diagram. The analysis is performed using a metric called Elementary Stress:

\text{Elementary Stress} = \frac{\text{Presences} + \text{Queues}}{\text{Parallelisms} + 1}    (1)
Presences is the number of times that a component or connector appears in a scenario. Parallelisms is the number of parallelism symbols inside the component's or connector's activity diagrams. Queues is the number of queue-type structures inside the activity diagrams. The justification of the metric is that more occurrences of a component or connector imply a larger overhead associated with that scenario; this overhead is decreased by the use of parallelism and made worse by queues or stacks. The trade-off analysis is made using a table. The trade-off value indicates the number of attributes the element is sensitive to, and the average of all trade-off values gives the architecture trade-off value.
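As a purely illustrative application of (1) with invented counts, suppose a connector appears four times in a scenario (Presences = 4) and its activity diagram contains one queue structure (Queues = 1) and two parallelism symbols (Parallelisms = 2). Then

\text{Elementary Stress} = \frac{4 + 1}{2 + 1} = \frac{5}{3} \approx 1.67

Raising the number of parallelism symbols to four would lower the value to 5/5 = 1, while each additional queue would raise it, in line with the justification given above.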
6
Comparison Results
This section compares the approaches presented in Section 5 using the framework defined in Section 4. The properties of the approaches are derived from the published uses of the methods; therefore, this comparison may not cover all the ways the methods have been used. Furthermore, because the properties are derived directly from the publications of the methods, some of the properties can be overlapping. For example, mobile phones are telecommunication systems, and telecommunication systems are real-time systems. The term closest to the one used in the publication of the method is selected.

6.1 Context

A summary of the tasks that the evaluation approaches have been used for is presented in Table 1. All the approaches seem to be suitable for comparing architectural decisions.
Table 1. Evaluation goal – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; goals considered: HW/SW configuration, finding bottlenecks, requirements for components, risk analysis, comparison of candidate solutions, estimation of architecture capability
Risk analysis and the analysis of bottlenecks are other uses that do not necessarily require a method that gives precise values for the performance parameters; thus, ATAM and the metrics approach should be useful for those purposes. However, quantitative results are required for deciding the hardware and software configurations, for giving requirements to the component designers, and for estimating, for example, response time. Therefore, the actual performance analysis techniques should be more appropriate for those purposes.

Table 2 shows the applications the methods have been applied to. Here, telecommunication systems means systems other than mobile phones. If a publication explicitly mentions that the system is real-time, this is marked. The problem in this comparison was that although the method descriptions often imply that a method is suitable for certain types of applications, there are not many publications on how the methods have been used in those systems. Safety-critical systems probably need a more formal approach in architecture design than, for example, consumer electronics. CPN could be considered the most formal approach in this comparison. However, RMA has also been used for safety-critical systems with a special-purpose ADL and tool-set. Consequently, more examples are needed from all the approaches before any further conclusions can be drawn based on the application field.
Table 2. Application field – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; application fields: telecommunication, real-time systems, missile guidance, web systems, mobile phone, client-server systems, avionics and aeronautics, financial systems
The types of products that have been evaluated using the methods are characterized in Table 3. One of the reasons why these approaches were selected for comparison was that they should be suitable for evaluating embedded software. Although this was not explicitly stated in the metrics case, it is reasonable to believe that it is also suitable for that purpose. RMA may not always be practical in complex systems because it expects each task to be characterized by a single period, deadline, and response time [29]. This may lead to overly pessimistic results. On the other hand, RMA has been found useful in analyzing, for example, avionics systems with continuous data [33]. In systems where users may activate several applications concurrently, those situations are interesting from a resource usage point of view and should be modeled as concurrent usage scenarios; at least LQN seems to be suitable for that. The communication solution in the metrics case study integrated a GSM system with the Internet.

Table 3. Product type – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; product types: embedded software, large system, continuous data, communication solution, concurrent usage scenarios, product family

6.2 Architecture

The CPN case study starts from an architecture description based on the 4+1 views [40]. In addition, in at least one example where LQN has been utilized, the views that are needed have been defined [44]. However, otherwise the methods do not usually explicitly state the views of the architecture that should be available before evaluation. ATAM, the metrics approach, and the analysis of patterns in PASA should be suitable for conceptual-level analysis. For the other approaches, conceptual-level descriptions may be too general because these approaches require fairly detailed parameters as input, as is shown later.

Instead of specific views, most of the approaches describe the structures or diagrams that are needed. These diagrams are presented in Table 4.

Table 4. Architectural diagrams – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; diagrams: UML class diagram, UML deployment diagram, UML sequence diagram, UML use cases, augmented UML sequence diagram, MSC, Use Case Maps, ADLT diagrams, META-H diagrams

META-H [33] and ADLT [21] are languages that have been used for describing input diagrams. ADLT resembles UML. Furthermore, PASA uses augmented sequence diagrams that are similar to UML sequence diagrams but have some features from message sequence charts (MSC).
The architecture for LQN has been described in UML, Use Case Maps, or MSC diagrams. ATAM does not specify the structures that are needed for the analysis; in ATAM, the responsibility of the architect is to be able to describe the architecture and the main architectural decisions to the evaluation team. In addition to the architectural diagrams, each approach needs some additional information or parameters:

– CPN requires time parameters such as message delays, task-switching time, and event processing time.
– The metrics approach and ATAM need definitions of usage scenarios and weights for them. These scenarios are derived from requirements. Part of the ATAM approach is that it supports scenario elicitation.
– LQN requires execution time demands for each software component on behalf of different types of system requests, as well as demands for other resources such as I/O devices and communication networks. The parameters include, for example, average values for the arrival rate of requests, the execution time of requests, and message delays. In addition, the scheduling discipline of each software and hardware server is needed.
– In PASA the important performance metrics for each server are residence time, utilization, throughput, and average queue length. The scheduling discipline of the queues also restricts how the problem can be solved.
– In RMA, each task should have a period, deadline, and response time defined, possibly representing the worst case. In addition, the RMA tools provide support for different types of scheduling disciplines.

6.3 Evaluation

The level of process descriptions varies among the approaches:

– In the application of CPN, the problem seems to be that there are no guidelines regarding how to use it in architecture evaluation.
– For the metrics approach there is an architectural development process description.
– ATAM has a process description for a review type of evaluation in which the roles of the evaluation team are strictly defined.
– There are some guidelines on how to use LQN in the various publications, but not one specific process description for software architecture evaluation.
– PASA has a good process description of how to proceed from architectural diagrams to the evaluation. In addition, PASA is supported by the SPE methodology.

RMA analysis can be made directly from UML diagrams, and the metrics analysis is also made directly from ADLT diagrams. The other approaches use special performance models. CPN is based on the CPN model and a special inscription language for token definition and manipulation. PASA and LQN both use queuing network models. In case a detailed analysis needs to be included in ATAM, any model and method can be used with it.

The solution techniques used are listed in Table 5. Naturally, the metrics approach is based on a special metric. LQN and apparently PASA have several analytic solvers, but simulation is used when an analytic solution is difficult or impossible to find. The developers of ATAM provide support for the evaluation with a selection of ABASs. PASA also includes analysis of architectural patterns and anti-patterns. In addition, RMA is proposed for use with PASA for schedulability analysis.

Table 5. Solution techniques – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; techniques: performance bounds, architectural patterns, architectural anti-patterns, analytic solver, simulation, mathematical rules, ABAS, metrics

Based on the publications, the following types of results have been obtained with the solution techniques:

– By simulating a large number of use cases, CPN is used to obtain the maximum size of message buffers and the average transaction processing time. On the other hand, when the total size of the buffers is fixed, the timing parameters are results that can be further used as requirements for component design and implementation.
– The metrics are able to show the critical elements in the architecture.
– ATAM is used to find the points in the architecture that are sensitive to performance. These are further used in specifying trade-off points and risks.
– LQN results are response time, throughput, queuing delays, and the utilization of different software and hardware components. The more exact the solver, the more accurate the results.
– The main result of RMA is the schedulability of the system. However, other information can also be revealed, for example, response times for requests.
– PASA is intended for deciding end-to-end processing for messages and for analyzing scalability. PASA identifies problem areas that require correction in order to achieve the desired scalability and quantifies the alternatives so that developers can select the most cost-effective solution.

Tools were found to be available for all methods except ATAM and the metrics approach. RMA and PASA are the only ones that have commercial tools available. Moreover, RMA is supported by more than one evaluation tool, and the tools can be directly connected to well-known design tools. There are also some publications about transformation tools for LQN, but the tools were not publicly available. The tool used in the CPN case study was earlier a commercial tool, but it is now maintained by a university [41]. At least the CPN and RMA approaches have tools available for formal modeling and analysis. The problem with all the approaches seems to be that the tools do not yet give much advice on how to utilize the results in order to improve the architecture. Therefore, the user has to be able to translate the performance concepts back into architectural alternatives.
6.4 Costs and Benefits

Trade-offs with other attributes and integration with other development activities are supported as follows:

– The CPN approach supports analysis of functionality in addition to performance evaluation. However, it has been difficult to formally analyze the whole model because of processing power requirements [11]. Integration of the CPN model with UML diagrams is claimed to be easy, but this is not clearly demonstrated in the publication.
– In the metrics approach and ATAM, trade-off analysis is supported.
– In PASA, trade-off analysis is claimed to be supported, but it is not clearly demonstrated how it is done. Integration of PASA and SPE with other development efforts is left to the organization utilizing the approach.
– In the LQN publications there are some ideas on how to integrate it with software design environments, but more support is not yet available.
– The connection with other design activities in RMA is supported with tools. However, although the analysis can be made directly from UML diagrams, the support for feeding the results back to the architecture is unclear. In addition, no mentions of trade-off studies with non-performance-related attributes were discovered.

The effort used is not often reported in the publications, and the estimates that are given may not be comparable with each other. According to the CPN case study, it took one and a half months to learn the method and tools; after that, the modeling took two months for the first version and an additional three weeks for an update. The time spent on the actual evaluation was not given. According to the developers of the metrics approach, the costs of the evaluation do not seriously affect the development process. ATAM normally takes 3-4 calendar days to apply to the architecture of a medium-sized system. However, ATAM requires assembling the relevant stakeholders for a structured session of brainstorming, presentation, and analysis, costing a total of 40-60 staff days; the time needed by the managers and the architects to prepare for the ATAM is not included. The LQN publications do not give references for model building, but it has been mentioned that LQN analysis takes seconds and simulation minutes. The cost of following SPE (the approach upon which PASA is based) in performance-critical projects has been 1-10 percent of the overall project budget [6]. No references to the costs of RMA were discovered.

A flexible method should be suitable for small and large systems at the conceptual and concrete levels. It should be flexible enough to be used in the different stages of product development and for different levels of result accuracy. CPN supports hierarchical descriptions, which facilitates the modeling of larger systems. The metrics approach should be suitable for both large and small systems, but without a tool, large systems may be too laborious. Because the steps of ATAM are strictly defined, the effort of the review remains nearly the same whatever the size of the system to be analyzed. LQN has been utilized for analyzing large systems. One of the main features of LQN is that it supports reserving multiple resources at the same time. Furthermore, there is a selection of solvers depending on what kind of result accuracy is needed. The steps for larger and smaller systems are the same in PASA. The use of sub-models facilitates the evaluation
of larger systems. Evaluation of patterns and anti-patterns can already be performed from conceptual diagrams. RMA does not have any additional features concerning the flexibility of the method.

Reusability can be supported in different ways. For example, the evaluations should be easily repeatable when parameters change, and the results should be reusable between different projects. The selected approaches support reusability as follows:

– In the CPN approach the parameters can be updated, and the variation in products can be described with data variables and initial markings. Thus, the approach should support reusability between projects.
– The metrics approach apparently always gives the same result with the same input information regardless of the user, because it is a platform-independent approach.
– Because ATAM is based on the experience of the evaluators and on the composition of the expert team, it does not necessarily give the same result regardless of the evaluator. However, intermediate results, such as the definitions of the scenarios, can be reused.
– LQN models can be reused with different parameters and solvers. In addition, the behavior of the nodes can be described in more detail if needed.
– PASA supports the composition of sub-models, but the analysis of concurrent scenarios is supported only with a simulator. The tool provides a database where parameter values (e.g., resource utilization) can be stored and reused in different models that run in the same environment.
– RMA has good tool support; thus, the evaluation result should not depend on the user of the method. However, no examples were found regarding how the evaluation results are actually used to improve the architecture, which may cause variation in the actual outcome of the evaluation.

The properties for estimating maturity are presented in Table 6.

Table 6. Metrics for estimating maturity – approaches compared: RMA, PASA, LQN, CPN, ATAM, Metrics; properties: service marks, case studies, several developers, commercial tool, research tool, handbooks, several tool providers

In the CPN approach the main problem is the lack of documentation on how to model architectures with it. For the metrics approach, a validation of the metrics through Petri nets and QNM was said to be in progress. There is especially good guidance for the use of PASA in the form of two books; however, PASA evaluation without the tool may be difficult [45]. There are a lot of publications about LQN, but it needs a good book on how it should be used, especially for software architecture evaluation. RMA is the most mature method in that it has more than one commercial tool and it has been widely used in developing
hard real-time software. However, as a method for a software architect it is lacking in that, although there are books about it, they have not been updated since new concepts of software architecture development have emerged. The problem with the other service mark, ATAM, is that there are no tools and it does not support detailed analysis of performance. From the software architecture development point of view, the CPN and metrics approaches are the most immature ones because there are not yet many publications on them.

6.5 Summary

As anticipated, the comparison of the evaluation approaches showed that there is still a lot to do to make the methods more suitable for architects. Furthermore, it seems that one method is not enough; instead, different techniques should be used for different purposes. In addition, the background and experience of the organization and the availability of tools affect which method is taken into use.

RMA is a well-known method among real-time software developers, and it is the only method that has more than one commercial tool. The tools are also directly connected to design tools. However, more guidance is needed on how it can be utilized specifically by software architects. In addition, the fact that it can give overly pessimistic results may hinder its use for many problems in consumer electronics.

PASA also has a commercial tool, and it is based on the well-known performance engineering approach SPE. Two books provide good support for the different stages of the evaluation. Nonetheless, it is difficult to complete the evaluation without using their tool, and there is no support for integrating the approach with other development activities. In addition, the trade-off analysis phase is included in the method, but no detailed guidelines were found regarding how it is actually performed.

LQN seems to be suitable for many types of applications and, based on numerous publications, a lot of effort is being put into making it more user-friendly. The interesting feature of LQN is that it allows the reservation of multiple resources concurrently, which is important for real-world systems. However, the tool is not yet commercial and is only available for restricted environments. In addition, although the notation is easy to understand, a good manual is needed on how the method can be applied by architects.

CPN is especially good for systems where both functionality and performance need to be validated. Modeling with CPN seems to take a long time, and the evaluation is done using only a simulator, so the evaluation may be slow for some purposes. There seems to be no direct connection between design tools and the CPN tool. The results concerning CPN were based on only one case study, of which no further publications were found; thus, they may not be as reliable as those for approaches with more publications available.

ATAM has been developed especially for architecture evaluation, and it is supported by a handbook. In the evaluation, ATAM relies on the experience of the evaluation team. Consequently, it is not actually a performance evaluation method itself, but it creates a framework in which detailed performance evaluation techniques can be embedded.

The metrics approach is the other approach based on just one case study. The interesting point of this approach is that it does not require any estimates of the future implementation of the system; it is based solely on the evaluation of the architectural structures.
Unfortunately, the proposed metric has not yet been validated through other applications and performance evaluation techniques. In addition, tools are needed before the metrics approach can be applied to larger systems.
7
Conclusion
The use of systematic performance evaluation of software architectures is still rare, although the benefits of early analysis in software development have long been accepted. This chapter first described the needs of the stakeholders regarding performance evaluation. Based on the stakeholder requirements, the elements of the comparison framework were introduced, and the framework was used to compare six performance evaluation approaches.

At the moment the evaluation methods are not general-purpose, and all of them have their strengths and weaknesses. Different methods need to be used for different purposes. In addition, the background of the development organization and the availability of tools affect the selection of the method. Consequently, the proposed comparison framework can help in selecting the method that best suits the needs of the organization and the system. It also helps in estimating the status of performance evaluation research and in understanding the differences between the approaches.

Future work on the evaluation methods is needed to create guidelines and examples on how to use these methods in software architecture development. In addition, tool support is not yet sufficient for an architect who is not entirely familiar with performance concepts. Support is especially missing for transforming architectural diagrams into performance models and, conversely, for translating evaluation results back into architectural decisions.
Acknowledgments

This work was partly conducted in the Moose project, under ITEA cluster projects of the EUREKA network, and financially supported by Tekes (the National Technology Agency of Finland). The author was also supported by a grant from the Nokia Foundation.
References

1. S. Balsamo, A. D. Marco, P. Inverardi, and M. Simeoni. Software Performance: State of the Art and Perspectives. Dipartimento di Informatica, Università Ca' Foscari di Venezia, Research Report CS-2003-1, January 2003.
2. IEEE. IEEE Recommended Practice for Architectural Description of Software-Intensive Systems. IEEE Std 1471-2000, 2000.
3. C. U. Smith. Performance Engineering of Software Systems. Addison-Wesley, 1990.
4. P. Clements, R. Kazman, and M. Klein. Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley, 2001.
5. F. Aquilani, S. Balsamo, and P. Inverardi. Performance Analysis at the Software Architectural Design Level. Performance Evaluation, vol. 45, pp. 147-178, 2001.
6. C. U. Smith and L. G. Williams. Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison-Wesley, Boston, 2002.
7. P. Kähkipuro. Performance Modeling Framework for CORBA Based Distributed Systems. University of Helsinki, 2000.
8. C. Shousha, D. Petriu, A. Jalnapurkar, and K. Ngo. Applying Performance Modelling to a Telecommunication System. In Proceedings of the First International Workshop on Software and Performance (WOSP'98), Santa Fe, New Mexico, USA, 1998.
9. P. King and R. Pooley. Derivation of Petri Net Performance Models from UML Specification of Communication Software. In Proceedings of the XV UK Performance Engineering Workshop, 1999.
10. K. Fukuzawa and M. Saeki. Evaluating Software Architectures by Coloured Petri Nets. In Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering (SEKE'02), Ischia, Italy, 2002.
11. J. Xu and J. Kuusela. Analyzing the Execution Architecture of Mobile Phone Software with Colored Petri Nets. International Journal on Software Tools for Technology Transfer, vol. 2, pp. 133-143, 1998.
12. M. Bernardo, P. Ciancarini, and L. Donatiello. Architecting Families of Software Systems with Process Algebras. ACM Transactions on Software Engineering and Methodology, vol. 11, pp. 386-426, 2002.
13. U. Herzog and J. Rolia. Performance Validation Tools for Software/Hardware Systems. Performance Evaluation, vol. 45, pp. 125-146, 2001.
14. M. H. Klein, T. Ralya, B. Pollak, R. Obenza, and M. G. Harbour. A Practitioner's Handbook for Real-Time Analysis: Guide to Rate Monotonic Analysis for Real-Time Systems. Kluwer, 1993.
15. R. Pooley. Software Engineering and Performance: A Roadmap. In Proceedings of the 22nd International Conference on Software Engineering, Future of Software Engineering Track, Limerick, Ireland, 2000.
16. D. Petriu, C. Shousha, and A. Jalnapurkar. Architecture-Based Performance Analysis Applied to a Telecommunication System. IEEE Transactions on Software Engineering, vol. 26, pp. 1049-1065, 2000.
17. L. G. Williams and C. U. Smith. PASA: A Method for the Performance Assessment of Software Architectures. In Proceedings of the 3rd International Workshop on Software and Performance, Rome, Italy, 2002.
18. C. E. Hrischuk, C. M. Woodside, J. A. Rolia, and R. Iversen. Trace-Based Load Characterization for Generating Performance Software Models. IEEE Transactions on Software Engineering, vol. 25, pp. 122-135, 1999.
19. J. A. Rolia and K. C. Sevcik. The Method of Layers. IEEE Transactions on Software Engineering, vol. 21, pp. 689-700, 1995.
20. G. Franks, A. Hubbard, S. Majumdar, D. Petriu, J. Rolia, and M. Woodside. A Toolset for Performance Engineering and Software Design of Client-Server Systems. Performance Evaluation, vol. 24, pp. 117-135, 1995.
21. S. Afsharian, M. Giuli, and G. Tarani. Quantitative Analysis for Telecom/Datacom Software Architecture. In Proceedings of the 3rd International Workshop on Software and Performance, Rome, Italy, 2002.
22. S. Balsamo and M. Simeoni. Deriving Performance Models from Software Architecture Specifications. In Proceedings of the European Simulation Multiconference (ESM 2001), Prague, 2001.
23. S. Balsamo, V. D. N. Personè, and P. Inverardi. A Review on Queuing Network Models with Finite Capacity Queues for Software Architectures Performance Prediction. Performance Evaluation, 2002.
24. R. Kazman and L. Bass. Making Architecture Reviews Work in the Real World. IEEE Software, vol. 19, pp. 67-73, 2002.
25. OMG. Unified Modeling Language. http://www.uml.org.
26. N. Medvidovic and R. N. Taylor. A Classification and Comparison Framework for Software Architecture Description Languages. IEEE Transactions on Software Engineering, vol. 26, pp. 70-93, 2000.
27. C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. Journal of the Association for Computing Machinery, vol. 20, pp. 46-61, 1973.
28. R. Obenza. Guaranteeing Real-Time Performance Using RMA. Embedded Systems Programming, pp. 26-40, 1994.
29. A. Ran and R. Lencevicius. Making Sense of Runtime Architecture for Mobile Phone Software. In Proceedings of the 11th ACM SIGSOFT Symposium on Foundations of Software Engineering held jointly with the 9th European Software Engineering Conference (ESEC/FSE 2003), Helsinki, Finland, 2003.
30. A. Purhonen. Architecture Evaluation Strategy for DSP Software Development. In Proceedings of the 15th International Conference on Software & Systems Engineering and their Applications, Paris, 2002.
31. TimeSys Corporation. TimeWiz. www.timesys.com.
32. Tri-Pacific Software Inc. RAPID. www.tripac.com.
33. P. H. Feiler, B. Lewis, and S. Vestal. Improving Predictability in Embedded Real-Time Systems. Carnegie Mellon University, Software Engineering Institute, Technical Report CMU/SEI-2000-SR-011, December 2000.
34. Performance Engineering Services. SPE*ED. http://www.perfeng.com.
35. J. Dilley, R. Friedlich, T. Jin, and J. Rolia. Measurement Tool and Modelling Techniques for Evaluating Web Server Performance. In Proceedings of Computer Performance Evaluation: Modelling Techniques and Tools, 1997.
36. K. Aberer, T. Risse, and A. Wombacher. Configuration of Distributed Message Converter Systems Using Performance Modeling. In Proceedings of the 20th International Performance, Computation and Communication Conference, Phoenix, Arizona, 2001.
37. D. C. Petriu and H. Shen. Applying the UML Performance Profile: Graph Grammar-Based Derivation of LQN Models from UML Specifications. In Proceedings of TOOLS 2002.
38. D. C. Petriu and M. Woodside. Software Performance Models from System Scenarios in Use Case Maps. In Proceedings of TOOLS 2002.
39. Carleton University. LQNS Solver. http://www.sce.carleton.ca/rads/#softarch.
40. P. Kruchten. The 4+1 View Model of Architecture. IEEE Software, vol. 12, pp. 42-50, 1995.
41. University of Aarhus. CPN Tools. http://wiki.daimi.au.dk/cpntools/cpntools.wiki.
42. R. Kazman, M. Klein, and P. Clements. Evaluating Software Architectures for Real-Time Systems. Annals of Software Engineering, vol. 7, pp. 71-93, 1999.
43. M. Klein, R. Kazman, L. Bass, J. Carriere, M. Barbacci, and H. Lipson. Attribute-Based Architecture Styles. In Proceedings of the First Working IFIP Conference on Software Architecture (WICSA1), San Antonio, TX, 1999.
44. C.-H. Lung, A. Jalnapurkar, and A. El-Rayess. Performance-Oriented Software Architecture Engineering – An Experience Report. In Proceedings of the First International Workshop on Software and Performance, Santa Fe, New Mexico, 1998.
45. T. Kauppi. Performance Analysis at the Software Architectural Level. VTT Electronics, VTT Publications 512, 2003.
Component-Based Engineering of Distributed Embedded Control Software

J.H. Jahnke1, A. McNair1, J. Cockburn1, P. de Souza1, R.A. Furber2, and M. Lavender2

1 Department of Computer Science, University of Victoria, Victoria, B.C., V8W-3P6, Canada
{jens,amcnair,japc,pdesouza}@netlab.uvic.ca
2 Intec Automation Inc., 2751 Arbutus Rd., Victoria, B.C., V8N-5X7, Canada
{bob,mike}@microcommander.com
Abstract. Embedded control applications have become increasingly network-centric over the last few years. Inexpensive embedded hardware and the availability of pervasive networking infrastructure and standards have created a rapidly growing market place for distributed embedded control applications. Software construction for these applications should be inexpensive as well in order to satisfy mass-market demands. In this chapter, we present results from an industrial-driven collaborative project with the purpose of researching component-based software engineering technologies for mass-market network-centric embedded control applications. This project has led to the development and refinement of several tools in support of component-based software development. We describe these tools along with their underlying concepts and our experiences in using them.
1
Net-Centric Embedded Components
Embedded systems have become ubiquitous in our daily lives. Networking them via Internet and intranet infrastructures is one of the computer industry's fastest growing markets [6]. The omnipresence of digital communication infrastructures has created inexpensive media for tele-monitoring and distributed processing in embedded devices.

Traditionally, the primary concerns while developing software for embedded systems have been maximizing run-time and memory efficiency in order to minimize hardware costs. Due to continuously decreasing hardware costs and the increasing complexity of the tasks controlled by embedded systems, other goals like maintainability, reliability, security, and safety have gained great importance. Still, current industrial development practices (processes, tools, and techniques) for software in embedded systems lag behind the state of the art in other software engineering areas. Despite all the progress made in the general software engineering arena (e.g., model-driven specification and design, component-based software development, generative programming, framework reuse, etc.), much embedded software is still being developed at a low level of abstraction, using primitive programming languages like assembler and C. This development
practice is inefficient for complex systems because it impedes software reuse and maintenance. Moreover, it is human-intensive, requires a significant amount of experience, and is prone to error. These problems are growing even more severe with the current trend towards interconnecting embedded systems in net-centric architectures. The magnitude of the potential benefits of aggregating and connecting embedded systems over the Internet drives interest in the currently unsolved problem of how to design, test, maintain, and evolve such heterogeneous, collaborative systems.

1.1 The Minimal Footprint Challenge

A component-oriented approach can be used to tackle the problem stated above. The notion of reusable software components has proven beneficial in general software engineering domains. Current integrated software development environments provide an extensible library of front-end Graphical User Interface (GUI) components and back-end database components that can be used to rapidly compose applications. In the area of embedded control systems, component-oriented software development has been shown to cut production costs and improve the maintainability of systems [20]. Current embedded component models such as the Microsoft .NET Compact Framework and Java 2 Micro Edition (J2ME/Java Beans) are powerful platforms for component-based development on high- and mid-end devices. However, these frameworks are still far too resource-hungry for applications on low-cost, mass-produced 8- and 16-bit micro controller platforms. This is partially because both of these frameworks have been developed as stripped-down versions of component models defined for traditional workstations and servers. In contrast, the component model described in this chapter has been developed from the ground up, specifically targeting low-powered, small-scale micro controllers.

1.2 The Interactivity Challenge

A weakness of current component models targeted towards embedded controllers is that they are concerned solely with the code running on the embedded device. However, micro controllers embedded in smart appliances have become increasingly interactive, and users expect to interact with their devices from remote locations such as the PC at their work place. Therefore, the concept of an embedded software component deployed on such a micro controller has manifestations beyond the actual code running on the micro controller; it might also include the code for a GUI running on a client's PC in order to monitor and adjust the execution of the embedded component. Of course, an alternative to executing component UI code on client user devices would be to access embedded controllers with general-purpose thin-client software, such as Web browsers. However, this would require each micro controller to incur the overhead of operating an embedded Web server to render the user interfaces to many, potentially concurrent, clients. This approach has additional drawbacks, limiting the user interface to relatively simple controls and interaction paradigms, and making near real-time visualization of the components unlikely. Therefore, dedicated client UIs are often used to provide PC-based control of embedded devices, which has the advantage of offloading UI features onto host PCs,
allowing the embedded controllers on the smart devices to be dedicated to what they are good at: controlling in real time. However, to create such applications, engineers have to bridge between the disparate component models used on the embedded system and on the PC, respectively. Developing these bridges is often a costly and human-intensive process. The development of a holistic component model, which encapsulates embedded aspects as well as non-embedded (visual) aspects of controller components, could overcome this limitation.

1.3 The Integration Challenge

Tele-monitoring of smart appliances over the Internet can be seen as a form of distributed computing. However, there is also an increasing trend to directly integrate the operations of different embedded devices so that they can exchange signals and information in order to act in concert with each other. It is one of the challenges of component-based development to facilitate such integration without introducing additional context dependencies for components and thus decreasing their universal reusability in other application contexts. In other words, we are looking for ways to integrate distributed components and, at the same time, to maintain their ignorance of each other. A programming style that attempts to solve this challenge by introducing a level of indirection is connection-based programming [20]; a small sketch of this style is given at the end of this section.

In this chapter, we present a component-oriented approach to engineering embedded control software for network-centric systems. This approach particularly targets low-powered platforms and facilitates interaction with PC-hosted GUIs. We present microCommander, a commercial visual development environment for embedded control software developed by Intec Automation. The microCommander component model has been developed specifically as an answer to the first two challenges mentioned above. Furthermore, we have developed a research prototype technology in answer to the third challenge (integration): microSynergy is a model-driven development method for the connection-based integration of multiple embedded controllers. Intec and UVic have been collaborating on the research and development of these technologies since 2000, supported by the Advanced Systems Institute of British Columbia and the Natural Science and Engineering Research Council of Canada.

The next two sections introduce the microCommander component-based development model and environment, which also facilitates PC-based user monitoring and control of devices. Section 4 focuses on the microSynergy approach to integrating multiple embedded devices by model-driven, connection-based programming. Section 5 presents related work, and Section 6 contains an evaluation of our results and reports on our experiences.
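The following fragment is a minimal, hypothetical illustration of connection-based programming; it is not taken from microCommander or microSynergy, and all names are invented. Components expose typed input and output ports, and a separate connector wires a producer's output to a consumer's input, so neither component refers to the other directly.

```c
#include <stdio.h>

/* A component exposes an input port as a callback; it knows nothing about
   who produces the values it receives. */
typedef void (*input_port_t)(void *self, int value);

typedef struct {              /* hypothetical temperature sensor component */
    input_port_t out;         /* output port: bound later by a connector   */
    void        *out_target;
} sensor_t;

typedef struct { int setpoint; } heater_ctrl_t;   /* hypothetical consumer */

static void heater_ctrl_input(void *self, int temperature) {
    heater_ctrl_t *c = self;
    printf("heater %s\n", temperature < c->setpoint ? "on" : "off");
}

/* The connector is the only place where producer and consumer meet. */
static void connect(sensor_t *s, input_port_t port, void *target) {
    s->out = port;
    s->out_target = target;
}

static void sensor_publish(sensor_t *s, int temperature) {
    if (s->out) s->out(s->out_target, temperature);  /* forward via the connection */
}

int main(void) {
    sensor_t sensor = {0};
    heater_ctrl_t ctrl = { .setpoint = 21 };
    connect(&sensor, heater_ctrl_input, &ctrl);  /* wiring done by a third party */
    sensor_publish(&sensor, 18);                 /* prints "heater on"  */
    sensor_publish(&sensor, 23);                 /* prints "heater off" */
    return 0;
}
```

Because the sensor and the heater controller only meet in the connector, either one can be reused in another application context without modification, which is exactly the property the integration challenge asks for.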
2
Component-Based Development with microCommander
The microCommander embedded component technology consists of a component model, a library of embedded components, and an integrated visual application development environment. The component library consists of basic control components such as timers, logic gates, controllers, etc. These components are assembled, using the visual
development tools, into control applications. The combination of the software code that performs specific control functions on the micro controller with the code used to view and manipulate these functions from a remote host PC provides a single and coherent component model. We have chosen to present the perspective of using microCommander for application development first, before discussing the component model and infrastructure underneath microCommander.

2.1 Visual Component Assembly – The User's Perspective

microCommander was designed with interactivity and ease of use in mind. As a consequence, it deviates from the traditional "develop, build, download, test, debug" cycle by allowing developers to instantly assemble and visually monitor software components on the embedded hardware in real time. In microCommander, the line between development and deployment of an application is blurred: the same tool is used both to develop an application and to interact with it. microCommander applications are developed using mVisual, software that runs on a host PC and interacts with a target micro controller. An application is a collection of microCommander components (mComponents) that may interact with one another. Figure 1 shows the mVisual view of a simple greenhouse application. Besides the mComponents selected for the application, the figure shows a list of system components that belong to the component framework of microCommander, e.g., System Information, Job Scheduler, Tick List, and Second List. These components can be used to customize the way the microCommander framework executes the mComponents used in an application.
Fig. 1. mVisual component view
In order to develop an application, mComponents are instantiated by selecting them from a toolbar in mVisual. mComponents are then configured by customizing their properties through property sheets. For example, the left-hand side of Fig. 2 shows the property sheet for an on-off controller mComponent, which functions like a wall thermostat. It drives a digital component (e.g., a heater) to regulate a value (e.g., temperature) in a physical system (e.g., a greenhouse) towards a target value. Connections between different input and output mComponent instances are also made with the help of an mComponent's property sheet. The example below shows that the on-off controller mComponent is connected to a "Recirc Air Temp" converter component on its input (which converts the sensor reading from mV to °C), and to a "UHeater Ctrl" digital variable component on its output.
Fig. 2. On/Off controller property sheet (left-hand side) and corresponding visual control (righthand side)
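To give an idea of the behavior being configured here, the following C++ sketch implements the core of such an on-off controller as a plain class. The names, the hysteresis band and the hard-coded values are illustrative assumptions; in microCommander this logic is supplied by the pre-built mComponent and only parameterized through the property sheet:

#include <iostream>

// Hypothetical sketch of an on-off (bang-bang) controller in the spirit of the
// "UHeater Ctrl" example: it reads a converted temperature value and drives a
// digital output so that the measured value approaches a target value.
class OnOffController {
public:
    OnOffController(double target, double hysteresis)
        : target_(target), hysteresis_(hysteresis), output_(false) {}

    // Called by the framework's timed update service with the current
    // input value (e.g., recirculated air temperature in degrees C).
    void update(double input) {
        if (input < target_ - hysteresis_) output_ = true;        // heat
        else if (input > target_ + hysteresis_) output_ = false;  // idle
        // within the hysteresis band: keep the previous output
    }

    bool output() const { return output_; }  // fed to a digital variable component

private:
    double target_;
    double hysteresis_;
    bool output_;
};

int main() {
    OnOffController heater(21.0, 0.5);   // 21 degrees C target, illustrative values
    for (double t : {18.0, 20.4, 21.6, 22.0, 20.6}) {
        heater.update(t);
        std::cout << "temp=" << t << " heater=" << (heater.output() ? "on" : "off") << '\n';
    }
}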
Once an mComponent instance has been created, its data and possibly other properties may be viewed with an interactive dialog, called a visual control (right-hand side of Fig. 2). mVisual also serves as a micro controller operating interface that displays components in the form of visual controls, which might include switches, status lights, knobs, and sliders on one or more control panels. Adding descriptive wallpapers to the background of control panels can serve to further enhance the user guidance. mVisual also provides a comprehensive set of consistency checks such as dependency trees and hardware binding tables in order to prevent erroneous component compositions.
3
The microCommander Component Model
The microCommander component model is distributed in nature. A portion of an mComponent is executed autonomously on the embedded device and a portion is executed on a client PC in order to provide interactivity with the embedded application. For
instance, the greenhouse unit heater controller switches the unit heaters on/off to maintain the greenhouse temperature. When the user connects with mVisual, the mComponents are rendered on-demand through property sheets and visual controls, making the mComponents accessible to the user for configuration and interaction. This is done by a rendering engine in mVisual, as shown in Figure 3.
Fig. 3. User interaction with mComponent
For the microCommander component model, we have adopted the general definition of the concept of a software component by Szyperski: "a software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties" [20]. More specifically, mComponents have the following characteristics:
1. Uniqueness: each component has a unique identifier.
2. Embedded functionality: each component includes binary code to be executed on an embedded controller in order to perform a particular function.
3. User-interactivity: each component includes binary, data and meta-data to be processed on a remote computer for the purpose of monitoring and controlling the component's embedded function(s).
4. Parameterization-dialog: each component includes binary code to be executed on a remote computer for the purpose of customizing a component's embedded function.
5. Self-disclosure: components can disclose their status, type, and ID.
6. Persistence: component instances can serialize to a binary stream destined for permanent storage, e.g., on a flash memory provided by the embedded platform.
7. Embedded hardware-dependency: a specification of the hardware context necessary for executing the component.
8. Contractual inter-component interface: event and data flow-oriented interface specification for component composition.
mComponent instances are classified into types according to the embedded function they perform. mComponent instances of the same type share rules, properties and behavior. In addition, each mComponent instance hosts unique parameters and run-time information, making a particular instance different from other instances of the same type.
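As an illustration of how these characteristics could surface in code, the following C++ interface sketch covers the embedded side of an mComponent (identity, default action, self-disclosure, persistence and triggerable actions). It is a hypothetical rendering; the real mComponent API is proprietary and may look quite different:

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical C++ rendering of the mComponent characteristics listed above.
// All names and signatures are invented purely for illustration.
struct ComponentConfigurationBlock {
    std::string name;    // e.g., "UHeater1"
    std::uint8_t flags;  // Enabled | Visible | Inverted ...
    std::uint32_t data;  // current value, e.g., On (1)
    std::string type;    // e.g., "Digital Out"
};

class MComponent {
public:
    virtual ~MComponent() = default;

    // 1. Uniqueness
    virtual std::uint32_t id() const = 0;

    // 2. Embedded functionality: the default action, driven by the update service
    virtual void update() = 0;

    // 5. Self-disclosure: status, type and ID
    virtual ComponentConfigurationBlock disclose() const = 0;

    // 6. Persistence: serialize instance state for storage in flash memory
    virtual std::vector<std::uint8_t> serialize() const = 0;

    // 8. Contractual interface: trigger a type-specific action by name
    virtual void trigger(const std::string& action) = 0;

    // 3./4. The visual control and property-sheet code live on the PC side and
    //       are looked up via the component's extended descriptor, so they do
    //       not appear in the embedded interface.
};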
When mVisual connects to a target micro controller, the target introspects its current inventory of mComponent instances and sends the list back to mVisual. Then mVisual asks each instance to disclose its Component Configuration Block (CCB), a record of component information. In the case of component "UHeater1", it might reply:
My Name:  "UHeater1"
My State: Enabled | Visible | Inverted
My Data:  On (1)
My Type:  "Digital Out"
Once mVisual knows the mComponent's type, it retrieves its extended descriptor, which contains component type related information such as:
Embedded function:        Turn on/off, toggle
Valid v-controls:         LEDs, toggle switches, momentary switches, etc.
Valid v-control messages: Set data, get data, set flags
Parameterisability:       Statements for building a property sheet
Valid sources:            Digi Var, Digi Op, Digi Out, Digi In, In Gate...
Events detected:          "on to off", "off to on"
Hardware dependency:      exists digital output pin
This information allows mVisual to represent the mComponent with suitable visual controls, to render the component's property sheet, and to verify related information when an mComponent is instantiated or modified. Extended descriptors are not stored on the embedded device; they are part of the microCommander environment installed on an operator's computer. They might also be served from a remote component descriptor server. As this information is not needed in real time, it can reside elsewhere, reducing the resources necessary on the micro controller.
3.1 Component Composition
mComponents are composed in a data flow-style architecture. There are two main data types, digital (0 or 1) and analog (unsigned integer). Interfaces of mComponents comprise data sources and outputs. mVisual ensures type-safe composition, i.e., that data sources can only be connected to compatible data outputs. This flow of data between components is controlled by a common timed update service. The update frequency can be chosen differently for each mComponent instance. Obviously, some mComponents in micro controller applications will have to interface directly with hardware ports, such as digital and analog I/O ports. For this purpose, microCommander uses special components called hardware agents that act as proxies for hardware features such as analog and digital pins. The framework loads the hardware profile of the target micro controller platform to make all I/O ports accessible to the appropriate source and output agents. Updating an mComponent instance causes the instance's default action to be performed. For instance, a timer update will cause it to increment its count. However, components can also be triggered to perform additional actions, drawn from a set of actions unique to each component type. For instance, a timer can be triggered to reset, reset and go, stop, and resume. Actions can be invoked in two ways: they can be scheduled or triggered periodically by system components that are part of the framework, or they can be triggered by events. Both alternatives are elaborated in more detail in the next section.
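The following sketch illustrates this data-flow style in code: a hardware agent exposes an analog source, a timer performs its default action on every update, and a stand-in for the timed update service drives both. The class names and the update mechanism are our own simplifications, not the actual microCommander implementation:

#include <functional>
#include <iostream>
#include <vector>

// Illustrative sketch (not the actual microCommander code) of the data-flow
// composition described above: components expose typed outputs, other
// components read them as sources, and a common update service drives the flow.
using AnalogValue = unsigned int;   // analog data type

struct AnalogSource { std::function<AnalogValue()> read; };

// A hardware agent standing in for an analog input pin.
struct AnalogInAgent {
    AnalogValue raw = 512;                        // pretend ADC reading
    AnalogSource asSource() { return { [this] { return raw; } }; }
};

// A timer component: its default action on update is to increment its count.
struct Timer {
    AnalogValue count = 0;
    void update() { ++count; }
    void trigger_reset() { count = 0; }           // one of its type-specific actions
};

// A minimal stand-in for the timed update service.
struct UpdateService {
    std::vector<std::function<void()>> jobs;
    void tick() { for (auto& job : jobs) job(); }
};

int main() {
    AnalogInAgent sensor;
    Timer timer;
    AnalogSource temp = sensor.asSource();

    UpdateService ticks;
    ticks.jobs.push_back([&] { timer.update(); });
    ticks.jobs.push_back([&] { std::cout << "temp raw=" << temp.read()
                                         << " ticks=" << timer.count << '\n'; });
    for (int i = 0; i < 3; ++i) ticks.tick();
    timer.trigger_reset();                        // action invoked by an event, say
}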
The component framework consists of a runtime engine for mComponents, a set of system components, and a set of rules to which each mComponent must comply. The core of the runtime engine comprises a number of system components supported by a number of services. On the micro controller, the runtime engine includes a boot loader (which checks for a default application saved to flash memory and instantiates the system components), a simple task scheduler, and a message dispatcher. On the PC, the runtime engine includes a number of services for managing components (saving, loading, tracking dependencies, enforcing rules, etc.), rendering engines for displaying component property sheets, and visual controls for user interaction with the components. The heart of the runtime engine is the Supervisor component, which is responsible for managing all components in an application, both on the PC and the micro controller. It is supported by a number of other system components:
– System Information: tracks and manages system resources such as memory, and maintains the target date and time.
– Tick, Second, and Day Lists: execute user-defined periodic jobs (a "tick" is the smallest time unit in a microCommander application; the actual duration of a tick in real time depends on the choice of the target micro controller hardware and how much work must be done during the worst-case tick).
– Job Scheduler: provides date-time scheduling for user-defined jobs.
– Event: watches a data source and triggers an action in another component when a specified change is detected.
– Security: provides a list of users, passwords and access levels, and provides an interface for user authentication.
Framework rules are enforced through component inheritance and meta-data. Adherence to these rules is what enables the runtime engine and system components to perform their required tasks, such as component introspection, composition and persistence. Event components can be seen as a component-based implementation of the Event-Condition-Action (ECA) paradigm. They can be deployed to watch specific data sources for change events which, on occurrence, cause a user-specified condition to be evaluated. If this condition is fulfilled, the event component can trigger other components to perform selected actions.
3.2 Resolving Hardware Heterogeneity
microCommander currently supports three unique CPU architectures and thirteen unique micro controller boards. To provide this multi-platform support, microCommander has been designed to abstract from the specific hardware interfaces provided by different chip makers and board manufacturers. The abstraction is achieved by using a layered model. There are two levels of hardware abstraction. The first is at the component level: microCommander provides a set of special components called agents (Fig. 4). All agent components have two properties, personality and pin, which enable them to interact with the hardware. A personality is a generic interface to common micro controller functionality such as digital output, analog input, PWM, etc. Micro controllers typically provide these hardware personalities in groups of pins, called ports,
Fig. 4. microCommander framework abstraction layers
and hence the pin property is used to specify a single pin within the port. The enumeration of personalities for a micro controller can be greater than the number of ports or pins, since ports and pins can often be configured to take on different behaviors. For example, a micro controller often provides the ability to configure a digital pin to be either an input or an output. This level of abstraction provides portability of user applications. The second level of abstraction is the personality table. It is an enumeration of all the port/personality combinations provided by the platform's hardware. Each platform's hardware layer is responsible for providing the functionality for each type of personality available on the micro controller. The personality table provides the link between the hardware that implements the functionality for each personality and the agent components that use the personality. This allows the agents to be unaware of the underlying hardware. The same personality may be offered on several ports, as illustrated in Fig. 5, while some personalities may be lacking on a given hardware platform (e.g., no analog input). The personality table is one of the interfaces that contribute to the portability of the microCommander runtime engine.
Fig. 5. Example personality table
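To make the two abstraction levels more tangible, the sketch below models a personality table that maps (personality, port) pairs to board-specific driver hooks, and an agent component that resolves its personality and pin through that table. All names and the driver-hook signature are hypothetical; the real framework's interfaces are not documented here:

#include <cstdint>
#include <map>
#include <utility>

// Hypothetical sketch of the layering described above. A "personality" is a
// generic capability; the personality table maps (personality, port) pairs to
// the board-specific driver that implements it.
enum class Personality { DigitalIn, DigitalOut, AnalogIn, Pwm };

struct PinDriver {
    void (*write)(std::uint8_t pin, std::uint32_t value);
    std::uint32_t (*read)(std::uint8_t pin);
};

using PersonalityTable =
    std::map<std::pair<Personality, std::uint8_t /*port*/>, PinDriver>;

// An agent component only knows its personality and pin; the table resolves
// which hardware implementation actually services the request.
class DigitalOutAgent {
public:
    DigitalOutAgent(const PersonalityTable& table, std::uint8_t port, std::uint8_t pin)
        : driver_(lookup(table, port)), pin_(pin) {}

    void set(bool on) { if (driver_) driver_->write(pin_, on ? 1u : 0u); }

private:
    static const PinDriver* lookup(const PersonalityTable& t, std::uint8_t port) {
        auto it = t.find({Personality::DigitalOut, port});
        return it == t.end() ? nullptr : &it->second;   // personality may be absent
    }
    const PinDriver* driver_;
    std::uint8_t pin_;
};

namespace {
void fakeWrite(std::uint8_t /*pin*/, std::uint32_t /*value*/) { /* would touch a real port */ }
std::uint32_t fakeRead(std::uint8_t /*pin*/) { return 0; }
}

int main() {
    PersonalityTable table{{{Personality::DigitalOut, 0}, PinDriver{fakeWrite, fakeRead}}};
    DigitalOutAgent heaterPin(table, /*port=*/0, /*pin=*/3);
    heaterPin.set(true);
}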
4
Coordinating Software Components on Distributed Controllers
Today, even inexpensive micro controller platforms are equipped with network interfaces such as Ethernet or CAN-bus. As outlined in the previous section, microCommander makes use of such interfaces for the purpose of PC-based remote interaction
with embedded micro controllers. Such interfaces can also be used for networking micro controllers among themselves in order to build more complex, distributed embedded networks. The purpose of this section is to present our approach to engineering these types of networks based on the paradigms of distributed components and connection-based programming [20]. Despite the fact that many current embedded software libraries provide implementations of standard data transport protocol stacks, the development of complex, distributed coordination logic among micro controllers still remains a complex task that lacks the methodological and technological support available today for general software development. High-level specification languages such as SDL [5], Statecharts [8], Petri nets [17], UML Collaboration Diagrams [7], and UML Sequence Diagrams have proven to be useful abstractions for analyzing requirements and designing protocols for distributed systems. However, there is still a significant chasm between such high-level formalisms describing the coordination logic of a controller network and the actual implementation of the different controllers in terms of low-level programming languages. Currently, programmers have to cross this chasm manually by repeatedly translating high-level (often diagrammatic) specifications into low-level program code. Driven by time-to-market pressure, programmers often take a shortcut and immediately start coding rather than designing high-level specifications. This practice has proven error-prone for complex systems. Moreover, it decreases the maintainability of a system. Notably, a similar gap in the development process existed for general software engineering not too long ago; however, it has recently been narrowed with the introduction of higher-level component-based language platforms (e.g., .NET and EJB) and the development of model-driven code generation mechanisms [4]. We have chosen an analogous approach for narrowing the gap between design and implementation of complex coordination logic among distributed embedded controllers. Based on the microCommander component platform, we describe the microSynergy method and technology for model-driven specification and code generation of coordination logic among distributed embedded controllers. The goal of microSynergy has been to realize the vision of the Object Management Group's (OMG) Model Driven Architecture (MDA) paradigm for the development of distributed embedded control software [13]. One key objective in MDA is to separate application logic from details specific to the platform technology. Application logic is analyzed and designed in a Platform Independent Model (PIM), which is later translated automatically or semi-automatically into a Platform Specific Model (PSM) and source code. In addition to increased productivity, this approach provides the benefit of application logic that is easier to change and to migrate to other platforms.
4.1 The microSynergy Composition Model
microSynergy's PIM consists of two diagram types, System Diagrams and Connector Diagrams, describing the static and dynamic aspects of the controller interaction, respectively. These two diagram types are based on UML Component Diagrams and State Diagrams. Analogously to UML Component Diagrams, microSynergy System Diagrams depict components, their interfaces and the connectors between them. However, System Diagrams are more restrictive than general UML Component Diagrams, in order
to specifically target them to the domain of embedded controllers and make diagrammatic specifications executable.
System Diagrams. The granularity of the components in a System Diagram corresponds to the deployment architecture of embedded controllers in the network. In other words, microSynergy System Diagrams treat embedded controllers as opaque entities and do not show their internal architecture, which can be developed, for example, with microCommander. Component interaction in microSynergy uses an event-based paradigm, meaning that there are no return parameters in interface signatures. These signatures each consist of a number of event gates, each one defining a signal that can be exchanged with the rest of the network. Gates can be defined as in-going or out-going and are graphically depicted as bubbles of different color at the border of component symbols. From a component's viewpoint, in-going gates are used to receive events from the network, whereas out-going gates are used to emit events to the network. Events have names and can carry payload data based on a number of predefined types, such as boolean, integer, and blob (binary large object). Connections between controller components can be as simple as direct event-forwarding channels. However, microSynergy supports the specification of more sophisticated connections, which are required in order to coordinate third-party components with complex interactions. In order to show the advantages of complex connections, let us consider the simple System Diagram in Fig. 6. In this scenario, an "environmental controller" can directly send Heat(int) events to a "burner controller" if the temperature is too low, and it can send Cool(int) events to a "window controller" if the temperature is too high. Each of these events can carry an integer payload specifying the intensity of the desired heating or cooling action (0-100 percent).
Fig. 6. Simple system diagram
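The following sketch renders the ingredients of such a System Diagram (components with named in-gates and out-gates, events with an integer payload, and a simple event-forwarding connection) as plain C++ data structures. The types and names are invented for illustration and are not part of microSynergy:

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Illustrative data model for a System Diagram: controller components expose
// named in-gates and out-gates, and a simple binary conduit forwards events
// unchanged from an out-gate to an in-gate.
struct Event {
    std::string name;   // e.g., "Heat"
    int payload;        // e.g., desired intensity 0-100
};

struct InGate {
    std::string name;
    std::function<void(const Event&)> deliver;   // hands the event to the controller
};

struct OutGate {
    std::string name;
    std::vector<InGate*> conduits;               // simple event-forwarding channels
    void emit(const Event& e) { for (auto* in : conduits) in->deliver(e); }
};

int main() {
    // Burner controller with a Heat(int) in-gate.
    InGate heat{"Heat", [](const Event& e) {
        std::cout << "burner: heat at " << e.payload << "%\n";
    }};

    // Environmental controller with a Heat(int) out-gate wired to the burner.
    OutGate envHeat{"Heat", {&heat}};

    // Temperature too low: emit a Heat event over the conduit.
    envHeat.emit({"Heat", 60});
}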
A limitation of system-level designs using only such primitive connections is that interfaces have to be designed to work with each other. All complex application logic that specifies the coordination of the various distributed controllers has to be encapsulated in the controller components themselves. The main problem associated with this approach is the limited re-usability of controller components. However, it is often not practical to require that controllers be custom-programmed for a particular network configuration. In practice, system engineers reuse not only the hardware but also the software of pre-configured controller components. For example, the environment controller in the above example could be used in many other scenarios outside the domain of green house operations. In other scenarios, it might not be integrated with a win-
dow controller but with an air conditioner or no cooling device at all. Encapsulating the coordination logic for a network of embedded software components inside these components would decrease the re-usability of these components in other contexts. In other words, embedded components should be as ignorant about each other’s existence as possible in order to minimize their architectural dependencies and maximize their re-usability and maintainability. This statement is also valid in the general domain of software engineering, where its consideration has resulted in a new programming paradigm called connection-based programming [20]. The essence of connection-based programming is to promote connections among components to become first-order programming concepts, rather than being simple call-dependencies between component interfaces. While the ideas behind connection-based programming are not new, few language platforms exist that fully realize them. While current programming languages allow programmers to implement “connections” in form of traditional program components, these platforms lack dedicated language concepts. Introducing such dedicated language concepts for connections facilitates separation of concerns (component-internal application logic vs. network coordination logic) and thus promotes the reuse and maintenance of embedded networks. Therefore, we have provided the microSynergy modeling language with an explicit notion for complex connections. Complex Connections. Before we describe how complex connections are realized, let us outline at an abstract level the requirements behind this concept. The above discussion shows that connections should be able to mediate among different “third-party” component interfaces that might have been developed independently from each other. Consequently, connections should be able to change the types of events routed through them, as well as the event payload. Furthermore, there may be more than two component interfaces participating in one logical connection. Based on these considerations we can classify connections, as shown in Fig. 7. We denote the number of in-going gates and the number of out-going gates associated with a connection as #i and #o, respectively. In the terminology of this classification, which is based on [12], the “simple” connections discussed at the beginning of this section and illustrated in Fig. 6 are denoted as binary conduits: they connect a single out-gate to a single in-gate with preservation of event type and payload. Figure 8 shows a simple System Diagram with a binary transducer that connects incompatible gates. In this example, the Environment Control component has an out-gate TempOffset(int degrees), which emits events when the difference between the desired temperature and the actual temperature changes. Obviously, the interface of this component is no longer compatible with the interface of Burner Control. We need a connection that changes the event type as well as translates temperature offsets to percentages of desired burner intensity. If we are considering adding the window controller from Fig. 6 to our network, and, for reasons of energy efficiency, want to ensure that having the burner activated or the windows opened are two mutually exclusive states, we need a ternary connection. Figure 9 illustrates this example, which uses an analyzer transducer. Obviously, any connection more complex than a simple binary conduit requires further definition of coordination logic. 
microSynergy uses another diagram type called a Connector Diagram for defining the interaction semantics of each connector.
Fig. 7. Classification of connection types
Fig. 8. System Diagram using a binary transducer
Fig. 9. System Diagram with ternary connection
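As a concrete illustration of what a binary transducer such as the one in Fig. 8 has to do, the following sketch converts TempOffset(int degrees) events into Heat(int percent) events. The conversion formula and all identifiers are assumptions made for the example; the chapter does not specify the actual mapping:

#include <algorithm>
#include <iostream>
#include <string>

// Sketch of a binary transducer in the spirit of Fig. 8: it renames the event
// and rescales its payload so that two independently developed interfaces can
// be wired together. The scaling factor below is an assumption.
struct Event {
    std::string name;
    int payload;
};

// Maps TempOffset(int degrees) emitted by the environmental controller to
// Heat(int percent) expected by the burner controller.
Event tempOffsetToHeat(const Event& in) {
    // Assumed conversion: an offset of 10 degrees or more means full intensity.
    int percent = std::clamp(in.payload * 10, 0, 100);
    return Event{"Heat", percent};
}

int main() {
    Event offset{"TempOffset", 4};            // desired minus actual temperature
    Event heat = tempOffsetToHeat(offset);
    std::cout << heat.name << "(" << heat.payload << ")\n";   // prints Heat(40)
}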
Connector Diagrams. Connector Diagrams are based on UML State Diagrams and UML Activity Diagrams. As such, Connector Diagrams describe extended finite state machines. The new UML 2.0 standard supports different representations of state machines, defined by different language profiles. The traditional representation of UML State Machines is oriented towards Harel's original StateChart notation [8]. Another profile in UML 2.0 defines a representation of state machines that adopts the syntax introduced with ITU's Specification and Description Language (SDL) [21]. microSynergy uses the SDL-based syntax, since SDL is well accepted in the domain of embedded protocol design. Moreover, SDL renders state machines as acyclic graphs, which facilitates computer-based layout operations. Another benefit of using SDL syntax is that we can base the semantics of microSynergy specifications on the more precisely defined formal semantics of SDL. Figure 10 shows an example definition for the connection in Fig. 9. This definition is particularly simple, because it uses only a single state, "ready". The processing of in-going events is specified as a rectangle with an inward-pointing triangle, whereas out-going events are specified as rectangles with an outward-pointing triangle. Decision points are shown as diamonds. The connector diagram shows how the TempOffset event is cast to the event types Heat and Open. Moreover, it shows how the payload of the original event is mutated. The connector specification in Fig. 10 explicitly excludes the possibility that both systems are engaged at the same time, given that the system was started in a physical state with the windows closed and the burner turned off. Of course, it would be more desirable to be able to activate the controller network in any physical state, e.g., with open windows. The simplest way of achieving this would be to set the windows to an initial position at start-up, e.g., to close them automatically. However, this approach assumes that devices and communication links are 100% reliable. A better approach would require sensors to inquire about the status of both the windows and the burners. Figure 11 shows an example connection that uses this approach. In this case,
Fig. 10. Definition of simple ternary connection
Fig. 11. Definition of simple ternary connection
the window controllers must have a status interface that lets external components query the percentage to which a window is open. The connector remains in its initial state start until it receives an activate signal. The connector uses status signals to inquire about the physical status of the windows and assumes the state closed or venting as the outcome of this inquiry.
Controller identity and anonymity. The default semantics of microSynergy connection diagrams is to broadcast events to, and receive events from, all connected controllers with a gate of the specified name. This functionality might be desired in some cases. In others, the system engineer might want to specify the exact identity of the sources or destinations of signals. For example, in Fig. 11, we are only interested in inquiring about the status of the window controllers. This is specified by giving the window controller an explicit name ("w") and using this name to qualify event interaction in connector diagrams, e.g., w.status.
Controller multiplicity. Now let us assume that, for reasons of scalability, we want to permit many window controllers in our example controller network. Unless a controller component is annotated with the singleton stereotype, microSynergy's default
semantics is to broadcast signals to all controllers in the local network which have a type specified in the system diagram. In other words, a controller component that is not marked as a singleton actually stands for "one or many" controller components of the same type. This allows type-compatible hardware (e.g., additional window controllers) to be added without changing the coordination logic specified in microSynergy. While these semantics clearly benefit system maintainability, the fact that the number of connected controllers is undetermined at specification time causes a problem when we want to specify a condition that needs to be fulfilled for all controllers of a given type. We solve this problem by introducing timed default transitions at decision points. Default transitions are similar to else clauses in conditional statements. However, they are executed only after the time period chosen as their parameter has elapsed. In other words, this time period specifies a temporal window in which an event might still be received that triggers a different transition originating from the decision point. Our example in Fig. 11 uses this concept to specify that the window closed state can only be reached if no window controller responds with a status event containing a payload p > 0 within 1000 milliseconds of the status inquiry being broadcast.
4.2 Platform-Specific Execution of Coordination Logic
The previous section described the composition model of microSynergy. This section describes implementation concerns involved in creating a system that executes the composition logic.
The microSynergy execution architecture. Figure 12 gives an overview of the execution architecture of microSynergy. Its three main elements are the microSynergy Editor, the microSynergy run-time, and the embedded targets. The microSynergy Editor is used by the engineer to develop and maintain the coordination logic at the model level, as described in the previous section. This logic is automatically translated into a highly compact format called CEL (Connector Execution Language) and downloaded to mServer at deployment time. mServer is the micro controller that handles the message routing
Fig. 12. microSynergy execution architecture
between targets. microSynergy run-time is a software component on mServer that handles the message routing among targets according to the CEL logic downloaded from the Editor. A target is a micro controller that participates in a microSynergy network, denoted as a LCN (Local Controller Network) in Fig. 12. Each target has a small piece of software on it that allows the target to expose its in-gate and out-gate interface, interpret the messages sent to it by run-time, and format messages to send to run-time. microSynergy run-time is designed using a layered architecture, as seen in Figure 13. The execution layer can be in one of two modes. The first is administration, where its purpose is to respond to messages sent to it by the microSynergy Editor. For example, the Editor might query what targets are currently registered in the network. The second mode is execution, during which it is interpreting the CEL instructions and, at a high level, controlling what messages are sent to targets. Because these instructions represent an encoding of an extended finite state machine, the job of the execution layer is to decide, based on its current state, how to respond to target signals. The execution layer sends messages to and from targets through the abstract transport layer.
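Because the CEL instructions encode an extended finite state machine, the execution layer essentially runs an interpreter loop over states and incoming target signals. The following C++ sketch shows, for the ternary greenhouse connection of Figs. 9 and 10, what such interpreted coordination logic amounts to; the class, the payload arithmetic and the single "ready" state follow our reading of the figures and are not taken from the actual CEL implementation:

#include <algorithm>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>

// Very small sketch of the kind of coordination logic the execution layer
// interprets: a connector with a single state ("ready") reacts to TempOffset
// events and drives the burner and window controllers so that heating and
// venting are never active at the same time.
struct Event { std::string name; int payload; };

class TernaryConnector {
public:
    using Send = std::function<void(const std::string& target, const Event&)>;
    explicit TernaryConnector(Send send) : send_(std::move(send)) {}

    void onEvent(const Event& e) {
        if (state_ != State::Ready || e.name != "TempOffset") return;
        int intensity = std::min(std::abs(e.payload) * 10, 100);  // assumed scaling
        if (e.payload > 0) {                     // too cold: heat, keep windows shut
            send_("burner", {"Heat", intensity});
            send_("window", {"Open", 0});
        } else if (e.payload < 0) {              // too warm: vent, burner off
            send_("burner", {"Heat", 0});
            send_("window", {"Open", intensity});
        }
    }

private:
    enum class State { Ready };
    State state_ = State::Ready;
    Send send_;
};

int main() {
    TernaryConnector c([](const std::string& target, const Event& e) {
        std::cout << target << " <- " << e.name << "(" << e.payload << ")\n";
    });
    c.onEvent({"TempOffset", 3});    // heat at 30%, windows closed
    c.onEvent({"TempOffset", -2});   // burner off, windows open 20%
}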
Fig. 13. microSynergy runtime design
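The layering of Fig. 13 separates protocol-neutral routing from protocol-specific delivery, as elaborated in the next paragraph. The sketch below illustrates one way such a split could look in code; the class names and the two example protocols are our own choices, not microSynergy's actual interfaces:

#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <vector>

// Hypothetical sketch of the transport layering: the abstract transport layer
// keeps a mapping from target ids to concrete transport components, and each
// concrete component realizes the generic "send" request for one protocol.
using TargetId = std::uint16_t;
using Message = std::vector<std::uint8_t>;

class ConcreteTransport {
public:
    virtual ~ConcreteTransport() = default;
    virtual void send(TargetId target, const Message& msg) = 0;
};

class TcpIpTransport : public ConcreteTransport {
public:
    void send(TargetId target, const Message& msg) override {
        std::cout << "TCP/IP: " << msg.size() << " bytes to target " << target << '\n';
    }
};

class BluetoothTransport : public ConcreteTransport {
public:
    void send(TargetId target, const Message& msg) override {
        std::cout << "Bluetooth: " << msg.size() << " bytes to target " << target << '\n';
    }
};

class AbstractTransport {
public:
    void bind(TargetId target, std::shared_ptr<ConcreteTransport> t) {
        routes_[target] = std::move(t);
    }
    void send(TargetId target, const Message& msg) {
        auto it = routes_.find(target);
        if (it != routes_.end()) it->second->send(target, msg);  // protocol-specific realization
    }
private:
    std::map<TargetId, std::shared_ptr<ConcreteTransport>> routes_;
};

int main() {
    AbstractTransport transport;
    transport.bind(1, std::make_shared<TcpIpTransport>());
    transport.bind(2, std::make_shared<BluetoothTransport>());
    transport.send(1, {0x01, 0x02});
    transport.send(2, {0x03});
}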
The abstract and concrete transport layers are designed to handle communication with multiple micro controllers, taking into account their extremely heterogeneous nature. microSynergy must be able to support a multitude of communication protocols. We may have a network of small, low-cost devices using RS-232 communicating with a real-time system that uses CAN-bus, a home PC using TCP/IP, and a wireless device that uses Bluetooth. The solution is to put the protocol-neutral aspects of communication into the abstract transport layer, while the protocol-specific aspects are dealt with in the concrete transport layer. Within the concrete transport layer, each protocol the microSynergy run-time must support has a corresponding concrete transport component. In more detail, the abstract transport layer stores a mapping between a specific target id and an associated concrete transport component. When the execution layer instructs the abstract transport layer to send a message to a specific target id, the mapping is used to decide which concrete transport component to send the message to. The message is then transformed into a standard format and passed to the concrete transport layer. There, an attempt is made to translate the generic requests of the abstract transport layer, such as sending a message to a specific target, into a protocol-specific realization of that request, such as sending a message through TCP/IP to a specific target. Occasionally this results in slightly different semantic interpretations of the generic request, depending on the concrete transport component. For example, when the execution layer, in an administrative state, requests a list of the currently connected targets, the Bluetooth concrete transport component treats this as a request to discover any currently available Bluetooth targets, whereas the TCP/IP transport component treats it as a request for the list of TCP/IP targets that have previously connected to mServer. One transport component interprets the message as a request for targets that are participating in the network, the other as a request for targets that could participate in the network.
Implementation concerns. When working in the domain of embedded systems, it is clearly important to be concerned about resource usage, such as disk space and RAM. Although it was necessary that the microSynergy run-time be able to support multiple communication protocols, it was equally necessary that this be realized as efficiently as possible. The design solution was to make each deployed microSynergy run-time customizable: only the protocols that a particular network actually uses need to be available in its infrastructure footprint. The engineer can specify target protocols for each inter-controller connection during the design phase. She does this by adding stereotypes to the communication channels in a System Diagram. Annotations indicate which protocol(s) a particular target supports. In the context of model-driven design, such model refinements can be seen as PIM-to-PSM transformations. Before deployment, the annotated system diagram is analyzed to determine the configuration of protocols that need to be supported at runtime.
4.3 Using the microSynergy Method and Technology in Practice
The microSynergy Editor tool (see Fig. 14) was designed to support the developer in visualizing the embedded devices connected to the network, defining connector logic, as well as importing and exporting pre-built connector logic. As the Editor was designed for both expert and novice users, it supports several different development scenarios. We will now discuss three typical scenarios to give the reader an idea of the
Fig. 14. microSynergy editor
way microSynergy and microCommander can be used in concert for the development of component-based, distributed embedded controller applications. Green field network development. When a network of embedded devices is designed and implemented from scratch, engineers have the freedom to determine all the hardware and software components being used. Additionally, engineers will have some foreknowledge as to which devices will be communicating with one another. Most importantly, engineers developing embedded device networks typically have the expertise required to customize the embedded devices to meet their needs precisely. However, developing a network from scratch can be time consuming. Although engineers have the ability to design the embedded devices, they often do not have control over time and money constraints. The microSynergy Editor was designed to assist expert developers by minimizing the time and effort required to implement, maintain and evolve the logic controlling embedded device communication. Engineers can use the microSynergy Editor to create system and connector interaction diagrams. They can add target devices including their respective in-gates and out-gates to the system diagram as required, then define connector logic state machines in terms of states, inputs, outputs and conditionals. The result is a system specification document which can be downloaded and deployed to the network. Using microCommander, engineers then assemble each controller’s embedded software to meet their requirements. They then define an interface consisting of in-gates and out-gates for each controller as specified by the system diagram documentation developed using microSynergy. From a microCommander developer’s perspective, in-gates and out-gates are merely two additional mComponents that can be deployed and integrated with other software components. Finally, each controller, now equipped with the embedded software developed using microCommander, is connected to the network and the microSynergy coordination logic is downloaded to the dedicated controller hosting the microSynergy runtime engine. The engineers’ workload is significantly lightened by allowing them to download and deploy system specification diagrams developed using the microSynergy Editor rather than having to manually translate design documents into an executable format. In addition, subsequent examinations of the network result in documentation that is both up to date and accurate in terms of the devices connected to the network and the connectors defining device communication. Hence, the microSynergy Editor acts as a round-trip development tool, where design documentation can be generated from an existing implementation as well as downloaded and executed on the network. Network development with third-party controllers. In many cases, application developers themselves will not design and implement all embedded devices connected to a network from scratch. Often, they will purchase devices off-the-shelf, connect them to the network, and focus on creating custom connector logic to allow the devices to work together. microSynergy supports this method of network development, enabling the technology-savvy to quickly install and configure complex networks of third party controllers. Initially, the network developer purchases pre-programmed controllers they require from third party vendors and connects them to the network. They use microSynergy Editor to introspect the network and automatically create a System Diagram for the
current network topology. Connectors and their internal logic are then defined, anchored to the in-gates and out-gates associated with the controllers as required, and downloaded to the network. Here, we note that embedded device developers are alleviated from developing communication logic for their controllers and network developers are alleviated from the design and implementation of the embedded software controlling the devices in the network. This exemplifies one of the benefits of connection-based programming, that components can be bound together as required with little forethought concerning how they may actually be interconnected during their implementation. Development with third-party connectors. An increasing number of embedded automation and control applications target non-engineers as their customers. Home automation, for example, is a rapidly emerging market in many industrialized countries. The average layperson will typically have little knowledge of embedded devices, finite state machines or other concepts in the domain. However, such users should be empowered to configure and customize their network to meet their needs, e.g., their home environment and the processes they would like to automate. To address this issue, microSynergy supports the development and reuse of third-party connector logic in the form of templates. Using templates, end users can develop and configure their embedded device network without having to implement the connector logic themselves. To begin, the end user purchases the devices they require and connects them to the network. After examining the network, they can import predefined connector logic templates developed by third parties. The template logic can then be downloaded and deployed to the network. Templates can be seen as pre-defined, customizable interaction patterns between embedded devices.
5
Related Work
The idea of constructing software by configuring and connecting proven, reusable components (as opposed to manual programming) has existed for several decades. During the 1990s, component-oriented construction gained increasing interest in the commercial sector. This popularity has been driven by the availability of reusable frameworks and pattern libraries for object-oriented languages like C++ and Java [15], [11]. Johnson gives a good overview of the pros and cons of employing components and other reusability technology for software construction [9]. One prominent problem of component and framework reuse is how to efficiently store, maintain, and look up a generally very large number of reusable components. Several representations, query languages and algorithms have been proposed for this purpose, e.g., by Sahraoui and Benyahia [14]. Even though the problem of component-oriented construction for general software has not yet been sufficiently solved, current industrial practice proves that this approach is viable and productive for specific application domains. For example, component-oriented techniques play an important role in constructing current graphical user interfaces, e.g., Java Beans [2]. Stewart has shown that similar advantages of domain dedication apply to the use of component orientation in the design of embedded systems [18]. The notion of making component reuse feasible by focusing on a particular domain is related to the idea of
product lines as presented in [3]. In this sense, microCommander and microSynergy are clearly focused on supporting control applications. They do not provide adequate support for developing other types of embedded system applications, e.g., software for cellular phones. Using PC-based user interfaces for monitoring and controlling embedded systems is also the approach taken in National Instruments' LabVIEW [10]. However, the use of LabVIEW requires significant knowledge and skills, e.g., requiring users to program with iterative loops and conditional statements. Our approach is simpler, being based on traditional control components that represent such things as switches, timers, status indicators, etc. Moreover, our notion of an embedded component encapsulates the functional embedded code as well as the visual representation of interactive, PC-based dialogues. Our approach of connecting different application block components is related to work performed in the area of architectural interconnection as presented by Allen and Garlan [1]. The difference in our approach is that it is currently restricted to asynchronous (signal-based) communication only. Furthermore, we deal with a-posteriori integration. We have chosen SDL for specifying the integration among components. This is in contrast to many other modeling approaches that employ the Unified Modeling Language (UML) for this purpose [19]. We made this decision because SDL has formal semantics and is widely used in the embedded systems domain [5]. However, the UML 2.0 specification [21], which is currently nearing completion, has adopted SDL semantics within its scope.
6
Discussion and Experiences
Throughout our research investigating component-oriented engineering of embedded control software for network-centric systems, several new issues have been brought to light. In the following paragraphs we discuss some of these issues and the research opportunities they present.
6.1 3rd Party and Custom Components
No matter how many component types are available, there will always be applications requiring at least one more specialized component that is missing from the available libraries. Alternatively, a user may have valuable legacy code that she would like to run side by side with other microCommander components. That is, she would like to wrap her legacy code into an mComponent-like interface to enjoy all of the features of microCommander, while the code performs as before on the target controller. This brings up the following issues:
1. Already, some target platforms have insufficient resources to simultaneously support all existing component types. A dynamic linking solution is being considered, which would allow the user to load only the subset of component types required by her application. However, dynamic component linking would require the component infrastructure to be extended. The resulting increase in the memory and processing footprint of the component could lead to bottlenecks on small devices.
2. At some point, the number of components and the task of finding the right component for the job will overwhelm the user. One view is that this will eventually create demand for some type of discovery service. These issues are subject to ongoing research at the University of Victoria. Relationships between components. As microCommander applications become more complex, the user not only needs to be aware of the mComponents and their properties, but also needs to understand the dependencies between components in order to manage them efficiently. This is particularly true when many logic components are involved or when a number of components can trigger a job on another component. For instance, there could be an overtemperature alarm/shutoff on the greenhouse heaters that could be triggered from any of a dozen temperature sensors. Seeing the link to the offending sensor provides the engineer with cognitive support during application development. This is analogous to a visual cross-referencing browser in traditional programming. Intec Automation is currently experimenting with a dependency browser for component-based applications (see Fig. 15, the logic to accomplish wear leveling of the greenhouse unit heaters, as an example). The research challenge in developing such a browser is to devise different logical views, showing different concerns of a component composition without overloading the screen with clutter. Techniques like fish-eye views and dynamic component animations might represent appropriate mechanisms.
Fig. 15. Dependency view (excerpt) of a microCommander application
6.2 Multi-tiered Architecture microCommander is currently based on a two-tier architecture comprising an embedded application tier and a PC-based operator tier. Automation micro controllers are designed to interface with the real world and are not good communication engines. This presents a problem when multiple simultaneous users are introduced into this architecture. Intec and UVic have been working on an approach that uses a three-tier architecture to provide more scalability, security and reliability for interactive embedded applications. With a three-tier architecture much of the burden of serving multiple users can be moved from the micro controller to the middle tier. This separation of concerns allows the middle tier (mGateway) to act as a liaison between the micro controller and the end-users. This tier also includes a data cache for the values from the micro controller, which will reduce redundant communications in a multi-user scenario. Interesting research topics lie in the negotiation strategies of quality of service attributes between interactive
clients and the embedded targets. For example, the middle tier might use caching strategies to decrease the latency of fulfilling client requests at the cost of the currency of the requested information etc. 6.3 Semantic Interface Ontologies One of the goals of embedded device network development and research is to support a ubiquitous computing environment that automatically evolves in step with changing requirements. Embedded networks should adapt and evolve as transparently as possible to meet the changing needs of their users. However, as networks evolve, they become increasingly complex, making subsequent network modification increasingly difficult. We are currently researching the concept of embedded components with semantic interfaces in an attempt to address the problem of transparent network evolution. Our goal is to allow a controller to be connected to the network and to have the network automatically reconfigure itself to incorporate the new device. To achieve this, a controller will be ascribed a set of predicates with a tool like microCommander that will describe the device in detail. The predicates used to describe controllers will be defined in a universally accessible ontology [16] describing the micro controller domain. Though artificial intelligence literature offers many definitions of an ontology, for our purposes an ontology is a hierarchical structure that describes entities and their inter-relations within a specified domain. Predicates defined in the ontology are used to describe a controller’s role and context within the network. The roles that can be ascribed to a controller, such as a window or a burner, will include a description of the interface common to all controllers of the same type. For example, a burner may support a Heat in-gate as well as status and malfunction out-gates. Hence, the network will be able to introspect the predicates declared by a newly connected controller and determine how signals should be routed to and from it in accordance to the connector logic. This topic provides interesting and challenging research questions such as how to describe semantic interfaces and how to publish and evolve the common ontology.
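Since this is ongoing research, no concrete representation is prescribed; purely as a thought experiment, the sketch below shows how predicates announced by a newly connected controller might be matched against the gates a role requires. Every identifier in it is invented:

#include <iostream>
#include <set>
#include <string>

// Speculative sketch: a newly connected controller announces predicates drawn
// from a shared ontology, and the network checks whether the announced role
// implies the gates required by the existing connector logic.
struct ControllerDescription {
    std::string role;                 // e.g., "burner", "window"
    std::set<std::string> inGates;    // e.g., {"Heat"}
    std::set<std::string> outGates;   // e.g., {"status", "malfunction"}
};

bool matchesRole(const ControllerDescription& c,
                 const std::set<std::string>& requiredIn,
                 const std::set<std::string>& requiredOut) {
    for (const auto& g : requiredIn)  if (!c.inGates.count(g))  return false;
    for (const auto& g : requiredOut) if (!c.outGates.count(g)) return false;
    return true;
}

int main() {
    ControllerDescription burner{"burner", {"Heat"}, {"status", "malfunction"}};
    bool ok = matchesRole(burner, {"Heat"}, {"status"});
    std::cout << "new controller can take the burner role: " << std::boolalpha << ok << '\n';
}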
References
1. Allen, R. and D. Garlan. Beyond definition/use: architectural interconnection. Proceedings of the Workshop on Interface Definition Languages. 1994. Portland, USA: ACM Press.
2. Deitel, H.M. and P.J. Deitel, Java: How to Program. 1999, Prentice Hall: Upper Saddle River, N.J.
3. Donohoe, P.: Software Product Lines: Experience and Research Directions: Proceedings of the First Software Product Lines Conference (SPLC1), August 28-31, 2000, Denver, Colorado. 2000, Boston, MA: Kluwer Academic.
4. Eisenecker, U. and Czarnecki, K., Generative Programming: Methods, Tools, and Applications, Addison-Wesley, 2000.
5. Ellsberger, J., D. Hogrefe, and A. Sarma, SDL - Formal Object-oriented Language for Communicating Systems. 1997: Prentice Hall Europe.
6. Estrin, D., Govindan, R., and Heidemann, J., Embedding the Internet. Communications of the ACM, 2000. 43: p. 38-50.
7. Fowler, M. and Scott, K., UML Distilled: A Brief Guide to the Standard Object Modeling Language, 2nd ed., Addison Wesley Professional, ISBN 0-201-65783-X, 2000.
8. Harel, D. and Gery, E., Executable Object Modeling with Statecharts. Proceedings of the 18th Intl. Conf. on Software Engineering, pp. 246-257, IEEE CS / ACM Press, 1996.
9. Johnson, R. Components, frameworks, patterns. In 1997 Symposium on Software Reusability. 1997. Boston, USA: ACM Press.
10. LabVIEW - The Software That Powers Virtual Instruments, National Instruments Corporation, Austin, Texas. http://www.ni.com/labview
11. Leavens, G.T. and M. Sitaraman, Foundations of Component-Based Systems. 2000, Cambridge, England; New York: Cambridge University Press.
12. Lorenz, D. and Vlissides, J., Designing Components versus Objects: A Transformational Approach. ICSE 2001: 253-262, Toronto, Ontario, Canada, May 12-19, 2001.
13. MDA - The Architecture of Choice for a Changing World, Object Management Group. http://www.omg.org/mda/
14. Mili, H., H. Sahraoui, and I. Benyahia. Representing and querying reusable object frameworks. In Symposium on Software Reusability. 1997. Boston, USA: ACM Press.
15. Nierstrasz, O., S. Gibbs, and D. Tsichritzis, Component-Oriented Software Development. Communications of the ACM, 1992. 35(9): p. 160-165.
16. Noy, Natalya F. and McGuinness, Deborah L., Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University, Stanford, CA, 94305. http://protege.stanford.edu/publications/ontology_development/ontology101.pdf
17. Petri, C., Concurrency Theory. Advanced Course on Petri Nets, pp. 1-22, Gesellschaft für Mathematik und Datenverarbeitung, St. Augustin, Germany, 1986.
18. Stewart, D. Designing Software Components for Real-Time Applications. In Embedded Systems Conference. 2000. San Jose, CA, USA.
19. Stevens, P. and R.J. Pooley, Using UML: Software Engineering with Objects and Components. 2000, New York: Addison-Wesley.
20. Szyperski, C., Component Software: Beyond Object-Oriented Programming. 1997: Addison-Wesley.
21. UML, Unified Modeling Language, UML 2.0 specification. http://www.uml.org/
Component-Based Development of Dependable Systems with UML
Jan Jürjens and Stefan Wagner
Software & Systems Engineering
Technische Universität München
Boltzmannstr. 3, D-85748 Garching, Germany
{juerjens,wagnerst}@in.tum.de
Abstract. Dependable systems have to be developed carefully to prevent loss of life and resources due to system failures. Some of their mechanisms (for example, those providing fault tolerance) can be complicated to design and use correctly in the system context and are thus error-prone. This chapter gives an overview of reliability-related analyses for the design of component-based software systems. These enable the identification of failure-prone components using complexity metrics and the operational profile, and the checking of reliability requirements using stereotypes. We report on the implementation of the checks in a tool inside a framework for tool-supported development of reliable systems with UML, and on two case studies used to validate the metrics and checks.
1
Introduction
There is an increasing desire to exploit the flexibility of software-based systems in the context of critical systems where predictability is essential. Examples include the use of embedded systems in various application domains, such as fly-by-wire in avionics, drive-by-wire in automotive systems, and so on. Given the high reliability requirements in such systems (such as a maximum of 10⁻⁹ failures per hour in the avionics sector), a thorough design method is necessary. We define reliability as the probability of failure-free functioning of a software component for a specified period in a specified environment. Reliability mechanisms cannot be "blindly" inserted into a critical system; rather, the overall system development must take these aspects into account. Furthermore, sometimes such mechanisms cannot be used off-the-shelf, but have to be designed specifically to satisfy given requirements. For example, the use of redundancy mechanisms to compensate for the failures that occur in any operational system may require complex protocols whose correctness can be non-obvious [41]. This can be non-trivial, as spectacular examples of software failures in practice demonstrate (such as the explosive failure of the Ariane 5 rocket in 1996). Any support to aid reliable systems development would thus be useful. In particular, it would be desirable to consider reliability aspects already in the design phase, before a system is actually implemented, since removing flaws in the design phase saves cost and time. This is significant; for example, in avionics, verification costs represent 50% of the overall costs. Moreover, a means to estimate reliability, or at least identify failure-prone components, early in the life-cycle of the software would be helpful to make
verification more efficient. We believe that the design models are the best indicator in early phases for the future behavior of the system and thus should be used for reliability estimation. Following an idea advocated in [1], we thus aim to incorporate quality attributes of models (such as measures derived from structural or behavioral attributes) into component-based models of software systems within the context of model-based software development. As a design notation, we use the Unified Modeling Language (UML) [35], the de facto industry-standard in object-oriented modeling. It offers an unprecedented opportunity for high-quality critical systems development that is feasible in an industrial context. Problems in critical systems development often arise when the conceptual independence of software from the underlying physical layer turns out to be an unfaithful abstraction (for example in settings such as real-time or more generally safety-critical systems, see [42]). Since UML allows the modeler to describe different views on a system, including the physical layer, it seems promising to try to use UML to address these problems by modeling the interdependencies between the system and its physical environment. To support safe systems development, safety checklists have been proposed for example in [15]. In the present chapter, based on an extending work presented in [20], we tailor UML in a similar approach to reliable systems by precisely defining some checks with stereotypes capturing reliability requirements and related physical properties. We also provide metrics that estimate the failure-proneness of a software system based on the complexity of its design models and its operational profile. The reliability requirements can then be compared with the results for failure-proneness. In this way we encapsulate knowledge on prudent reliability engineering and thereby make it available to developers who may not be specialized in reliable systems. A prototypical framework for tool-support for this approach is also presented within this chapter. Outline. In Sect. 2.1 we explain the foundation for checking the constraints associated with the stereotypes suggested for reliable systems development which are presented in Sect. 2.2, together with examples of their use. A metrics suite for models is defined in Sect. 3.1 and these metrics are used to analyze the failure-proneness of components in Sect. 3.2. In Sect. 4, we briefly describe the tool assisting our approach. Two case studies describing an automatic collision notification system in Sect. 5 and an automotive network controller in Sect. 6 are finally used to validate our work.
2 Model-Based Reliability Specification and Analysis
In safety-critical systems, an important concept also used here is that of a safety level (see, e.g., [39]). Since safety-critical systems generally need to provide a high degree of reliability, it makes sense to analyze these systems with respect to their maximum allowed failure rate. We thus define the concept of a reliability level analogous to the mentioned safety levels. As examples, we consider the following kinds of failure semantics in this chapter (other kinds have to be omitted for space reasons):
– crash/performance failure semantics means that a component may crash or may deliver the requested data only after the specified time limit, but it is assumed to be partially correct.
– value failure semantics means that a component may deliver incorrect values.

Possible failures include:

– message loss, which may be due to hardware failures or software failures (for example, buffer overflows),
– message delay, which may in turn result in the reordering of messages if the delay is variable,
– message corruption, when a message is modified in transit.

Forms of redundancy commonly employed against these failures include space redundancy (physical copies of a resource), time redundancy (rerunning functions) and information redundancy (error-detecting codes).

UML Profile Mechanisms. We use the three main profile mechanisms (stereotypes, tagged values and constraints) to include reliability requirements in a UML specification, together with the constraints formalizing the requirements. To evaluate a model against the requirements, we refer to a precise semantics for the used fragment of UML extended with a notion of failures, sketched in Sect. 2.1.

2.1 Evaluation of Reliability Requirements in UML Diagrams

We briefly give an idea how the constraints used in the UML extension presented in Sect. 2.2 can be checked in a precise and well-defined way. A precise semantics for a (restricted and simplified) fragment of UML supporting these ideas can be found in [21]. It includes activity diagrams, statecharts, sequence diagrams, composite structure diagrams, deployment diagrams, and subsystems, each restricted and simplified to keep feasible a mechanical analysis that is necessary for some of the more subtle behavioral reliability requirements. The subsystems integrate the information between the different kinds of diagrams and between different parts of the system specification. For reliability analysis, the reliability-relevant information from the reliability-oriented stereotypes is then incorporated as explained below.

Outline of Precise Semantics. In UML the objects or components communicate through messages received in their input queues and released to their output queues. Thus for each component C of a given system, the semantics defines a function [[C]]() which

– takes a multi-set I of input messages and a component state S (a multi-set, also called a bag, is a set whose elements may occur more than once) and
– outputs a set [[C]](I, S) of pairs (O, T), where O is a multi-set of output messages and T the new component state (it is a set of pairs because of the non-determinism that may arise),
together with an initial state S0 of the component. The behavioral semantics [[D]]() of a state machine diagram D models the run-to-completion semantics of UML state machines. Similarly, one can define the semantics for UML 1.5 activity diagrams. Given a sequence diagram S, we define the behavior [[S.C]]() of each contained component C.

Subsystems group together diagrams describing different parts of a system: a system component C given by a subsystem S may contain subcomponents C1, ..., Cn. The behavioral interpretation [[S]]() of S is defined by iterating the following steps:

1. It takes a multi-set of input events.
2. The events are distributed from the input multi-set and the link queues connecting the subcomponents and given as arguments to the functions defining the behavior of the intended recipients in S.
3. The output messages from these functions are distributed to the link queues of the links connecting the sender of a message to the receiver, or given as the output from [[S]]() when the receiver is not part of S.

When performing reliability analysis, after the last step, the failure model may corrupt the contents of the link queues in a certain way explained below. Note that this approach is similar to that taken in [21], where a security analysis is performed in place of the reliability analysis.

As an example, the state chart in Fig. 1 is executed as follows: The fuel controller starts out in state WheelsOut. It awaits either the message fuel() or wheelsin(). In the first case, the argument of the message is multiplied with the constant d and the result returned. In the second case, if the argument is false, no change occurs. In case of true, the state is switched to WheelsIn. In that state, the same behavior occurs, except that the argument of fuel() is now multiplied with the constant c.
(Figure: the «containment» component Fuel controller, with tag {reliable={fuel}} and operations fuel(x:Data):Data and wheelsin(x:Bool), realized by the part Fuel control; its state machine has the states WheelsOut and WheelsIn, switches between them on wheelsin(true) / wheelsin(false), and answers fuel(x) with return(d.x) in WheelsOut and return(c.x) in WheelsIn.)
Fig. 1. Example State chart
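For readers who prefer executable notation, the following is a minimal Python sketch of this semantics (our own illustration, not the formalization of [21]): a component behavior is a function from an input multi-set and a state to a list of possible (output multi-set, new state) pairs, and one subsystem iteration distributes messages via link queues. The fuel-controller behavior, the constants and the routing table are simplified, hypothetical stand-ins for the Fig. 1 example.

```python
from collections import Counter  # Counter serves as a multi-set (bag)

# A component's behavior [[C]](): (input bag, state) -> list of (output bag, new state)
def fuel_control(inputs, state):
    """Toy behavior loosely following the Fig. 1 fuel controller (c, d are made-up constants)."""
    c, d = 2, 3
    outputs = Counter()
    for (msg, arg), n in inputs.items():
        if msg == "fuel":
            factor = c if state == "WheelsIn" else d
            outputs[("return", factor * arg)] += n
        elif msg == "wheelsin" and arg is True:
            state = "WheelsIn"
        elif msg == "wheelsin" and arg is False:
            state = "WheelsOut"
    return [(outputs, state)]  # deterministic here: a single alternative

def subsystem_step(components, states, link_queues, external_inputs):
    """One iteration: distribute external inputs and link-queue contents to the
    components, run each component, and route outputs to link queues or outside."""
    external_outputs = Counter()
    new_states = {}
    for name, behavior in components.items():
        inbox = external_inputs.get(name, Counter()) + link_queues.get(name, Counter())
        link_queues[name] = Counter()
        out, new_states[name] = behavior(inbox, states[name])[0]  # pick one alternative
        for msg, n in out.items():
            receiver = routing.get(name)          # hypothetical routing table
            if receiver in components:
                link_queues[receiver][msg] += n
            else:
                external_outputs[msg] += n
    return new_states, external_outputs

routing = {"FuelControl": None}  # outputs of FuelControl leave the subsystem

states, queues = {"FuelControl": "WheelsOut"}, {}
_, out = subsystem_step({"FuelControl": fuel_control}, states, queues,
                        {"FuelControl": Counter({("fuel", 10): 1})})
print(out)  # Counter({('return', 30): 1}), since d = 3 applies in state WheelsOut
```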
As in standard terminology in high assurance systems, the values output by a component by means of call or send actions could be referred to as controlled quantities, whereas the values input as events are the monitored quantities [3].

Reliability Analysis. For a reliability analysis of a given UML subsystem specification S, we need to model potential failure behavior. We model specific types of failures that can corrupt different parts of the system in a specified way, depending on the used redundancy model. For this we assume a function Failures^R_s which takes a redundancy
model R and a stereotype s ∈ {crash/performance, value} and returns a set of expressions

Failures^R_s ⊆ {delay(t) : t ∈ N ∧ t > 0} ∪ {loss(p) : p ∈ [0, 1]} ∪ {corruption(q) : q ∈ [0, 1]}.

Here R is a name representing a redundancy mechanism (such as duplication of components together with a voting mechanism), which is semantically defined through the Failures sets. The natural number t represents the maximum delay to be expected in time units. p gives the probability that an expected data value is not delivered after the t time units specified in delay(t). Given a value delivered within this time period, q denotes the probability that this value is corrupted. As an example for a failures function, Table 1 gives the one for the absence of any redundancy mechanism (R = none). Here, the time and probability parameters are still included as parameters; for a given system, these will be concrete numeric values.

Table 1. Failure semantics
Risk              | Failures^none()
Crash/performance | {delay(t), loss(p)}
Value             | {corruption(q)}
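As a small illustration, the Failures function for the case R = none from Table 1 can be written down directly; the numeric defaults below are made-up placeholders, since for a real system they would be obtained from measurements:

```python
# Failure expressions Failures^R_s: delay(t) is a maximum delay in time units,
# loss(p) and corruption(q) are probabilities.
def failures(redundancy_model, stereotype, t=2, p=0.01, q=0.001):
    """Sketch of Failures^R_s for R = 'none' (Table 1); t, p, q are illustrative values."""
    if redundancy_model == "none":
        if stereotype == "crash/performance":
            return {("delay", t), ("loss", p)}
        if stereotype == "value":
            return {("corruption", q)}
    raise NotImplementedError("only R = 'none' is sketched here")

print(failures("none", "crash/performance"))  # {('delay', 2), ('loss', 0.01)}
```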
The consistency of the failure model with the physical reality (and in particular the completeness in the sense that no possible failures are missing from the failure model) can for example be established by simulating the model and comparing the results with data obtained from experiments on the physical systems. Similarly, the probabilistic values and other numerical data can also be derived. Note that this consistency cannot, for principled reasons, be proved, since mathematical proofs can only be constructed with respect to mathematical models of reality, not reality itself.

The consistency of the running code with the execution semantics of the UML diagrams used here can be guaranteed in two ways: Firstly, one can use automated code generation, where the code generator would (for high assurance applications) ideally be formally proved to be correct with respect to the semantics. Where code generation is not useful, the manually implemented code can still be checked against the semantics by using model-based test-sequence generation (this is not considered here; [19] gives an introduction).

Then we model the actual behavior of a failure, given a redundancy model R, as a failure function that, at each iteration of the system execution, non-deterministically maps the contents of the link queues in S and a state S to the new contents of the link queues in S and a new state T, as explained below. For this, for any link l, we use a sequence (lq^l_n)_{n∈N} of multi-sets such that at each iteration of the system, for any n, lq^l_n contains the messages that will be delayed for further n time units. Here lq^l_0 stands for the actual contents of the link queue l. At the beginning of the system execution, all these multi-sets are assumed to be empty. Also, for any execution trace h (that is, a particular sequence of system states and occurring failures describing a possible history of the system execution), we define a sequence (p^h_n)_{n∈N} of probabilities such that at the nth iteration of the system, the failure considered in the current execution trace happened with probability p^h_n. Thus the probability p^h that a trace h of length n will take place is the product of the values p^h_1, ..., p^h_n (since in our presentation here, we assume failures to be mutually independent, to keep the exposition accessible). Then for
an execution trace h, the failure function is defined as follows. It is non-deterministic in the sense that for each input, it may have a set of possible outputs. Failure behavior should be part of the trace.

– For any link l stereotyped s where loss(p) ∈ Failures^R_s, we
  • either define lq^l_0 := ∅ and append p to the sequence (p^h_n)_{n∈N},
  • or append 1 − p to the sequence (p^h_n)_{n∈N}.
– For any link l stereotyped s where corruption(q) ∈ Failures^R_s, we
  • either define lq^l_0 := {□} and append q to the sequence (p^h_n)_{n∈N},
  • or append 1 − q to the sequence (p^h_n)_{n∈N}.
– For any link l stereotyped s where delay(t) ∈ Failures^R_s and lq^l_0 ≠ ∅, we define lq^l_n := lq^l_0 for some n ≤ t and append 1/t to the sequence (p^h_n)_{n∈N}.
– Then for each n, we (simultaneously) define lq^l_n := lq^l_{n+1}.

The failure types define which kind of failure may happen to a communication link with a given stereotype, as explained above. Note that for simplicity we assume that delay times are uniformly distributed. Also, corrupted messages (symbolized by □) are assumed to be recognized (using error-detecting codes).

To evaluate the reliability of the system with respect to the given type of failure, we define the execution of the subsystem S in presence of a redundancy model R to be the function [[S]]^R() defined from [[S]]() by applying the failure function to the link queues as a fourth step in the definition of [[S]]() as follows:

4. The failure function is applied to the link queues as detailed above.
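The following minimal Python sketch (our own illustration, not the authors' tooling) applies loss, corruption and delay to a single link queue and records the per-step probabilities p^h_n of the chosen trace; the trace probability p^h is then the product of the recorded values.

```python
import random

CORRUPT = "\u25a1"  # distinguished symbol for a recognizably corrupted message

def apply_failures(queues, failure_set, trace_probs, rng=random):
    """One application of the failure function to one link.
    queues[n] holds the messages delayed for a further n time units (queues[0] is
    the actual link queue); trace_probs collects the probabilities p^h_n."""
    for kind, value in failure_set:
        if kind == "loss" and queues[0]:
            if rng.random() < value:              # lose the queue contents
                queues[0] = []
                trace_probs.append(value)
            else:
                trace_probs.append(1 - value)
        elif kind == "corruption" and queues[0]:
            if rng.random() < value:
                queues[0] = [CORRUPT]
                trace_probs.append(value)
            else:
                trace_probs.append(1 - value)
        elif kind == "delay" and queues[0]:
            n = rng.randrange(1, value + 1)       # uniformly distributed delay of 1..t units
            queues[n] = queues[n] + queues[0]
            queues[0] = []
            trace_probs.append(1.0 / value)
    # finally shift every queue one step closer to delivery (lq_n := lq_{n+1})
    return queues[1:] + [[]], trace_probs

t = 3
queues = [["fuel(10)"]] + [[] for _ in range(t)]
queues, probs = apply_failures(queues, {("delay", t)}, [])
print(queues, probs)
```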
Containment. A system ensures containment if there is no unreliable interference between components on different reliability levels (this is called non-interference in [8]). Intuitively, providing containment means that an output should in no way depend on inputs of a lower level.

We assume that we are given an ordered set Levels of reliability levels. Then the containment constraint is that in the system, the value of any data element of level l may only be influenced by data of the same or a higher reliability level: Write H(l) for the set of messages of level l or higher. Given a sequence m of messages, we write m↾H(l) for the sequence of messages derived from those in m by deleting all events the message names of which are not in H(l). For a set M of sequences of messages, we define M↾H := {m↾H : m ∈ M}.

Definition 1. Given a component C and a reliability level l, we say that C provides containment with respect to l if for any two sequences i, j of input messages, i↾H(l) = j↾H(l) implies [[C]](i)↾H(l) = [[C]](j)↾H(l).

2.2 Stereotypes for Reliability Analysis: The “Reliability Checklist”

In Table 2 we give some of the stereotypes, together with their tags and constraints, that we suggest to be used in the model-based development of reliable systems with UML, based on previous experience in the model-based development of reliable systems (for space restrictions, we can only give a representative selection). Thus, in a way, we define a UML-based “Reliability Checklist” (which one can verify mechanically on the design level). The constraints, which in the table are only named briefly, are formulated and explained in the remainder of the section. Table 3 gives the corresponding tags. The relations between the elements of the tables are explained below in detail. Note that some of the concepts introduced below are easier to apply at component rather than object level.

Table 2. Stereotypes
Stereotype          | Base Class            | Tags         | Constraints                             | Description
risk                | link, node            | failure      |                                         | risks
crash/performance   | link, node            |              |                                         | crash/performance failure semantics
value               | link, node            |              |                                         | value failure semantics
guarantee           | link, node            | goal         |                                         | guarantees
redundancy          | dependency, component | model        |                                         | redundancy model
reliable links      | subsystem             |              | dependency reliability matched by links | enforces reliable communication links
reliable dependency | subsystem             |              | call, send respect data reliability     | structural data reliability
critical            | object                | (level)      |                                         | critical object
reliable behavior   | subsystem             |              | behavior fulfills reliability           | reliable behavior
containment         | subsystem             |              | provides containment                    | containment
error handling      | subsystem             | error object |                                         | handles errors

Table 3. Tags
Tag          | Stereotype     | Type                                       | Multipl. | Description
failure      | risk           | P({delay(t), loss(p), corruption(q)})      | *        | specifies risks
goal         | guarantee      | P({immediate(t), eventual(p), correct(q)}) | *        | specifies guarantees
model        | redundancy     | {none, majority, fastest}                  | *        | redundancy model
error object | error handling | string                                     | 1        | error object

We explain the stereotypes and tags given in Tables 2 and 3 and give examples (which for space restrictions have to be kept simple). Note that the constraints considered here span a range in sophistication: Some of the constraints are relatively simple (comparable to type-checking in programming languages), can be enforced at the level of abstract syntax (such as reliable links), and can be used without the semantics sketched in Sect. 2.1. Others (such as containment) refer to the semantics and can only be checked reliably using tool support.

Overview. We give an overview of the syntactic extensions together with an informal explanation of their meaning. redundancy, with associated tag {model}, describes the redundancy model that should be implemented. risk describes the risks arising at the physical level using the associated tag {failure}. guarantee requires the goals described in the associated tag {goal} for communicated data. reliable links ensures that reliability requirements on the communication are met by the physical layer. critical labels critical objects using the associated tags {level} (for each reliability level level). reliable dependency ensures that communication dependencies respect reliability requirements on the communicated data. reliable behavior ensures that the system behaves reliably as required by
guarantee, in the presence of the specified failure model. containment ensures containment as defined in Definition 1. error handling with tag {error object} provides an object for handling errors. In the following paragraphs, we define the stereotypes and their constraints in detail.

Redundancy. The stereotype redundancy of dependencies and components and its associated tag {model} can be used to describe the redundancy model that should be implemented for the communication along the dependency or the values computed by the component. Here we consider the redundancy models none, majority, fastest, meaning that there is no redundancy, there is replication with majority vote, or replication where the fastest result is taken (but of course there are others, which can easily be incorporated in our approach).

Risk, Crash/Performance, Value. With the stereotype risk on links and nodes in deployment diagrams one can describe the risks arising at these links or nodes, using the associated tag {failure}, which may have any subset of {delay(t), loss(p), corruption(q)} as its value. In the case of nodes, these concern the respective communication links connected with the node. Alternatively, one may use the stereotypes crash/performance or value, which describe specific failure semantics (by giving the relevant subset of {delay(t), loss(p), corruption(q)}): For each redundancy model R, we have a function Failures^R_s from a given stereotype s ∈ {crash/performance, value} to a set of strings Failures^R_s ⊆ {delay(t), loss(p), corruption(q)}. If there are several such stereotypes relevant to a given link (possibly arising from a node connected to it), the union of the relevant failure sets is considered. This way we can evaluate UML specifications. We make use of this for the constraints of the remaining stereotypes. An example for a failures function was given above in Table 1.

Guarantee. call or send dependencies in object or component diagrams stereotyped guarantee are supposed to provide the goals described in the associated tag {goal} for the data that is sent along them as arguments or return values of operations or signals. The goals may be any subset of {immediate(t), eventual(p), correct(q)}. This stereotype is used in the constraints for the stereotypes reliable links and reliable behavior.

Reliable Links. The stereotype reliable links, which may label subsystems, is used to ensure that reliability requirements on the communication are met by the physical layer. We recall that in UML deployment diagrams, communication is specified on the logical level by communication dependencies between components, which is supported on the physical level by communication links between the nodes on which the components reside. More precisely then, the constraint enforces that for each dependency d with redundancy model R stereotyped guarantee between subsystems or objects on different nodes n, m, we have a communication link l between n and m with stereotype s such that

– if {goal} has immediate(t) as one of its values then delay(t′) ∈ Failures^R_s entails t′ ≤ t,
– if {goal} includes eventual(p) as one of its values then loss(p′) ∈ Failures^R_s entails p′ ≤ 1 − p, and
– if {goal} has correct(q) as one of its values then corruption(q′) ∈ Failures^R_s entails q′ ≤ 1 − q.

Example. In Fig. 2, given the redundancy model R = none, the constraint for the stereotype reliable links is fulfilled if and only if t ≤ T, where t is the delay to be expected according to the Failures^none(crash/performance) entry in Table 1.

(Figure: a subsystem client/server stereotyped «reliable links», with a client machine node containing client apps and a server machine node containing server apps; the «call» dependency from client apps to server apps carries «guarantee» {goal={immediate(T)}}, and the communication link between the two nodes is stereotyped «crash/performance».)
Fig. 2. Example reliable links usage
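Operationally, the reliable links constraint is a comparison between the {goal} tag of a dependency and the failure set of the supporting link. A minimal sketch of such a check, with goals and failures represented as plain tuples (our own encoding, not the tool's), could look as follows:

```python
def reliable_links_ok(goals, link_failures):
    """Check the reliable links constraint for one dependency/link pair.
    goals: e.g. {("immediate", T)}; link_failures: Failures^R_s of the link."""
    for goal, bound in goals:
        for failure, value in link_failures:
            if goal == "immediate" and failure == "delay" and not value <= bound:
                return False              # delay(t') must satisfy t' <= t
            if goal == "eventual" and failure == "loss" and not value <= 1 - bound:
                return False              # loss(p') must satisfy p' <= 1 - p
            if goal == "correct" and failure == "corruption" and not value <= 1 - bound:
                return False              # corruption(q') must satisfy q' <= 1 - q
    return True

# Fig. 2, R = none: «guarantee» {goal={immediate(T)}} over a «crash/performance» link
T, t = 5, 2
print(reliable_links_ok({("immediate", T)}, {("delay", t), ("loss", 0.01)}))  # True, since t <= T
```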
Reliable Behavior. The stereotype reliable behavior ensures that the specified system behavior, in the presence of the failure model under consideration, does provide the reliability goals stated in the tag {goal} associated with the stereotype guarantee, as follows, by referring to the semantics sketched in Sect. 2.1.

– immediate(t). In any trace h of the system, the value is delivered after at most t time steps in transmission from the sender to the receiver along the link l. Technically, the constraint is that after at most t steps the value is assigned to lq^l_0.
– eventual(p). In any trace h of the system, the probability that the value is lost during transmission is at most 1 − p. Technically, the sum of all p^h for such histories h is at most 1 − p.
– correct(q). In any trace h of the system, the probability that the delivered value is corrupted during transmission is at most 1 − q. Technically, the sum of all p^h for such histories h is at most 1 − q.
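The two probabilistic goals can be checked by summing trace probabilities, as the following sketch shows; trace generation itself is assumed, and the traces below are invented for illustration only.

```python
from math import prod

def check_reliable_behavior(traces, goal):
    """Check the eventual(p) / correct(q) goals of reliable behavior over a set of
    execution traces. Each trace is (per-step probabilities p^h_n, outcome), where
    outcome is 'ok', 'lost' or 'corrupted'."""
    kind, bound = goal
    bad = "lost" if kind == "eventual" else "corrupted"
    p_bad = sum(prod(steps) for steps, outcome in traces if outcome == bad)
    return p_bad <= 1 - bound

traces = [
    ([0.99], "ok"),     # value delivered
    ([0.01], "lost"),   # value lost with probability 0.01
]
print(check_reliable_behavior(traces, ("eventual", 0.95)))  # True: 0.01 <= 0.05
```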
3 Model-Based Reliability Metrics
This section describes the possibilities and benefits of a reliability-related analysis of models based on complexity metrics. We first explain the motivation and the assumed development process. Afterwards, specific metrics for structured classes and state machines are proposed and combined with established object-oriented design metrics. These metrics are joined with the operational profile of the system to find the failure-prone components. This information is finally used in combination with the reliability requirements from Sect. 2.
The main idea is to identify failure-prone components early in the life cycle of the software by their complexity measures and operational profile, and to use this information in checks regarding reliability requirements. The complexity information can help us to rethink design decisions and simplify the design in general. The further analysis can guide the test and review efforts to concentrate on the more critical and failure-prone components. Finally, annotated reliability requirements can be checked for consistency with this information.

3.1 Model Complexity Metrics

The complexity of software code has been studied to a large extent. The most widely known metrics concerning complexity are Halstead's Software Metric [12] and McCabe's Cyclomatic Complexity [29], and many variations of these. In [25, 32] it is shown that the reliability of a software system is related to its complexity. It is generally accepted that complexity is a good indicator for the reliability of a component. This means that a component with a high complexity is more likely to contain faults. Depending on the operational profile of the component, this can mean that the reliability is low. For example, it is stated in [40] that a combination of size and cyclomatic complexity delivers good results in reliability prediction.

Although the traditional complexity metrics are not easily applicable to design models, there are already a number of approaches that propose design metrics [4, 6, 48]. However, they concentrate mainly on the structure or do not support object-oriented designs. Nevertheless, there are also various metrics for object-oriented design models. The most important is the metrics suite proposed in [7], which concentrates on various aspects of classes. Most of these metrics were found to be good estimators of fault-prone classes in [2] and will be used and extended in the following. In using a suite of metrics we follow [11, 30], which state that a single measure is usually inappropriate to measure complexity.

Development Process. The metrics suite described below is generally applicable in all kinds of development processes. It does not need specific phases or sequences of phases to work. However, we need detailed design models of the software to which we apply the metrics. This is most rewarding in the early phases, as the models then can serve various purposes. Otherwise, we assume no specific process (apart from being model-based) and therefore omit details on possible process models. The idea is mainly to incorporate reliability aspects during the development of the model.

We base our metrics especially on those parts of UML 2.0 that are most relevant to embedded systems development. The parts that we will look at are classes, structured classes, components, and state machines. We adjust new metrics and the ones from [7] to parts of UML 2.0 based on the design approach taken in ROOM [43] or UML-RT [44], respectively. This means that we model the architecture of the software with structured classes (called actors in ROOM, capsules in UML-RT) that are connected by ports and connectors to describe the interfaces, and which can have associated state machines that describe their behavior. The structured classes can have parts that may themselves be structured. Thus a hierarchical system decomposition is possible.
The metrics defined below for the different model elements can predict the fault-proneness of the components. To be able to make a reliability analysis, we need information about the failure-proneness of components, i.e. the probability that a fault causes a failure. We will use a very simple form of an operational profile [33] to determine the usage level of a component. Therefore the development process must support the creation of operational profiles early in the development.

Structured Classes and Components. Structured classes and components are a new concept in UML 2.0, derived mainly from ROOM and UML-RT. It introduces composite structures that represent a composition of run-time instances collaborating over communication links. This allows UML components and classes to have an internal structure consisting of other components or classes that are bound by connectors. Furthermore, ports are introduced as a defined entry point to a class or component. A port can group various provided and required interfaces. A connection between two classes or components through ports can also be denoted by a connector. The parts of a class or component work together to achieve its behavior. A state machine can also be defined to describe additional behavior.

The metrics defined in this section are applicable to components as well as classes. However, we will concentrate on structured classes, following the usage of classes in ROOM. Therefore the set of documents under consideration in the following are composite structure diagrams of single classes or components with their parts, provided and required interfaces, connectors, and their state machines if existing. An example for this is depicted in Fig. 3.

(Figure: the structured class C1 with three parts P1, P2 and P3 and several ports; the resulting metric values are NOP = 3, NRI = 2, NPI = 2.)

Fig. 3. An example structured class with three parts and the corresponding metrics

Number of Parts (NOP). The number of parts of a structured class or component obviously contributes to its structural complexity. The more parts it has, the more coordination is necessary because of the additional dependencies. Therefore, we define NOP as the number of direct parts C_p of a class or component.

Number of Required Interfaces (NRI). This metric is (together with the NPI metric below) based on the fan-in and fan-out metrics from [16] and is also a substitute for the older Coupling Between Objects (CBO) metric, which was criticized in [28] for not representing the concept of coupling appropriately. It reduces ambiguity by giving a clear direction of the coupling. We use the required interfaces of a class to represent the usage of other classes. This is another increase of complexity which may as well lead to failure, for example if the interfaces are not correctly defined. Therefore we count the number of required interfaces I_r for this metric.

Number of Provided Interfaces (NPI). Very similar, but not as important as NRI, is the number of provided interfaces I_p. This is similarly a structural complexity measure that expresses the usage of a class by other entities in the system.

State Machines. State machines are used to describe the behavior of classes of a system. They describe the actions and state changes based on a partitioning of the state space of the class. Therefore the associated state machine is also an indicator of the complexity of a class and hence its fault-proneness. State machines consist of states
and transitions, where states can be hierarchical. Transitions carry event triggers, guard conditions, and actions.

We use cyclomatic complexity [29] to measure the complexity of behavioral models represented as state machines because it fits most naturally to these models as well as to code. This makes the lifting of the concepts from code to model straightforward. The basic concept is to transfer the metric from the realization of the state machine in code to the graphical representation. To find the cyclomatic complexity of a state machine we build a control flow graph similar to the one for a program in [29]. This is a digraph that represents the flow of control in a piece of software. For source code, a vertex is added for each statement in the program and arcs are added if there is a change in control, e.g. an if- or while-statement. This can be adjusted to state machines by considering their code implementation. For a possible code transformation of state machines see [43]. An example of a state machine and its control flow graph is depicted in Fig. 4.

(Figure: (a) the state machine “Example” with states S1–S4, one of them hierarchical, and transitions labeled e2[g1]/a1, e3[g2], e4 and e5[g3 && g4]/a2; (b) the corresponding control flow graph with 34 nodes and 46 edges, so that v(G) = 46 − 34 + 2 = 14.)

Fig. 4. (a) A simple state machine with one hierarchical state, event trigger, guard conditions, and actions. (b) Its corresponding control flow graph. The black vertices are predicate nodes. On the right the transitions for the respective part of the flow graph are noted

At first we need an entry point as the first vertex. The second vertex starts the loop over the automaton because we need to loop until the final state is reached, or infinitely if there is no final state. The next vertices represent transitions, atomic expressions of guard conditions, and event triggers of transitions. A guard condition can consist of several boolean expressions that are connected by conjunctions and disjunctions. An atomic expression is an expression only using other logical operators such as equivalence. For a more thorough definition see [29]. These vertices have two outgoing arcs each because of the two possibilities of the control flow, i.e. an evaluation to true or false. Such a branching flow is always joined in an additional vertex. The last vertex goes back to the loop vertex from the start, and the loop vertex has an additional arc to one vertex at the end that represents the end of the loop. This vertex finally has an arc to the last vertex, the exit point.

If we have such a graph, we can calculate the cyclomatic complexity using the formula v(G) = e − n + 2, where v is the complexity, G the control flow graph, e the number of arcs, and n the number of vertices (nodes). There is also an alternative formula, v(G) = p + 1, which can also be used, where p is the number of predicate nodes. Predicate nodes are vertices where the flow of control branches.

Hierarchical states in state machines are not incorporated in the metric. Therefore the state machine must be transformed into an equivalent state machine with simple
states. This appears to be preferable to viewing sub-states as a kind of subroutine and keeping them out of the complexity calculation, because this would lose a considerable amount of information on the complexity. Furthermore, internal transitions are counted equally to normal transitions. Pseudo states are not counted themselves, but their triggers and guard conditions are.

Cyclomatic Complexity of State Machine (CCS). Having explained the concepts based on the example flow graph above, the metric can be calculated directly from the state machine with a simplified complexity calculation. We count the atomic expressions and event triggers for each transition. Furthermore, we need to add 1 for each transition because we have the implicit condition that the corresponding source state is active. This results in the formula

CCS = |T| + |E| + |A_G| + 2,

where T is the multi-set of transitions, E is the multi-set of event triggers, and A_G is the multi-set of atomic expressions in the guard conditions. This formula yields exactly the same results as the longer version above but has the advantage that it is easier to calculate.

For this metric we have to consider two abstraction layers. First, we transform the state machine into its code representation and afterwards use the control flow graph of the code representation to measure structural complexity. Note that this is done only for measuring purposes; our approach also applies if the actual implementation is not automatically generated from the UML model but manually implemented.
The first “abstraction” is needed to establish the relationship to the corresponding code complexity. The code complexity is a good indicator of the fault-proneness of a program. The proposition is that the state machine reflects the major complexity attributes of the code that implements it. The next abstraction, to the control flow graph, was established in [29]. In [17] the correlation of metrics of design specifications and code metrics was analyzed. One of the main results was that code metrics such as the cyclomatic complexity are strongly dependent on the level of refinement of the specification, i.e. the metric has a lower value the more abstract the specification is. This also holds for the CCS metric. Models of software can be based on various different abstractions, such as functional or temporal abstractions [37]. Depending on the abstractions chosen for the model, various aspects may be omitted, which may have an effect on the metric. Therefore, it is prudent to consider a suite of metrics rather than a single metric when measuring design complexity to assess the fault-proneness of system components. In addition to the metrics which we defined above, we will now complete our metrics suite by adding two existing metrics from the literature.

Metrics Suite. Three of the metrics from [7] can be adjusted to be applicable to UML models. The metrics chosen are the ones that were found to be good indicators of fault-prone classes in [2]. However, we omit Response For a Class (RFC) and Coupling Between Objects (CBO) because they cannot be determined on the model level. The remaining two metrics, together with the new ones developed above, form our metrics suite in Table 4. We now describe these two adapted metrics.

Depth of Inheritance Tree (DIT). This is the maximum depth of the inheritance graph T to a class c. This can be determined in any class diagram that includes inheritance.

Number of Children (NOC). This is the number of direct descendants C_d in the inheritance graph. This can again be counted in a class diagram.

Table 4. A summary of the metrics suite with its calculation
Depth of Inheritance Tree (DIT): max(depth(T, c))
Number of Children (NOC): |C_d|
Number of Parts (NOP): |C_p|
Number of Required Interfaces (NRI): |I_r|
Number of Provided Interfaces (NPI): |I_p|
Cyclomatic Complexity of State Machine (CCS): |T| + |E| + |A_G| + 2
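To illustrate how the suite in Table 4 can be computed, the following sketch operates on a deliberately simple, hypothetical in-memory model representation (not the API of any particular UML tool); CCS uses the simplified formula derived above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Transition:
    trigger: Optional[str]          # event trigger, if any
    guard_atoms: int = 0            # number of atomic expressions in the guard

@dataclass
class StructuredClass:
    name: str
    parent: Optional["StructuredClass"] = None
    parts: List[str] = field(default_factory=list)
    required: List[str] = field(default_factory=list)   # required interfaces
    provided: List[str] = field(default_factory=list)   # provided interfaces
    transitions: List[Transition] = field(default_factory=list)

def dit(c):                         # Depth of Inheritance Tree
    return 0 if c.parent is None else 1 + dit(c.parent)

def noc(c, all_classes):            # Number of Children (direct descendants)
    return sum(1 for other in all_classes if other.parent is c)

def nop(c): return len(c.parts)     # Number of Parts
def nri(c): return len(c.required)  # Number of Required Interfaces
def npi(c): return len(c.provided)  # Number of Provided Interfaces

def ccs(c):                         # |T| + |E| + |A_G| + 2
    t = len(c.transitions)
    e = sum(1 for tr in c.transitions if tr.trigger is not None)
    a = sum(tr.guard_atoms for tr in c.transitions)
    return t + e + a + 2

# Roughly the structured class of Fig. 3, plus a small invented state machine
c1 = StructuredClass("C1", parts=["P1", "P2", "P3"],
                     required=["I1", "I2"], provided=["I3", "I4"],
                     transitions=[Transition("e1", 1), Transition("e2", 2)])
print(nop(c1), nri(c1), npi(c1), ccs(c1))  # 3 2 2 9
```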
We now consider whether our metrics are structural complexity measures according to the definition in [30]. The definition says that, for a set D of documents with a pre-order ≤_D and the usual ordering ≤_R on the real numbers R, a structural complexity measure is an order-preserving function m : (D, ≤_D) → (R, ≤_R). Each metric from the suite fulfills this definition with respect to a suitable pre-order on the relevant set of documents. The document set D under consideration depends on the metric: either a class diagram
that shows inheritance and possibly interfaces, a composite structure diagram showing parts and possibly interfaces, or a state machine diagram. All the metrics use specific model elements in these diagrams as a measure. Therefore there is a pre-order ≤_D between the documents of each type based on the metrics: We define d1 ≤_D d2 for two diagrams d1, d2 in D if d1 has fewer of the model elements specific to the metric under consideration than d2. The mapping function m maps a diagram to its metric, which is the number of these elements. Hence m is order-preserving and the metrics in the suite qualify as structural complexity measures.

As mentioned before, complexity metrics are good predictors for the reliability of components [25, 32]. Furthermore, the experiments in [2] show that most metrics from [7] are good estimators of fault-proneness. We adopted DIT and NOC from these metrics unchanged, therefore this relationship still holds. The cyclomatic complexity is also a good indicator for reliability [25], and this concept is used for CCS to be able to keep this relationship. The remaining three metrics were modeled similarly to existing metrics: NOP resembles NOC, while NRI and NPI are similar to CBO. NOC and CBO are estimators for fault-proneness, therefore it is expected that the new metrics behave accordingly.

This metrics suite can now be used to determine the most fault-prone classes and components in a system. However, different metrics are important for different components, so one cannot just take the sum over all metrics to find the most critical component. Some components may have an associated state machine, others not; this makes the sum meaningless. We propose instead to compute the metric values for each component and class and to consider, for each single metric, the ones that have the highest measures. This way we can, for example, determine the components with complex behavior or with high fan-in and fan-out.
3.2 Failure Proneness

We pointed out already that the fault-proneness of a component does not directly imply low reliability, because a high number of faults does not mean that there is a high number of failures [45, 46]. However, a direct reliability measurement is in general not possible on the model level. Nevertheless, we can get close by analyzing the failure-proneness of a component, i.e. the probability that a fault leads to a failure that occurs during software execution.

It is not possible to express the probability of failures with exact figures based on the design models. We propose therefore to use more coarse-grained failure levels, e.g. L_F = {high, medium, low}, where L_F is the set of failure levels. This allows an abstract assessment of the failure probability. It is still not reliability as generally defined, but the best estimate that we can get in early phases.

To determine the failure level of a component we use the metrics suite from above to define complexity levels L_C = {high, low}. We assign each component such a complexity level by looking at the extreme values in the metrics results. Each component that exhibits a high value in at least one of the metrics is considered to have the complexity level high; all other components have the level low. It depends on the actual distribution of values what is to be considered a high value.
Having assigned these complexity levels to the components, we know which components are highly fault-prone. The operational profile [33] is a description of the usage of the system, showing which functions are mostly used. We use this information to assign usage levels L_U to the components. This can be of various granularity; an example would be L_U = {high, medium, low}. When we know the usage of each component, we can analyze the probability that the faults in the component lead to a failure. The combination of complexity level and usage level leads us to the failure level L_F of the component. It expresses the probability that the component fails during software execution. We describe the mapping of the complexity level and usage level to the failure level with the function fp:

fp : L_C × L_U → L_F, where L_F = L_U ∪ {low}.

What the function does is simply to map all components with a high complexity level to their usage level, and all components with a low complexity level to low:

fp(x, y) = y if x = high, and fp(x, y) = low otherwise.

This means that a component with high fault-proneness has a failure probability that depends on its usage, and a component with low fault-proneness generally has a low failure probability. Having these failure levels for each component, we can use that information to guide the verification efforts in the project, e.g. assign the most inspection and testing effort to the components with a high failure level.

3.3 Checking of Reliability Requirements

In this section, we sketch, by way of example, how to use the information on fault-proneness of components obtained using the metrics in the previous section in combination with the checks of reliability requirements considered in Sect. 2. More specifically, we explain how to relate the levels derived from a UML model and the operational profile to reliability requirements formulated in the UML diagram using UML stereotypes.

Critical. As defined in Sect. 2, the stereotype critical labels classes whose instances are critical in some way, as specified by the associated tags {level} for each reliability level level ∈ Levels. The intention is now that for the failure level f ∈ L_F defined in Sect. 3.2, for any reliability level l ∈ Levels, and for any component C stereotyped with this level l, if the levels l and f are contradictory, C should be more closely inspected for possible flaws (for example, using a formal verification, which in general would be too costly to apply to the whole system). Contradictory means here a failure level and a reliability level that are not compatible, e.g. a high failure level and a high reliability level.

Containment. We use the stereotype containment of subsystems defined in Sect. 2 to detect system parts with a high failure level which may influence data values that are supposed to be highly reliable. These system parts can then be inspected more thoroughly for possible flaws.
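Before moving on to tool support, note that the mapping fp from Sect. 3.2 is simple enough to state directly in code; a minimal sketch using the example level sets from above:

```python
def fp(complexity_level, usage_level):
    """Failure level fp: L_C x L_U -> L_F with L_F = L_U ∪ {low} (Sect. 3.2)."""
    return usage_level if complexity_level == "high" else "low"

print(fp("high", "medium"))  # 'medium': a complex, moderately used component
print(fp("low", "high"))     # 'low': low fault-proneness dominates
```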
4 Tool Support for Model-Based Reliability and Safety Analysis
To support our approach, we developed automated tools for the formal verification of UML models against the constraints associated with the stereotypes introduced in Sects. 2 and 3. We describe a framework that incorporates several formal verification tools (including the model checker Spin and the automated theorem prover e-SETHEO).

Functionality. There are three consecutive stages in implementing full verification functionality for the formalized UML models. There exist verification tools for all stages. The framework is, however, designed to be extensible, so new analysis plugins can be added easily.

– Static features. Checkers for static features (for example, a type-checking-like enforcement of safety levels in class and deployment diagrams) have been implemented directly.
– Simple dynamic features. Checks of UML models of a bounded size for simple dynamic properties (for example, that a deterministic machine without interaction with the environment does not reach a certain critical state) can still be implemented directly.
– Complex dynamic features. Checks for complicated behavioral properties, or of large, highly non-deterministic or interactive UML models, require the use of sophisticated tools (such as model checkers). This is implemented by translating the required UML constructs into the model-checker input language (for example, a temporal logic formula).

To be able to apply sophisticated tools (such as model checkers) to compute a metric, one needs a front-end which automatically produces a semantic model and includes the relevant formalized safety requirements, when given a UML model. This avoids requiring the software developers themselves to perform this formalization, which usually needs a high level of specialized training in formal methods. Our UML extension supports this approach by offering predefined safety primitives (such as safety requirements or mechanisms) with a strictly defined semantics, which can be applied by a developer without extensive training in safety-critical systems by simply including the relevant stereotypes in the UML model. These primitives are translated into the targeted formal language, protecting from potential errors in a manual formalization of the safety properties. Since safety requirements are usually defined relative to a failure model, to analyze whether the UML specification fulfills a safety requirement, the tool support has to automatically include the failure model arising from the physical view contained in the UML specification.

We briefly describe the functionality of the UML tool that meets the listed requirements. The developer creates a model and stores it in the UML 1.5 / XMI 1.2 file format. The framework will be updated to UML 2.0 as soon as the official DTDs are available. The file is imported by the tool into the internal MDR repository. The tool accesses the model through the JMI interfaces generated by the MDR library. The checker parses the model and checks the constraints associated with the stereotypes. The results are delivered as a text report for the developer describing the problems found, and
a modified UML model, where the stereotypes whose constraints are violated are highlighted. The tool can be executed as a console application, a web application, or a GUI application.
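The plug-in mechanism is left abstract in the description above. Purely as an illustration of the kind of interface such an extensible checker framework might expose (this is not the actual tool's API, and all names are invented), a static check could be registered roughly as follows:

```python
from typing import Callable, Dict, List

# Hypothetical plug-in registry: each checker takes a parsed model (here just a
# dictionary of stereotyped elements) and returns a list of problem reports.
CHECKERS: Dict[str, Callable[[dict], List[str]]] = {}

def register(stereotype: str):
    def wrap(fn):
        CHECKERS[stereotype] = fn
        return fn
    return wrap

@register("reliable links")
def check_reliable_links(model: dict) -> List[str]:
    problems = []
    for dep in model.get("guarantee_dependencies", []):
        if dep.get("link_failures") is None:
            problems.append(f"dependency {dep['name']}: no supporting link found")
    return problems

def run_all(model: dict) -> List[str]:
    report = []
    for name, checker in CHECKERS.items():
        report += [f"[{name}] {p}" for p in checker(model)]
    return report

print(run_all({"guarantee_dependencies": [{"name": "client->server", "link_failures": None}]}))
```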
5 Case Study: Automatic Collision Notification
In this part of the chapter, we validate our proposed safety and reliability analyses in a case study of an automatic collision notification system, as used in cars to provide automatic emergency calls. First, the system is described and designed using the UML extension that we made in the previous sections; then we analyze the model and present the results.

Description. The case study that we used to validate our results was done in cooperation with the automotive manufacturer BMW; a similar project is currently in development. The problem to be solved is that many automobile accidents involve only a single vehicle. Therefore it is possible that no, or only a delayed, emergency call is made. The chances for the casualties are significantly higher if an accurate call is made quickly. This has led to the development of so-called Automatic Collision Notification (ACN) systems, sometimes also called mayday systems. They automatically notify an emergency call response center when a crash occurs, but a manual notification using the location data from a GPS device can also be made. We used the public specification from the Enterprise program [9, 10] as a basis for the design model because the joint work with BMW is at an early stage. In this case study, we will concentrate on the built-in device of the car and ignore the obviously necessary infrastructure such as the call center.

Device Design. Following [9], we will call the built-in device MaydayDevice and divide it into five components. The architecture is illustrated in Fig. 5 using a composite structure diagram of the device. The device is a processing unit that is built into the vehicle and has the ability to communicate with an emergency call center using a mobile telephone connection and to retrieve position data using a GPS device. The components that constitute the mayday device are:

– ProcessorModule. This is the central component of the device. It controls the other components, retrieves data from them, and stores it if necessary.
– AutomaticNotification. This component is responsible for notifying the processor module of a serious crash. It gets notified itself if an airbag is activated.
– LocationModule. The processor module requests the current position data from the location module, which gathers the data from a GPS device.
– CommunicationsModule. The communications module is called from the processor module to send the location information to an emergency call center. It uses a mobile communications device and is responsible for automatic retries if a connection fails.
– ButtonBox. This is finally the user interface that can be used to manually initiate an emergency call. It also controls a display that provides feedback to the user.
(Figure: composite structure diagram of the class MaydayDevice with the parts ButtonBox, AutomaticNotification, ProcessorModule, LocationModule, and CommunicationsModule, each annotated with a stereotype and a reliability level tag ({low} for one of them, {high} for the others).)
Fig. 5. The composite structure diagram of the mayday device
These components are again shown in Fig. 6 in a class diagram showing the attributes and methods of each. It also shows that we have exactly one instance of each class in the system. Furthermore, we used some tagged values based on Sect. 2 to describe safety requirements on some data values. The central ProcessorModule has the annotated requirements that the method getGpsData from the LocationModule delivers its data in real time and correctly, and that the data of the call is transferred correctly by the method makeCall of the CommunicationsModule. LocationModule and CommunicationsModule have the corresponding annotations; these requirements are therefore consistent, as defined in Sect. 2.2. Each of the components of the mayday device has an associated state machine describing its behavior. We do not show the state machines for space reasons, but they can be found in [47].

(Figure: the classes ButtonBox, AutomaticNotification, ProcessorModule, LocationModule and CommunicationsModule with their attributes and operations; ProcessorModule carries the tagged values {realtime = getGpsData(location), correct = getGpsData(location), correct = makeCall(callData)}, LocationModule carries {realtime = getGpsData(location), correct = getGpsData(location)}, and CommunicationsModule carries {correct = makeCall(callData)}.)

Fig. 6. The class diagram of the parts

Results. The five subcomponents of MaydayDevice are further analyzed in the following. At first we used our metrics suite from Sect. 3 to gather data about the model. The results can be found in Table 5. They show that we have no inheritance at the current abstraction level of our model and also that the considered classes have no parts. Therefore the metrics regarding these aspects are not helpful for this analysis.

Table 5. The results of the metrics suite for the components of MaydayDevice
Class                  DIT NOC NOP NRI NPI CCS
ProcessorModule         0   0   0   4   4  16
AutomaticNotification   0   0   0   2   1   4
LocationModule          0   0   0   1   2   4
CommunicationsModule    0   0   0   2   2  32
ButtonBox               0   0   0   2   2   8
More interesting are the metrics for the provided and required interfaces and the associated state machines. The class with the highest values for NRI and NPI is ProcessorModule. This shows that it has a high fan-in and fan-out and is therefore fault-prone. The same module has a high value for CCS, but CommunicationsModule has an even higher one and is also fault-prone. Therefore we assign the complexity level high to these two components; the others have the level low.

The documentation in [10] shows that the main failures that occurred were failures in connecting to the call center (even when the cellular signal strength was good), no voice connection to the call center, inability to clear the system after usage, and failures of the cancel function. These main failures can be attributed to the component ProcessorModule, which is responsible for controlling the other components, and to CommunicationsModule, which is responsible for the wireless communication. Therefore our reliability analysis labeled the correct components with a high failure level.
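The "high value in at least one metric" rule from Sect. 3.2 can be replayed on the numbers in Table 5. The threshold used below (a value counts as high if it is the non-zero maximum of its column) is our own simplification of the informal criterion in the text; DIT, NOC and NOP are omitted since they are zero throughout.

```python
metrics = {  # NRI, NPI, CCS for the MaydayDevice components (Table 5)
    "ProcessorModule":       {"NRI": 4, "NPI": 4, "CCS": 16},
    "AutomaticNotification": {"NRI": 2, "NPI": 1, "CCS": 4},
    "LocationModule":        {"NRI": 1, "NPI": 2, "CCS": 4},
    "CommunicationsModule":  {"NRI": 2, "NPI": 2, "CCS": 32},
    "ButtonBox":             {"NRI": 2, "NPI": 2, "CCS": 8},
}

def complexity_levels(metrics):
    names = next(iter(metrics.values())).keys()
    maxima = {m: max(vals[m] for vals in metrics.values()) for m in names}
    return {comp: "high" if any(vals[m] == maxima[m] and vals[m] > 0 for m in names)
            else "low"
            for comp, vals in metrics.items()}

print(complexity_levels(metrics))
# ProcessorModule and CommunicationsModule come out as 'high', matching the assessment above
```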
6 Case Study: MOST Network Master
We further validated our approach on the basis of the project results of an evaluation of model-based testing [38]. A network controller of an infotainment bus in the automotive domain, the MOST Network Master [31], was modeled with the CASE tool AutoFocus [18]; test cases were generated from that model and compared with traditional tests. We use all faults found by all test suites in the following, but as we mainly have fault
information, we concentrate on fault-proneness rather than failure-proneness. AutoFocus is quite similar to UML 2.0, and therefore the conversion was straightforward. The composite structure diagram of the Network Master is shown in Fig. 7.
(Figure: composite structure diagram of the class NetworkMaster with the parts Divide, MonitoringMgr, RegistryMgr, RequestMgr, and Merge.)
Fig. 7. The composite structure diagram of the MOST Network Master
We omit further parts of the design, especially the associated state machines, for space and confidentiality reasons. The corresponding metrics are summarized in Table 6.

Table 6. The results of the metrics suite for the NetworkMaster
Class          DIT NOC NOP NRI NPI CCS
NetworkMaster   0   0   5   4   4   0
Divide          0   0   0   1   3  11
Merge           0   0   0   3   1   8
MonitoringMgr   0   0   0   2   1   0
RequestMgr      0   0   0   2   1  14
RegistryMgr     0   0   0   4   7 197
The data in the table shows that the RegistryMgr has the highest complexity in most of the metrics. Therefore we classify it as highly fault-prone. As described in [38], several test suites were executed against an implementation of the Network Master. 24 faults were identified by the test activities, 21 of which can be attributed to the RegistryMgr and 3 to the RequestMgr. There were no revealed faults in the other components. Hence, the high fault-proneness of the RegistryMgr did indeed result in a high number of faults revealed during testing.
7 Related Work
In the related area of real-time systems there has been a substantial amount of work regarding the usage of UML. For example, [44] describes constructs to facilitate the design of software architectures in this domain which are specified using UML. [22–24] contain several approaches to developing systems with various criticality requirements using UML. In particular, [13] discusses a pattern-based approach for using UML use cases for safety-critical systems; the focus is on the development of a testing strategy rather than model analysis. [36] discusses methods and tools for the checking of UML statechart specifications of embedded controllers. The focus there is on the use of statecharts and on efficient methods for automated checking and does not include the use of other UML diagrams or the inclusion of safety requirements using stereotypes. Also relevant is the work towards a formal semantics of UML (see the proceedings of related conferences, including the UML and FASE conferences).

[27] proposes the automated generation of fault trees based on the source code of software, which may be combined with fault trees based on the electronic circuit design of the hardware, allowing the software and the hardware fault trees to be composed into a fault tree of the system. It presents a prototype of a fault tree generation tool that is capable of generating fault trees based on C++ code. [5] presents an integrated tool environment where automatic transformations of UML models can capture dependability requirements. The proposed metrics suite as a basis to find fault-prone components can also be found in [47]. Lano et al. [26] propose a method to analyze object-oriented models in terms of safety and security, but do not consider complexity directly. In [48] an approach is proposed that includes a reliability model based on the software architecture. A complexity metric that is in principle applicable to models as well as to code is discussed in [6], but it also only involves static structure. Another approach related to safety-critical systems is proposed in [20]; it annotates UML models with safety-related information for further analysis. In [4] the cyclomatic complexity is suggested for most aspects of a design metric. Safety checklists have been proposed, for example, in [15]. [14] uses Z and Petri nets for modeling safety-critical systems.
8 Conclusions
In this chapter we propose means to incorporate reliability requirements into UML models. This is achieved using the UML profile mechanisms of stereotypes and tagged values. It makes these important requirements visible in the model, helps to encapsulate knowledge of reliability mechanisms, and simplifies their use by non-experts. Furthermore, by formalizing the requirements, checks can be performed. Having annotated a model with the defined stereotypes and tagged values, one can check the consistency of the requirements throughout the model. This lends itself to tool support for automatic checking. We describe a framework in which several such checks are implemented.

To provide a reasonable basis for the reliability analysis of a system, we also present a metrics suite for UML models, based on the work of [7], to measure the structural
342
Jan J¨urjens and Stefan Wagner
complexity of the models. Specifically, we use the numbers of provided and required interfaces as a metric for fan-in and fan-out, and lift the cyclomatic complexity [29] to the machine level to measure the complexity of state machines. The suite is then used to find fault-prone components in a system. Fault-proneness, i.e. the probability of containing faults, is not a good measure for reliability because it does not take into account the probability of the faults of leading to a failure. Therefore the operational profile [33, 34] is used to estimate the usage of a component and the combination of fault-proneness and usage yields the failureproneness of the component. This information can finally be used to check consistency with earlier defined reliability requirements in the model and to improve the efficiency of verification efforts. We finish our chapter with two case studies. One of these describes an Automatic Collision Notification system for automobiles that sends automatically an emergency call in case of a crash. It shows that the metrics suite, especially in combination with an operational profile, indeed can identify failure-prone components. Furthermore, the case study illustrates the interplay of the metrics suite with the annotation with stereotypes. The second case study investigates only the capabilities of the metrics suite. It confirms that the suite helps to identify fault-prone components.
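The following short sketch is illustrative only; the chapter does not give these exact formulas. It shows one plausible reading of lifting the cyclomatic complexity to state machines (transitions minus states plus two, by analogy with V(G) = E - N + 2 on control-flow graphs) and of combining fault-proneness with operational-profile usage into failure-proneness as a simple product. All numeric values are invented.

    # Minimal sketch (not the authors' tooling); formulas and numbers are
    # illustrative assumptions.

    def statemachine_complexity(num_states: int, num_transitions: int) -> int:
        # Analogue of McCabe's V(G) = E - N + 2, applied to the
        # state/transition graph of a UML state machine.
        return num_transitions - num_states + 2

    def failure_proneness(fault_proneness: float, usage_probability: float) -> float:
        # Combine the estimated probability of containing faults with the
        # usage probability from the operational profile; here simply
        # their product.
        return fault_proneness * usage_probability

    # A heavily used component with moderate fault-proneness can be more
    # failure-prone than a rarely used but highly fault-prone one.
    print(statemachine_complexity(num_states=7, num_transitions=15))      # 10
    print(failure_proneness(fault_proneness=0.3, usage_probability=0.8))  # 0.24
    print(failure_proneness(fault_proneness=0.7, usage_probability=0.05)) # 0.035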
Acknowledgments

We gratefully acknowledge the joint work with Martin Baumgartner, Christian Kühnel, Alexander Pretschner, Wolfgang Prenninger, Bernd Sostawa, and Rüdiger Zölch on the MOST Network Master. Furthermore we are grateful to Manfred Broy and Wolfgang Prenninger for commenting on a draft version. This work was partly sponsored by the DFG within the project InTime and the German Ministry for Science and Education within the Verisoft project.
References

1. C. Atkinson, C. Bunse, and J. Wüst. Driving component-based software development through quality modelling. In A. Cechich, M. Piattini, and A. Vallecillo, editors, Component-Based Software Quality, volume 2693 of LNCS, pages 207–224. Springer, 2003.
2. V.R. Basili, L.C. Briand, and W.L. Melo. A Validation of Object-Oriented Design Metrics as Quality Indicators. IEEE Trans. Software Eng., 22(10):751–761, 1996.
3. R. Bharadwaj and C. Heitmeyer. Developing high assurance avionics systems with the SCR requirements method. In 19th Digital Avionics Systems Conference, 2000.
4. J.K. Blundell, M.L. Hines, and J. Stach. The Measurement of Software Design Quality. Annals of Software Engineering, 4:235–255, 1997.
5. A. Bondavalli, M. Dal Cin, D. Latella, I. Majzik, A. Pataricza, and G. Savoia. Dependability analysis in the early phases of UML based system design. Journal of Computer Systems Science and Engineering, 16:265–275, 2001.
6. D.N. Card and W.W. Agresti. Measuring Software Design Complexity. The Journal of Systems and Software, 8:185–197, 1988.
7. S.R. Chidamber and C.F. Kemerer. A Metrics Suite for Object Oriented Design. IEEE Trans. Software Eng., 20(6):476–493, 1994.
8. B. Dutertre and V. Stavridou. A model of noninterference for integrating mixed-criticality software components. In DCCA, San Jose, CA, January 1999.
9. Mayday: System Specifications. The ENTERPRISE Program, 1997. Available at http://enterprise.prog.org/completed/ftp/mayday-spe.pdf (October 2004).
10. Colorado Mayday Final Report. The ENTERPRISE Program, 1998. Available at http://enterprise.prog.org/completed/ftp/maydayreport.pdf (October 2004).
11. N.E. Fenton and S.L. Pfleeger. Software Metrics. A Rigorous & Practical Approach. International Thomson Publishing, 2nd edition, 1997.
12. M.H. Halstead. Elements of Software Science. Elsevier North-Holland, 1977.
13. K. Hansen and I. Gullesen. Utilizing UML and patterns for safety critical systems. In Jürjens et al. [22], pages 147–154.
14. M. Heiner and M. Heisel. Modeling safety-critical systems with Z and Petri Nets. In M. Felici, K. Kanoun, and A. Pasquini, editors, 18th International Conference on Computer Safety, Reliability and Security (SAFECOMP'99), volume 1698, pages 361–374, 1999.
15. C. Heitmeyer, R. Jeffords, and B. Labaw. Automated consistency checking of requirements specifications. ACM Trans. on Software Eng. and Methodology, 5(3):231–261, July 1996.
16. S. Henry and D. Kafura. Software Structure Metrics Based on Information Flow. IEEE Trans. Software Engineering, 7:510–518, 1981.
17. S. Henry and C. Selig. Predicting Source-Code Complexity at the Design Stage. IEEE Software, 7:36–44, 1990.
18. F. Huber, B. Schätz, A. Schmidt, and K. Spies. AutoFocus: A tool for distributed systems specification. In B. Jonsson and J. Parrow, editors, Formal Techniques in Real-Time and Fault-Tolerant Systems, 4th International Symposium, FTRTFT'96, volume 1135 of LNCS, pages 467–470, Uppsala, Sweden, Sept. 9–13 1996. Springer.
19. J. Jürjens. Critical systems development with UML and model-based testing. In The 22nd International Conference on Computer Safety, Reliability and Security (SAFECOMP 2003), Edinburgh, Sept. 23–26 2003. Full-day tutorial.
20. J. Jürjens. Developing safety-critical systems with UML. In P. Stevens, editor, UML 2003 – The Unified Modeling Language, volume 2863 of LNCS, pages 360–372, San Francisco, CA, October 20–24, 2003. Springer.
21. J. Jürjens. Secure Systems Development with UML. Springer, 2004.
22. J. Jürjens, V. Cengarle, E.B. Fernandez, B. Rumpe, and R. Sandner, editors. Critical Systems Development with UML, number TUM-I0208 in TU München Technical Report, 2002. UML'02 satellite workshop proceedings.
23. J. Jürjens, B. Rumpe, R. France, and E.B. Fernandez, editors. Critical Systems Development with UML, number TUM-I0317 in TU München Technical Report, 2003. UML'03 satellite workshop proceedings.
24. J. Jürjens, B. Rumpe, R. France, and E.B. Fernandez, editors. Third International Workshop on Critical Systems Development with UML, TU München Technical Report, 2004. UML'04 satellite workshop proceedings.
25. T.M. Khoshgoftaar and T.G. Woodcock. Predicting Software Development Errors Using Software Complexity Metrics. IEEE Journal on Selected Areas in Communications, 8(2):253–261, 1990.
26. K. Lano, D. Clark, and K. Androutsopoulos. Safety and Security Analysis of Object-Oriented Models. In SAFECOMP 2002, volume 2434 of LNCS, pages 82–93. Springer, 2002.
27. P. Liggesmeyer and O. Maeckel. Quantifying the reliability of embedded systems by automated analysis. In 2001 International Conference on Dependable Systems and Networks (DSN 2001), pages 89–96. IEEE Computer Society, 2001.
28. T. Mayer and T. Hall. A Critical Analysis of Current OO Design Metrics. Software Quality Journal, 8:97–110, 1999.
29. T.J. McCabe. A Complexity Measure. IEEE Trans. Software Engineering, 5:45–50, 1976.
30. A. Melton, D. Gustafson, J. Bieman, and A. Baker. A Mathematical Perspective for Software Measures Research. IEE/BCS Software Engineering Journal, 5:246–254, 1990.
31. MOST Cooperation. MOST Media Oriented System Transport—Multimedia and Control Networking Technology. MOST Specification Rev. 2.3. August 2004.
32. J.C. Munson and T.M. Khoshgoftaar. Software Metrics for Reliability Assessment. In Michael R. Lyu, editor, Handbook of Software Reliability Engineering, chapter 12. IEEE Computer Society Press and McGraw-Hill, 1996.
33. J.D. Musa. Software Reliability Engineering. McGraw-Hill, 1999.
34. J.D. Musa, A. Iannino, and K. Okumoto. Software Reliability: Measurement, Prediction, Application. McGraw-Hill, 1987.
35. Object Management Group. UML 2.0 Superstructure Final Adopted Specification, August 2003. OMG Document ptc/03-08-02.
36. Z. Pap, I. Majzik, and A. Pataricza. Checking general safety criteria on UML statecharts. In U. Voges, editor, SAFECOMP 2001, volume 2187 of LNCS, pages 46–55. Springer, 2001.
37. W. Prenninger and A. Pretschner. Abstractions for Model-Based Testing. In M. Pezze, editor, Proc. Test and Analysis of Component-based Systems (TACoS'04), 2004.
38. A. Pretschner, W. Prenninger, S. Wagner, C. Kühnel, M. Baumgartner, B. Sostawa, R. Zölch, and T. Stauner. One Evaluation of Model-Based Testing and its Automation. In Proc. 27th International Conference on Software Engineering (ICSE), 2005. To appear.
39. F. Randimbivololona. Orientations in verification engineering of avionics software. In R. Wilhelm, editor, Informatics – 10 Years Back, 10 Years Ahead, LNCS, pages 131–137. Springer, 2000.
40. L. Rosenberg, T. Hammer, and J. Shaw. Software Metrics and Reliability. In Proc. 9th International Symposium on Software Reliability Engineering (ISSRE'98). IEEE, 1998.
41. J. Rushby. Critical system properties: Survey and taxonomy. Reliability Engineering and System Safety, 43(2):189–219, 1994.
42. B. Selic. Physical programming: Beyond mere logic. In A. Sangiovanni-Vincentelli and J. Sifakis, editors, Embedded Software Second International Conference (EMSOFT 2002), volume 2491 of LNCS, pages 399–406, 2002.
43. B. Selic, G. Gullekson, and P.T. Ward. Real-Time Object-Oriented Modeling. John Wiley & Sons, 1994.
44. B. Selic and J. Rumbaugh. Using UML for modeling complex real-time systems. Available at http://www-106.ibm.com/developerworks/rational/library/, 1998.
45. S. Wagner. Efficiency Analysis of Defect-Detection Techniques. Technical Report TUM-I0413, Institut für Informatik, Technische Universität München, 2004.
46. S. Wagner. Reliability Efficiency of Defect-Detection Techniques: A Field Study. In Suppl. Proc. 15th IEEE International Symposium on Software Reliability Engineering (ISSRE'04), 2004.
47. S. Wagner and J. Jürjens. Model-Based Identification of Fault-Prone Components. Draft.
48. W.-L. Wang, Y. Wu, and M.-H. Chen. An Architecture-Based Software Reliability Model. In Proc. Pacific Rim International Symposium on Dependable Computing (PRDC'99), pages 143–150, 1999.
Author Index

Atkinson, Colin 1
Berbers, Yolande 209
Bunse, Christian 1
Chaudron, Michel R.V. 164
Cockburn, J. 296
Crnkovic, Ivica 232
da Silva, Leandro Dias 35
de Souza, P. 296
Desmet, Lieven 185
Dietrich, Christian 8
Furber, R.A. 296
Gross, Hans-Gerhard 1, 107
Grunske, Lars 82, 249
Halang, Wolfgang A. 8, 123
Hansson, Jörgen 59
Jahnke, J.H. 296
Janssens, Nico 185
Joosen, Wouter 185
Jürjens, Jan 320
Kaiser, Bernhard 249
Kircher, Michael 143
Lavender, M. 296
Lu, Shourong 123
Lüders, Frank 232
Lukkien, Johan J. 164
Mahieu, Tom 185
Maydl, Walter 82
Mayer, Nikolas 107
McNair, A. 296
Michiels, Sam 185
Muskens, Johan 164
Nadjm-Tehrani, Simin 59
Paredes Riano, Javier 107
Peper, Christian 1
Perkusich, Angelo 35
Purhonen, Anu 275
Reussner, Ralf H. 249
Rigole, Peter 209
Runeson, Per 232
Salzmann, Christian 143
Tešanović, Aleksandra 59
Van Baelen, Stefan 209
Vandewoude, Yves 209
Verbaeten, Pierre 185
Voelter, Markus 143
Wagner, Stefan 320
Zhang, Wei 8