BIOLOGICAL PETRI NETS
Studies in Health Technology and Informatics This book series was started in 1990 to promote research conducted under the auspices of the EC programmes’ Advanced Informatics in Medicine (AIM) and Biomedical and Health Research (BHR) bioengineering branch. A driving aspect of international health informatics is that telecommunication technology, rehabilitative technology, intelligent home technology and many other components are moving together and form one integrated world of information and communication media. The complete series has been accepted in Medline. Volumes from 2005 onwards are available online. Series Editors: Dr. O. Bodenreider, Dr. J.P. Christensen, Prof. G. de Moor, Prof. A. Famili, Dr. U. Fors, Prof. A. Hasman, Prof. E.J.S. Hovenga, Prof. L. Hunter, Dr. I. Iakovidis, Dr. Z. Kolitsi, Mr. O. Le Dour, Dr. A. Lymberis, Prof. J. Mantas, Prof. M.A. Musen, Prof. P.F. Niederer, Prof. A. Pedotti, Prof. O. Rienhoff, Prof. F.H. Roger France, Dr. N. Rossing, Prof. N. Saranummi, Dr. E.R. Siegel, Prof. T. Solomonides and Dr. P. Wilson
Volume 162
ISSN 0926-9630 (print) ISSN 1879-8365 (online)
Biological Petri Nets
Edited by
Edgar Wingender University of Göttingen, Germany
Amsterdam • Berlin • Tokyo • Washington, DC
© 1998–2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. This book comprises a compilation of papers which have previously been published in the journal In Silico Biology. ISBN 978-1-60750-703-1 (print) ISBN 978-1-60750-704-8 (online) Library of Congress Control Number: 2011921367
Publisher: IOS Press BV, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands, fax: +31 20 687 0019
Distributor in the USA and Canada: IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA, fax: +1 703 323 3668
LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
Dedicated to Prof. Dr. Carl Adam Petri
12.07.1926 – 2.07.2010
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved.
Preface
Petri Net Applications in Molecular Biology
Edgar Wingender
Department of Bioinformatics, University Medical Center Göttingen, Göttingen, Germany
As “classical” bioinformatics developed further towards modern systems biology, the idea of a holistic view of a biological system was not completely new: the aim to provide a comprehensive picture of genes and the regulatory features encoded in a genome had been inherent in bioinformatics research from the very beginning, as was the attempt to come up with an integrative view across the different levels of organisation, which was at least conceptually implicit in the numerous approaches to integrating the rapidly growing information about biological objects into comprehensive knowledge bases. However, transcendence of the research focus on static objects and progress towards the computer-aided investigation of biological processes was significantly advanced by the emerging field of systems biology. The new paradigm for formally representing the processes that make up a biological system is now the “network”. The term “process” itself implies dynamic events, changes, that we may wish to simulate with the aid of a computer in order to predict the behavior of a biological system under certain circumstances. Biochemistry provides the formal instruments to do so for defined (bio)chemical reactions, usually resulting in a set of ordinary differential equations (ODEs). Solving the large number of ODEs that are required to describe the behavior of a complex biological system exactly may be cumbersome, but computationally feasible as soon as we have at hand all the necessary parameters, such as the corresponding kinetic constants for all reactions involved. Even in those cases where these kinetics have been studied in vitro, it is still questionable whether the insights gained from these experiments are applicable to specific in vivo conditions. Nevertheless, this approach has been proven to work for (parts of) the metabolic network of living cells, but regulatory events that depend on a very low number of individual molecules per cell may require different approaches. 
Moreover, applying ODEs to a large complex system may be mere overkill, and, presumably, a less exact approach might be of even more appropriate granularity, at least for the larger part of the network under consideration. Several years ago, Petri nets were suggested as being well suited to modeling metabolic networks, overcoming some of the limitations outlined above [Reddy et al., 1993]. Since then, a great deal of further conceptual work, technical tool implementations and applications to biological problems have been reported which have demonstrated the usefulness of this concept for what we know today as systems biology. Being intuitively understandable to scientists trained in the life sciences, they also have a robust mathematical foundation and provide the required flexibility with regard to the models’ granularity. As a result, Petri net technology appears to be a very promising approach to modeling biological systems.
A significant part of the progress in this field has been published by In Silico Biology since it began in 1998. Four articles constituted the first Petri Net Special in 2003 (“Petri Nets for Metabolic Networks”; http://www.bioinfo.de/isb/toc vol 03.html#Petri nets). R. Hofestädt’s introduction to that Special Issue summarized the most essential topics as well as some of the basic requirements and constraints relevant for applying Petri net technology, and it therefore follows this preface [Hofestädt, 2003]. The early publication of Hofestädt and Thelen, 1998, demonstrated how the Petri net concept can be extended to a more quantitative modeling of metabolic networks, and also expanded these ideas to gene regulatory and cell-cell communication processes. In a subsequent work, Chen and Hofestädt were able to demonstrate how the Petri net approach can deal with an integrated process comprising gene regulatory and metabolic events, making use of the concept of hybrid Petri nets (HPN) [Chen and Hofestädt, 2003]. Having analyzed the requirements of the biochemical particularities of metabolic networks, Zevedei-Oancea and Schuster, 2003, studied the resulting topological properties of the corresponding Petri net models. Takai-Igarashi suggested a consistent definition of the Petri net units that are required for modeling signal transduction pathways; this was based on a specific ontology, the Cell Signaling Networks Ontology (CSNO) [Takai-Igarashi, 2005]. Voss et al., 2003, complemented these efforts by studying the Petri net models of metabolic steady states including all relevant reverse reactions. Similarly, Gambin et al. made an attempt to model the stationary state of a gene regulatory network in a Petri net [Gambin et al., 2006]. A big step forward towards dynamic simulation with the aid of Petri net models was undertaken by Matsuno et al., 2003, by extending the underlying approach further to the concept of hybrid functional Petri nets (HFPN).
It was proved that this could be applied to simulating the dynamics of two regulatory pathways, and it was demonstrated how such a Petri net model could be constructed [Matsuno et al., 2003; Doi et al., 2004]. More recently, the concept of firing delay times was introduced into the HFPN approach and applied to a signaling process [Miwa et al., 2010].

The conceptualization phase was also accompanied by the active development of a number of tools assisting in the creation and manipulation of biological Petri nets. While the early efforts used a platform that was originally developed for technical purposes (Visual Object Net ++, VON++) [Chen and Hofestädt, 2003], a tool specifically developed soon afterwards for modeling biological processes with Petri nets was published: Genome Object Net, GON [Matsuno et al., 2003; Doi et al., 2004]. GON evolved subsequently into Cell Illustrator (CI), a platform that is suitable for easy modeling and simulating the dynamics of cellular processes [Nagasaki et al., 2010]. A specialized tool (STEPP) for searching in a Petri net for those paths that connect, e.g., two metabolites and thus generating hypotheses about the interconversion of these two compounds was introduced by Koch et al., 2004; the program is still available for download (http://www.bioinformatik.uni-frankfurt.de/tools/stepp/stepp.html). Janowski et al. developed a powerful network editor, VANESA, that is able to work with Cell Illustrator as a simulation engine [Janowski et al., 2010].

A number of applications of Petri net-based modeling and simulation have been published in recent years.
While early work usually focused on certain parts of the metabolic network such as glycolysis and pentose phosphate pathway [Voss et al., 2003; Zevedei-Oancea and Schuster, 2003; Doi et al., 2004], nucleotide metabolism [Zevedei-Oancea and Schuster, 2003], urea cycle [Chen and Hofestädt, 2003], or the conversion of sucrose into starch in potato [Koch et al., 2004], different kinds of regulatory networks have attracted more attention of late. Thus, several signal transduction networks have been studied in great detail: the Fas ligand-induced cascade leading to apoptosis [Matsuno et al., 2003], the TGF-beta pathway [Takai-Igarashi, 2005], and the p53 network [Doi et al., 2006]. This year, the IL-1 pathway was added [Miwa et al., 2010]. Gene regulatory events have also been studied, such as the control of circadian rhythm in Drosophila [Matsuno et al., 2003], the regulation of glycolysis by the lac operon [Doi et al., 2004], and the flower
morphogenesis of Arabidopsis [Gambin et al., 2006]. The latter topic has now been resumed by Kaufmann et al., with the aid of Cell Illustrator [Kaufmann et al., 2010]. The mRNA turnover for a number of components relevant for cell-cycle regulation was studied with a stochastic Petri net by Csikász-Nagy and Mura, 2010. The assembly of the spliceosome has been modeled by Bortfeldt et al., 2010, analyzing the modular nature of this regulatory network. How intercellular communication processes can be linked with intracellular regulatory events and simulated has been demonstrated in the contribution of Janowski et al., taking bacterial quorum sensing as an example [Janowski et al., 2010]. The impact of delays and noise on the dopamine signal transmission was investigated by E. Voit and colleagues [Wu et al., 2010], and both studies made use of the HFPN-based simulation engine of Cell Illustrator [Nagasaki et al., 2010].

This recent overview was published as a Special Issue on Petri Net Applications in Molecular Biology of ISB volume 10, and the entire collection now constitutes this First ISB Book on Biological Petri Nets. We are confident that the reader will benefit from this unique compilation of articles, and hope that it helps to illustrate the value of the Petri Net approach to modern life sciences.

Sadly, on the 2nd of July, 2010, while this book was in the last stages of editing, Prof. Dr. Carl Adam Petri passed away. We mourn the loss of a great scientist. His work has inspired researchers from a broad range of disciplines, which clearly indicates his perspicacious mindset. We would like to honor his outstanding scientific merits by dedicating this book to his memory.

REFERENCES
• Bortfeldt, R. H., Schuster, S. and Koch, I. (2010). Exhaustive analysis of the modular structure of the spliceosomal assembly network - a Petri net approach. In Silico Biology 10, 0007.
• Chen, M. and Hofestädt, R. (2003). Quantitative Petri net model of gene regulated metabolic networks in the cell. In Silico Biology 3, 0030.
• Csikász-Nagy, A. and Mura, I. (2010). Role of mRNA gestation and senescence in noise reduction during the cell cycle. In Silico Biology 10, 0003.
• Doi, A., Fujita, S., Matsuno, H., Nagasaki, M. and Miyano, S. (2004). Constructing biological pathway models with hybrid functional Petri net. In Silico Biology 4, 0023.
• Doi, A., Nagasaki, M., Matsuno, H. and Miyano, S. (2006). Simulation based validation of the p53 transcriptional activity with hybrid functional Petri net. In Silico Biology 6, 0001.
• Gambin, A., Lasota, S. and Rutkowski, M. (2006). Analyzing stationary states of gene regulatory network using Petri nets. In Silico Biology 6, 0010.
• Hofestädt, R. (2003). Petri nets and the simulation of metabolic networks. In Silico Biology 3, 0028.
• Hofestädt, R. and Thelen, S. (1998). Quantitative modeling of biochemical networks. In Silico Biology 1, 0006.
• Janowski, S., Kormeier, B., Töpel, T., Hippe, K., Hofestädt, R., Willassen, N., Friesen, R., Rubert, S., Borck, D., Haugen, P. and Chen, M. (2010). Modeling of cell-cell communication processes with Petri nets using the example of quorum sensing. In Silico Biology 10, 0003.
• Kaufmann, K., Nagasaki, M. and Jáuregui, R. (2010). Modelling the molecular interactions in the flower developmental network of Arabidopsis thaliana. In Silico Biology 10, 0008.
• Koch, I., Schüler, M. and Heiner, M. (2004). STEPP – Search Tool for Exploration of Petri net Paths: A new tool for Petri net-based path analysis in biochemical networks. In Silico Biology 5, 0014.
• Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biology 3, 0032.
• Miwa, Y., Li, C., Ge, Q.-W., Matsuno, H. and Miyano, S. (2010). On determining firing delay time of transitions for Petri net based signaling pathways by introducing stochastic decision rules. In Silico Biology 10, 0004.
• Nagasaki, M., Saito, A., Jeong, E., Li, C., Kojima, K., Ikeda, E. and Miyano, S. (2010). Cell Illustrator 4.0: A computational platform for systems biology. In Silico Biology 10, 0002.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Takai-Igarashi, T. (2005). Ontology based standardization of Petri net modeling for signaling pathways. In Silico Biology 5, 0047.
• Voss, K., Heiner, M. and Koch, I. (2003). Steady state analysis of metabolic pathways using Petri nets. In Silico Biology 3, 0031.
• Wu, J., Qi, Z. and Voit, E. O. (2010). Impacts of delays and noise on dopamine signal transduction. In Silico Biology 10, 0005.
• Zevedei-Oancea, I. and Schuster, S. (2003). Topological analysis of metabolic networks based on Petri net theory. In Silico Biology 3, 0029.
Contents

Preface
Edgar Wingender

Petri Nets and the Simulation of Metabolic Networks
Ralf Hofestädt

Quantitative Modeling of Biochemical Networks
R. Hofestädt and S. Thelen

Topological Analysis of Metabolic Networks Based on Petri Net Theory
Ionela Zevedei-Oancea and Stefan Schuster

Quantitative Petri Net Model of Gene Regulated Metabolic Networks in the Cell
Ming Chen and Ralf Hofestädt

Petri Nets for Steady State Analysis of Metabolic Systems
Klaus Voss, Monika Heiner and Ina Koch

Biopathways Representation and Simulation on Hybrid Functional Petri Net
Hiroshi Matsuno, Yukiko Tanaka, Hitoshi Aoshima, Atsushi Doi, Mika Matsui and Satoru Miyano

Constructing Biological Pathway Models with Hybrid Functional Petri Nets
Atsushi Doi, Sachie Fujita, Hiroshi Matsuno, Masao Nagasaki and Satoru Miyano

STEPP – Search Tool for Exploration of Petri Net Paths: A New Tool for Petri Net-Based Path Analysis in Biochemical Networks
Ina Koch, Markus Schüler and Monika Heiner

Ontology Based Standardization of Petri Net Modeling for Signaling Pathways
Takako Takai-Igarashi

Simulation-Based Validation of the p53 Transcriptional Activity with Hybrid Functional Petri Net
Atsushi Doi, Masao Nagasaki, Hiroshi Matsuno and Satoru Miyano

Analyzing Stationary States of Gene Regulatory Network Using Petri Nets
Anna Gambin, Sławomir Lasota and Michał Rutkowski

Cell Illustrator 4.0: A Computational Platform for Systems Biology
Masao Nagasaki, Ayumu Saito, Euna Jeong, Chen Li, Kaname Kojima, Emi Ikeda and Satoru Miyano

Modeling of Cell-to-Cell Communication Processes with Petri Nets Using the Example of Quorum Sensing
Sebastian Janowski, Benjamin Kormeier, Thoralf Töpel, Klaus Hippe, Ralf Hofestädt, Nils Willassen, Rafael Friesen, Sebastian Rubert, Daniela Borck, Peik Haugen and Ming Chen

On Determining Firing Delay Time of Transitions for Petri Net Based Signaling Pathways by Introducing Stochastic Decision Rules
Yoshimasa Miwa, Chen Li, Qi-Wei Ge, Hiroshi Matsuno and Satoru Miyano

Impact of Delays and Noise on Dopamine Signal Transduction
Jialiang Wu, Zhen Qi and Eberhard O. Voit

Role of mRNA Gestation and Senescence in Noise Reduction During the Cell Cycle
Attila Csikász-Nagy and Ivan Mura

Exhaustive Analysis of the Modular Structure of the Spliceosomal Assembly Network: A Petri Net Approach
Ralf H. Bortfeldt, Stefan Schuster and Ina Koch

Modelling the Molecular Interactions in the Flower Developmental Network of Arabidopsis thaliana
Kerstin Kaufmann, Masao Nagasaki and Ruy Jáuregui

Subject Index

Author Index
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2003, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-1
Petri Nets and the Simulation of Metabolic Networks
Ralf Hofestädt
Bioinformatics/Medical Informatics, Technische Fakultät, Universität Bielefeld, Postfach 10 01 31, D-33501 Bielefeld, Germany
Based on the Human Genome Project, the new interdisciplinary subject of Bioinformatics has become an important research topic during the last decade. An important catalyst of this process is that methods of molecular biology (DNA sequencing, proteomics etc.) allow the automatic generation of data on cellular components. Based on this technology, robotic systems can sequence small genomes in a few weeks. Moreover, the semi-automatic assembly and annotation of the sequence data can only be done using methods of computer science. The molecular data are stored in database systems available via the Internet. Based on these data, different questions can be addressed using specific analysis tools. Regarding the DNA sequences, we are looking for powerful software tools which will predict functional units in the DNA. Today this topic is called “From Sequence to Function” or “Post-Genomics”.

The common definition of Bioinformatics addresses the application of methods and concepts of computer science in the field of biology. Bioinformatics currently stresses three main topics. The first major topic is sequence analysis or genome informatics. Its basic tasks are: assembling sequence fragments, automatic annotation, pattern matching and implementation of database systems, like EMBL, TRANSFAC, PIR, RAMEDIS, KEGG etc. The sequence alignment problem still represents the kernel of sequence analysis tools. Nevertheless, sequence analysis is not a new topic. It was, and still is, a topic of Theoretical Biology and Computational Biology.

Protein Design is the second current major research topic of Bioinformatics. The first task was to implement information systems that represent knowledge about the proteins. Today many different systems, like PIR or PDB, are available. The main goal of this research topic is to develop useful models which will allow the automatic calculation of 3D structures, including the prediction of the molecular behavior of the protein.
Until now, molecular modeling has failed to do so. Protein design is also not a new research topic. Its roots lie in Biophysics, Pharmacokinetics and Theoretical Biology.

The third major topic is Metabolic Engineering. Its goal is the analysis and synthesis of metabolic processes. The basic molecular information of metabolic pathways is stored in database systems, like KEGG, WIT, etc. Models and specific analysis algorithms, based on the molecular knowledge represented by these database and information systems, allow the implementation of analysis tools. The idea of Metabolic Engineering represents the basic idea of the Virtual Cell. Using molecular data and knowledge, the implementation of specific models allows the implementation of simulation tools for cellular processes.

Beyond the algorithmic analysis of molecular data, modelling and simulation methods and concepts allow the analysis and synthesis of complex gene-controlled metabolic networks. The current
data and knowledge of the structure and function of molecular systems are still rudimentary. Furthermore, the experimental data available in molecular databases have a high error rate, while biological knowledge has a high rate of uncertainty. Therefore, only modelling and simulation together with methods of artificial intelligence will suffice to address the important questions that arise. Such a formal description can be used to specify a simulation environment. Therefore, modelling and simulation can be interpreted as the basic step for implementing virtual worlds that allow virtual experiments.

As already mentioned, more than 500 database and information systems which represent molecular knowledge are available today. Furthermore, many analysis tools and simulation environments are available. That means that basic components of the electronic infrastructure for the implementation of a virtual cell are present. The concepts and tools which are available in the literature and on the Internet address specific questions, such as gene regulation phenomena or biochemical process control. To solve current questions, we have to implement integrative tools (Integrative Bioinformatics) which can finally be used to implement a virtual cell. If we take a look at the Internet, we can see that only online representations of cellular illustrations, taken directly from books, are available today (http://www.life.uiuc.edu/plantbio/cell/). One of the first implementations is the E-Cell project of M. Tomita (www.e-cell.org). Many new virtual cell projects are following the E-Cell project.

Regarding the different methods for modelling and simulation of metabolic pathways, we can divide these tools into two classes. The classical methods are members of the so-called analytical class. All these tools are based on the theory of differential equations and try to realize an exact molecular simulation.
The main argument against this class is that we do not have the dynamic molecular data. This was the main reason why many scientists from different research areas developed discrete models. These models are based on the theory of formal languages, automata, objects, rules, expert systems etc. However, a few of these models are also hybrid models. Until now it is not clear which kind of model will be the best to help to implement the virtual cell.

Exactly 10 years ago M. Mavrovouniotis and colleagues presented the first paper using Petri nets for this important application [Reddy et al., 1993]. In this paper they used only simple case condition systems for the simulation of simple biochemical processes. During the last decade many more in-depth papers were published using this method for the simulation of metabolic networks. The advantages of this method are: 1. Deep theory and results are available, 2. Powerful simulation shells are available and 3. Petri nets can combine both classes in a simple way. Point three is very important because, using higher Petri nets, we are able to attach differential equations to the arcs of the model. That means that we are able to expand a discrete model into an analytical model at any time.

REFERENCE
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 1998, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-3
Quantitative Modeling of Biochemical Networks
R. Hofestädt (a,∗) and S. Thelen (b)
(a) University of Magdeburg, Department of Computer Science, Magdeburg, Germany
(b) University of Bonn, Department of Computer Science, Bonn, Germany
ABSTRACT: Today different database systems for molecular structures (genes and proteins) and metabolic pathways are available. All these systems are characterized by static data representation. For progress in biotechnology, the dynamic representation of these data is important. The metabolism can be characterized as a complex biochemical network. Different models for the quantitative simulation of biochemical networks are discussed, but no useful formalization is available. This paper shows that the theory of Petri nets is useful for the quantitative modeling of biochemical networks.
KEYWORDS: Bioinformatics, biochemical networks, modeling and simulation, Petri nets
INTRODUCTION
Methods of biotechnology allow the analysis of biochemical reactions and the isolation, sequencing, analysis, and synthesis of genes and proteins [1]. This opens a wide area of applications and will produce various changes in science and human behavior. In medicine, new drugs could be designed by using these data and methods of biotechnology and biocomputing [2]. To make this technology more useful, enormous efforts are necessary. The dream of drug design and gene therapy can become reality if interdisciplinary efforts are successful. Therefore, the phenomena of gene regulation and the modeling of biochemical reactions have to be analyzed [3]. The new research area that tries to solve these problems is called metabolic engineering [4,5]. The goal of this research field is to develop and implement tools, in practice and in theory, which will carry out the analysis and synthesis of metabolic processes. On the theoretical side, bioinformatics has already developed different tools towards this goal. Database systems for genes (EMBL, GENBANK) and proteins (PIR, SWISSPROT) are available via the Internet. Moreover, the Boehringer company is collecting the data of all analyzed biochemical reactions. These data are presented by the Boehringer pathway chart [6]. Since the beginning of 1997 the electronic representation of the biochemical pathways has been available via the Internet [7]. The main gap is still the dynamic representation of these molecular data. Different models and simulation shells have been developed, but this gap still exists [8].

∗ Corresponding author: R. Hofestädt, Otto-von-Guericke-Universität Magdeburg, Institut für Technische und Betriebliche Informationssysteme, AG/Bioinformatik und Medizinische Informatik, Universitätsplatz 2, D-39106 Magdeburg, Germany.
Biochemical pathways can be interpreted as complex graphs. Each node represents a metabolite and each edge represents a biochemical reaction which is catalyzed by specific molecular structures. Therefore, the application of the theory of Petri nets seems to be useful. The Petri net application to biochemical reactions was introduced by Reddy et al. 1993 [9]. In that paper it is shown that Petri nets can easily simulate qualitative biochemical reactions. The problem of this presentation [9] is that gene regulation processes cannot be simulated. Moreover, it is not possible to simulate kinetic effects. The first gap could be closed by showing that different classes of conditions can be interpreted as genes, proteins, or enzymes [10]. Using this formalization, cell communication and gene regulation can easily be simulated. Moreover, the simulation of kinetic effects and the feedback control of biochemical reactions is very important. In this paper we present the extension of our formalization [10], which allows the quantitative modeling of regulatory biochemical networks.
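The graph reading of a pathway described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formalization developed in this paper: places stand for metabolites, transitions stand for reactions, and the two-step pathway A → B → C, the token counts, and the function names `enabled` and `fire` are all hypothetical.

```python
# Minimal place/transition net: places hold tokens (units of a metabolite),
# transitions are reactions that consume input tokens and produce output tokens.
# The pathway A -> B -> C and all names here are illustrative assumptions.

def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["inputs"].items())

def fire(marking, transition):
    """Return the marking after firing; the original marking is not modified."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p, n in transition["inputs"].items():
        new[p] = new.get(p, 0) - n
    for p, n in transition["outputs"].items():
        new[p] = new.get(p, 0) + n
    return new

# Two reactions of a hypothetical pathway.
r1 = {"inputs": {"A": 1}, "outputs": {"B": 1}}
r2 = {"inputs": {"B": 1}, "outputs": {"C": 1}}

m0 = {"A": 2, "B": 0, "C": 0}   # initial marking
m1 = fire(m0, r1)               # one unit of A converted to B
m2 = fire(m1, r2)               # that unit of B converted to C
```

In this sketch the second reaction cannot fire in the initial marking because B is empty, which is the purely qualitative behavior discussed above; no reaction rates or kinetic parameters are involved.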
METABOLIC ENGINEERING
Metabolic engineering is the improvement of cellular activities by manipulating the enzymatic, transport, and regulatory functions of the cell with the use of recombinant DNA technology [4]. The opportunity to introduce heterologous genes and regulatory elements distinguishes metabolic engineering from traditional genetic approaches to improve the strain. Metabolic engineering includes manipulation of protein processing pathways as well as of pathways involving smaller metabolites. At present, metabolic engineering is more a collection of examples than a codified science. The main features of metabolic engineering can be subdivided into two parts: the theoretical and the practical part. The synthesis and creation of new products or new reactants and the synthesis of hybrid metabolic networks belong to the practical part. In this presentation, the theoretical part of metabolic engineering will be discussed. Biochemical data have to be stored by using integrative database systems. Moreover, specific analysis algorithms have to be implemented. The main task is to develop and implement interactive simulation environments, which allow the quantitative discussion of metabolic processes. Therefore, integrative simulation environments must be implemented, which allow the simulation of gene regulation, biosynthesis, and cell communication. The recombination of phenotypes and features in organisms can be carried out by using methods of DNA recombination. In the first step specific genes, which represent the desirable phenotype (for example body length), have to be isolated and integrated into a specific genome (e.g. Ti-plasmid). Gene transfer into the organism will be realized by infection with the vector molecule. This recombinant process can produce the corresponding phenotype of this gene. A well known example is the so-called ‘super mouse’ [11], which contains the growth gene of the rat.
By expressing this gene, the body length of the ‘super mouse’ is double that of an ordinary mouse. A popular research field is the identification of genetic defects which produce metabolic diseases. Repairing such defects with methods of biotechnology is the main task of human genetics. The first step is to identify and modify defective genes. The main problem is the regulation of genetic activity. This is why cellular control mechanisms are analyzed in the field of molecular genetics and biotechnology. Metabolism represents a highly connected system of biochemical reactions, gene regulation mechanisms, and cell communication processes. Therefore, the main task is to develop new models and simulation shells which allow complex metabolic processes to be modified.
R. Hofestädt and S. Thelen / Quantitative Modeling of Biochemical Networks
Fig. 1. Abstract model of biochemical reactions.
MODELING AND SIMULATION Biochemical networks The metabolism is based on biochemical reactions. To understand the behavior of biochemical networks, modeling and simulation are important [3]. In the case of genes, enzymes, and biochemical reactions, database systems are available, which represent the analyzed molecular data. Several models have been developed, but the main gap in this area of modeling and simulation is the development of an integrative model and simulation shell [12], which allows the dynamic representation of biochemical networks. The meaning of “integrative” is that this model enables discussion of biosynthetic processes, gene regulation processes, and cell communication processes. Therefore, integrative models allow the discussion of regulatory metabolic networks. The genetic information (DNA) controls metabolism indirectly. The protein synthesis of structural genes produces specific enzymes which catalyze biochemical reactions. The transcription of these genes has to be regulated by enzymatic mechanisms. The fundamental model of gene regulation is based on the model of Jacob and Monod [13] for the synthesis of the Lactose operon. The primary unit of the gene regulation is the operon, which consists of the promoter, the operator, the gene(s), and the terminator sequence. The RNA polymerase identifies the promoter sequence of the operon and carries out the transcription process. The affinity of the promoter/RNA polymerase complex is defined by specific DNA signal structures, which are called the Pribnow box and the −35 box of the promoter (prokaryotes). Homeotic genes, transposons, enhancers and silencers demonstrate that gene regulation is a complex process. The metabolic control of a cell is defined by biochemical reactions, which change substrates into products (S → P). This can be done spontaneously or catalyzed by specific enzymes (S-E → P). Most of the biochemical reactions are 2-way processes, which are catalyzed by enzymes (S ← E → P). 
Therefore, concentration rates are important. In some cases specific molecules, which are called inhibitors (I), are able to reduce the flux. However, the flux of biosynthetic processes is controlled by enzyme affinity, enzyme concentration, and reaction rate (p). These parameters can be modified by proteins and enzymes, which are called influence proteins. In the case of 2-way biochemical reactions, the enzyme will catalyze biochemical reactions from the higher to the lower level of the concentration rates. Moreover, kinetic effects are important [14]. Most biochemical reactions follow the Michaelis-Menten kinetic scheme, which is characterized by the following equation: $V = -\frac{dS}{dt} = \frac{V_{max}\, S}{S + K_m}$
where S is the substrate concentration, V the rate of reaction at that concentration, Vmax the maximum rate of hydrolysis and Km the Michaelis constant. Vmax and Km are two constants that characterize the interactions of the enzyme with its substrate. Enzymes can be controlled by modifying the affinity, efficiency, and specificity of the enzyme. Genes and their regulation mechanisms, biosynthesis and its catalysis, cell communication processes, and the liveliness of all these components are called elementary metabolic processes, which define the behavior of the metabolism. All these processes build metabolic networks, in which the elementary metabolic processes are interconnected and influence each other in a well defined way.
Related works
The simulation of metabolic processes is based on specific models, which can be subdivided into the classes of abstract, discrete, and analytical models. The abstract models are based on automata and logical models, which permit the global discussion of fundamental aspects. The goal of analytical models is the exact quantitative simulation, where the analysis of kinetic features of enzymes is important. The paper of Waser et al. [15] presents a computer simulation of phosphofructokinase. This enzyme is part of the glycolysis pathway. Waser et al. model all kinetic features of the metabolic reaction by computer simulation. Their computer program is based on chemical reaction rules, which are described by differential equations. Franco and Canela simulate the purine metabolism by differential equations, where each reaction is described by the relevant substance and the catalytic enzyme using the Michaelis constant of each enzyme [16]. Discrete models are based on state transition diagrams. Simple models of this class are based on simple production units, which can be combined. Overbeek presented an amino acid production system, a black-box with an input-set and an output-set describing a specific production unit [17].
The graphical model of Kohn and Letzkus [18], which allows the discussion of metabolic regulation processes, is representative of the class of graph theoretical approaches. They extend graph theory by specific functions which allow the modeling of dynamic processes. In this context the approach of Petrinets is a new method. Reddy et al. [9] presented the first application of Petrinets in molecular biology. This formalism is able to model metabolic pathways. The highest abstraction level of this model class is represented by expert systems [19] and object oriented systems [20]. Expert systems and object oriented systems are developed in higher programming languages (Lisp, C++) and allow the modeling of metabolic processes by facts/classes (proteins and enzymes) and rules/classes (chemical reactions). The grammatical formalization is able to model complex metabolic networks [21].
PETRINETS – BASICS
Petrinets, a graph-oriented formalism, allow the modeling and analysis of systems which comprise properties such as concurrency and synchronization. A Petrinet consists of transitions and places, which are connected by arcs. In the graphical representation, places are drawn as circles, transitions are drawn as thin bars or as rectangles, and arcs are drawn as arrows. The places and transitions are labeled with their names. Places may contain tokens, which are drawn as dots. The vector representing the number of tokens in each place is the state of the Petrinet and is referred to as its marking. The marking can be changed by the firing of transitions, which is determined by the arcs. The arcs can further be divided into input and output arcs. Generally, arcs may have multiplicities greater than one. In the following only single arcs are assumed. A transition is said to be enabled if all places connected to it by input arcs contain tokens. An enabled transition may fire by removing a token from each place connected with an input
arc and adding a token to each place connected with an output arc. The transitions can be divided into immediate transitions, firing without delay, and timed transitions, firing after a certain delay. A Petrinet is a bipartite directed graph, which can be represented graphically. The places contain indistinguishable tokens, which are removed and added by the firing of transitions. The reachability graph contains all markings reachable from an initial marking. Formal definitions of elementary Petrinets and theoretical results about their structural properties can be found in [22]. Several extensions of elementary Petrinets were proposed for more compact models and a higher level of abstraction. In colored Petrinets tokens can be distinguished, and in predicate-transition nets tokens are expressions, which are manipulated by the firing of transitions. Initially, Petrinets model only qualitative aspects of a system. In order to include quantitative aspects, every place can hold a well defined number of tokens. The capacity of a place defines the maximum number of tokens which can be held by this place. The definition of the firing rule has to be extended accordingly. The input and output arcs are labeled with integer values. For an input arc, this value states that the transition can fire only if the input place holds at least as many tokens as the arc specifies. Firing a transition deletes tokens from the input places and produces tokens in the output places. The numbers of deleted and produced tokens are specified by the arrow weights. More formally, we define some basic terms.
A net N = (P,T,F) is defined as follows:
– P and T are finite disjoint sets and
– F is a subset of the set (P × T) ∪ (T × P).
A place-transition net consists of
– places (represented as circles);
– transitions (represented as boxes);
– arrows from places to transitions and from transitions to places;
– a capacity indication for every place;
– a weight for every arrow (represented as a number);
– an initial marking, defining the initial number of tokens for every place.
In a place-transition net
– a marking is indicated by the number of tokens in every place;
– a place p is in the pre-set (or post-set) of a transition t, if there is an arrow from p to t (or an arrow from t to p);
– a transition t is activated, if
  1. for every place p from the pre-set of t the weight of the arrow from p to t is not greater than the number of tokens indicated at p;
  2. for every place p in the post-set of t the number of tokens at p increased by the weight of the arrow from t to p is not greater than the capacity of p;
– an activated transition t occurs in that the number of tokens at every place p of its pre-set is decreased by g, where g is the arrow weight of (p → t), and the number of tokens at every place p’ of its post-set is increased by g’, where g’ is the arrow weight of (t → p’).
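The activation and occurrence rules listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `PTNet`, the dictionary layout, and the small example net (places `S` and `P`, transition `r`) are assumptions made for this illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PTNet:
    capacity: dict  # place -> maximum number of tokens
    pre: dict       # transition -> {place: weight of arrow (p -> t)}
    post: dict      # transition -> {place: weight of arrow (t -> p)}

    def activated(self, marking, t):
        # Condition 1: every pre-set place holds at least the arrow weight.
        enough = all(marking[p] >= g for p, g in self.pre[t].items())
        # Condition 2: no post-set place would exceed its capacity.
        room = all(marking[p] + g <= self.capacity[p]
                   for p, g in self.post[t].items())
        return enough and room

    def fire(self, marking, t):
        if not self.activated(marking, t):
            raise ValueError(f"transition {t!r} is not activated")
        new = dict(marking)
        for p, g in self.pre[t].items():
            new[p] -= g   # tokens consumed from the pre-set
        for p, g in self.post[t].items():
            new[p] += g   # tokens produced in the post-set
        return new


# Hypothetical net: r consumes 2 tokens from S and produces 3 tokens in P.
net = PTNet(capacity={"S": 10, "P": 4},
            pre={"r": {"S": 2}},
            post={"r": {"P": 3}})
m0 = {"S": 5, "P": 0}
m1 = net.fire(m0, "r")
```

After one occurrence of `r` the place `P` holds three tokens, so a second occurrence would exceed its capacity of four and the transition is no longer activated, even though `S` still holds enough tokens.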
Fig. 2. A self-modified Petrinet for the biocatalytic reaction of lactose.
MODELING OF BIOCHEMICAL NETWORKS
Self-modified Petrinets
The main feature of metabolic processes is that the concentration of metabolites influences the reaction activity of biochemical processes. Therefore, the actual concentration of any metabolite is an important component of the model. It can be taken into account by extending the place-transition net with the self-modified component, which was first defined by Valk [23]. The main feature of this formalization is that the identifier of any place can be used as a parameter of any arrow weight formula.
Example 1: The concentration of the enzyme is important for the biocatalytic process. Using self-modified nets, the biocatalytic reaction of lactose into glucose and galactose can be described as follows: each unit of lactose produces one unit of galactose and one unit of glucose. The reaction is only activated if the enzyme β-galactosidase is available. Therefore, the concentration of β-galactosidase is used: the arrow weight from the place β-galactosidase to the transition reaction is β-galactosidase. This amount is consumed and produced again by the reaction. Based on these ideas we give a formal definition of self-modified Petrinets and self-modified Petrinets with capacity.
Definition: N = (P, T, F, Vs, m0) is called a self-modified Petrinet, iff
– (P,T,F) is a net,
– Vs : P × PN × T → N with PN := P ∪ N,
– m0 is a start configuration.
Definition: N = (P, T, F, Ku, Ko, Vs, m0) is called a self-modified Petrinet with capacity, iff
– (P, T, F, Vs, m0) is a self-modified net,
– Ku : P → N is the minimal capacity of each place,
– Ko : P → N is the maximal capacity of each place,
– m0 : P → N is a start marking with Ku(p) ≤ m0(p) ≤ Ko(p).
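Example 1 and the definition with capacities can be illustrated by a small Python sketch in which an arrow weight is either a number or the name of a place (an element of PN = P ∪ N). The function names, the place names, the capacity values, and the start marking are hypothetical. The sketch also assumes that the capacity interval [Ku, Ko] is tested on the marking that results from firing, which the definition above does not state explicitly.

```python
def resolve(weight, marking):
    """Return the numeric value of an arrow weight taken from PN = P ∪ N."""
    # A string is read as a place identifier: the weight is its token count.
    return marking[weight] if isinstance(weight, str) else weight


def fire(marking, pre, post, k_low, k_high):
    """Return the successor marking, or None if the transition cannot occur."""
    # All weights are evaluated on the marking before firing.
    take = {p: resolve(w, marking) for p, w in pre.items()}
    give = {p: resolve(w, marking) for p, w in post.items()}
    if any(marking[p] < g for p, g in take.items()):
        return None
    new = dict(marking)
    for p, g in take.items():
        new[p] -= g
    for p, g in give.items():
        new[p] += g
    # Assumption: capacities are checked on the resulting marking.
    if any(not (k_low[p] <= new[p] <= k_high[p]) for p in new):
        return None
    return new


E = "beta_galactosidase"
places = ["lactose", E, "galactose", "glucose"]
pre = {"lactose": 1, E: E}                   # enzyme arrow weight = enzyme marking
post = {E: E, "galactose": 1, "glucose": 1}  # the enzyme is produced again
k_low = dict.fromkeys(places, 0)
k_high = dict.fromkeys(places, 100)

m0 = {"lactose": 3, E: 2, "galactose": 0, "glucose": 0}
m1 = fire(m0, pre, post, k_low, k_high)
```

One design point is worth noting: with an empty enzyme place the self-modified weight resolves to zero, so a simulator that should block the reaction in that case needs an additional guard, which this sketch leaves out.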
Fig. 3. A self-modified Petrinet modeling the lactose biocatalytic process using functions.
Petrinets with functions
The self-modified Petrinet allows the modeling of biochemical processes using actual concentrations. Moreover, it makes sense to model this biocatalytic reaction using functions, which allow each transition to simulate kinetic effects. The calculation of the dynamic biocatalytic process can be realized by using functions for specifying the arrow weight. Moreover, complex relations and conditions can be combined which will activate transitions. Functional Petrinets are specific predicate/transition networks [24], which represent abstract networks. Regarding predicate/transition networks, two kinds of modification will be described. Tokens of the same place can be of different types and the arrow weight will be described by using a specific description language. In the case of biochemical modeling, we only need the second feature of predicate/transition networks, because the possibility of using functions allows the mapping of natural numbers, where identifiers of different places within the net can be used as variables.
Definition: N = (P, T, F, VF, m0) is called functional net, iff
– (P,T,F) is a net,
– $V_F(f) \in \{\, g(x_1, \ldots, x_n) \mid g: PN \times \cdots \times PN \to \mathbb{N} \,\},\ n \in \mathbb{N}$,
– m0 is a start configuration of N.
Example 2: The biocatalytic reaction of Example 1 is extended using the functional description. In the following a linear dependence is assumed. The start configuration m0 = (100,20,0,0) means: 100 units of lactose and 20 units of the enzyme β-galactosidase. The linear factor is n = 2.
Simulation with Petrinets
The formalization of biochemical reactions by the Petrinet model as described in this paper allows the simulation of biochemical networks. At the beginning of the modeling process we have to identify the metabolites (places), the biochemical reactions (transitions), and their relations, which define the structure of the model (arcs and arrow weights). Moreover, we have to define the start configuration (tokens in places). Based on this configuration transitions will be enabled, and the firing process will produce new configurations of the Petrinet. However, based on this formalization all possible new
configurations can be calculated using the matrix formalization of this method. Regarding Example 2, the corresponding matrix and the vector of the start configuration will be:
PLACES             TRANSITIONS
Lactose            − (n * β-Galactosidase)
β-Galactosidase    − (n * β-Galactosidase) + (n * β-Galactosidase)
Galactose          + (n * β-Galactosidase)
Glucose            + (n * β-Galactosidase)
start configuration: (100, 20, 0, 0). If m0 is the vector of the start configuration, C the actual reaction matrix, and x the vector that represents the firing transitions, the new configuration can be calculated using the following equation: m := m0 + Cx. After each generation the actual vector x has to be calculated. The simulation of a Petrinet can be a sequential or a parallel process. In a sequential simulation only one transition can fire at a time. In a parallel simulation, activated (enabled) transitions can fire simultaneously.
Definition: Let N = (P, T, F, Ku, Ko, V, m0) be a Petrinet. For each t ∈ T the mappings t+ and t− are defined as:
$$t^{+}(p) := \begin{cases} V(t,p) & \text{if } p \in tF \text{ and } (t,p) \text{ represents an arrow weight using a variable} \\ k\,V_s(p,q,t) & \text{if } p \in tF \text{ and the arrow weight of } (t,p) \text{ represents } q \\ V_F(f)(x_1,\ldots,x_n) & \text{if } p \in tF \text{ and the arrow } (t,p) \text{ represents a function} \\ 0 & \text{otherwise} \end{cases}$$
$$t^{-}(p) := \begin{cases} V(p,t) & \text{if } p \in Ft \text{ and } (p,t) \text{ represents an arrow weight using a variable} \\ k\,V_s(p,q,t) & \text{if } p \in Ft \text{ and the arrow weight of } (p,t) \text{ represents } q \\ V_F(f)(x_1,\ldots,x_n) & \text{if } p \in Ft \text{ and the arrow } (p,t) \text{ represents a function} \\ 0 & \text{otherwise} \end{cases}$$
t(p) := t+(p) − t−(p). The parallelism effect is a dynamic feature and not a structural component of the Petrinet. Definition: Let N = (P, T, F, Ku, Ko, V, m0) be a Petrinet, U ⊆ T a transition set and m the actual configuration of P. The set U is called simultaneous application set regarding m, iff $U^{-} = \sum_{t \in U} t^{-}$ and $U^{+} = \sum_{t \in U} t^{+}$.
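The update m := m0 + Cx can be traced for Example 2 with a short Python sketch. Because the arrow weights are functions of the marking, the column of C is recomputed before every step. The function names are hypothetical, and the enabling test (the enzyme is present and enough lactose is available) is an assumption, since the text does not spell out the enabling condition for function arrows.

```python
def reaction_column(m, n=2):
    """Column of C for the single reaction of Example 2, evaluated on marking m.

    Place order: (lactose, beta-galactosidase, galactose, glucose).
    """
    lactose, enzyme, galactose, glucose = m
    w = n * enzyme                # linear arrow-weight function of Example 2
    return [-w, -w + w, w, w]     # the enzyme is consumed and produced again


def step(m, x=1, n=2):
    """One application of m := m + C x, where x counts firings of the reaction."""
    c = reaction_column(m, n)
    return [mi + ci * x for mi, ci in zip(m, c)]


def simulate(m0, n=2):
    """Sequential simulation until the reaction is no longer enabled."""
    m = list(m0)
    trace = [m]
    # Assumed enabling test: positive weight and enough lactose for one firing.
    while n * m[1] > 0 and m[0] >= n * m[1]:
        m = step(m, 1, n)
        trace.append(m)
    return trace


trace = simulate([100, 20, 0, 0])
for marking in trace:
    print(marking)
```

With the start configuration (100, 20, 0, 0) and n = 2 each firing converts 40 units of lactose, so under these assumptions the reaction stops after two firings with 20 units of lactose left.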
Features of Petrinets
A fundamental feature is reachability. Given a Petrinet and a start configuration, the question is whether a specific configuration can be produced. Mayr et al. showed that the reachability question is solvable for each Petrinet [22]. However, the complexity of this algorithm allows no practical solution: solving the problem is known to require exponential time [22]. The limitation of Petrinets is another important feature. A Petrinet in which every place can hold only a restricted number of tokens is called a limited Petrinet. A Petrinet using capacities is limited by
Fig. 4. Extension of the model of Reddy et al. [9].
definition. An algorithm for testing whether a Petrinet is limited has been presented. However, for self-modified Petrinets the detection of limitation is an unsolvable problem [23]. The capacity value is important for the detection of bottlenecks. The definition of capacities permits fixing an interval for each metabolite, which represents the normal range of its concentration. Moreover, the detection of bottlenecks can be reduced to the reachability problem. For small Petrinets the reachability graph can be constructed in practice, which permits calculating all bottlenecks. Biochemical networks represent a set of biochemical reactions which are highly connected. To analyze metabolic pathways, all activated biochemical reactions are of importance. However, dead and live transitions and configurations must also be considered. A transition is called dead if it can never be enabled. Otherwise the transition is called live. The detection of dead and live transitions depends on the reachability problem.
APPLICATIONS
Description of the model
The formalization of Reddy et al. [9] does not permit modeling the kinetic effects of biochemical reactions. Our extension, however, allows a flexible modeling process. We have to consider the following aspects:
– the actual arrow weight depends on the actual configuration,
– inhibitor metabolites reduce the concentration of the metabolites,
– a transition can also be activated without inhibitors and activators,
– for the detection of bottlenecks and critical configurations we define concentration borders for every place, which are tested after the firing of each transition.
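For small nets with capacities, the reachability graph mentioned in the preceding section can be built by breadth-first search, and transitions that are never enabled in any reachable marking can be read off directly. The following Python sketch assumes constant arrow weights and a hypothetical three-place net. The function name `explore` and the `limit` safeguard are illustrative choices, and the capacity is tested on the resulting marking.

```python
from collections import deque


def explore(m0, pre, post, capacity, limit=10_000):
    """Breadth-first construction of the set of reachable markings.

    m0 and capacity are tuples indexed by place; pre and post map each
    transition to a tuple of arrow weights. Returns the reachable markings
    and the transitions that were enabled at least once.
    """
    seen = {m0}
    fired = set()
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t in pre:
            if any(a < b for a, b in zip(m, pre[t])):
                continue  # not enough tokens in the pre-set
            new = tuple(a - b + c for a, b, c in zip(m, pre[t], post[t]))
            if any(x > k for x, k in zip(new, capacity)):
                continue  # a capacity would be exceeded
            fired.add(t)
            if new not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("more reachable markings than `limit`")
                seen.add(new)
                queue.append(new)
    return seen, fired


# Hypothetical net over places (A, B, C): t1 moves A to B, t2 moves B to C,
# t3 would need three tokens in C, which the capacity of 2 never allows.
pre = {"t1": (1, 0, 0), "t2": (0, 1, 0), "t3": (0, 0, 3)}
post = {"t1": (0, 1, 0), "t2": (0, 0, 1), "t3": (3, 0, 0)}
markings, fired = explore((2, 0, 0), pre, post, capacity=(2, 2, 2))
dead = set(pre) - fired
```

The `limit` parameter is a practical safeguard only, since the number of reachable markings grows quickly with the size of the net.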
The extension of our model is demonstrated in Fig. 4. Activators and inhibitors are used with regard to the actual configuration. The arrow weight is no longer a constant description. In our example F and H denote the concentration of the activator and
Fig. 5. Petrinet representation of the biocatalytic reaction using self-modified arcs.
the inhibitor. F and H are elements of the arrow weight description. Therefore, the Petrinet contains self-modified arrows. The reaction t produces the product RP using three units of S1 and four units of S2. These values are multiplied by a factor that represents the actual concentration of activators and inhibitors and the actual configuration. For every place capacity values are defined. These values define concentration intervals, which describe the correct flux of the metabolism.
Biocatalytic reaction
For the simulation of the flux of biochemical reactions our model has to be able to represent the concentrations of the simulated metabolites. Using our formalization of self-modified Petrinets these features are available. The identifier of any place which represents the enzyme can be used as a parameter inside any definition of an arrow weight. Therefore, our model is able to simulate biocatalytic reactions using the theory of Petrinets. The transitions are biochemical reactions, which will not consume the enzyme, and the places are metabolites of these reactions. The formal description of a simple biocatalytic reaction can be given:
P = { Substance, Enzyme, Product }
T = { Reaction }
F = { (Substance, Reaction), (Enzyme, Reaction), (Reaction, Enzyme), (Reaction, Product) }
In this process, Reaction is the catalysed transition of one unit of the Substance into two units of the Product. The arrow weights can be defined as:
V(Substance, Reaction) = 1
V(Enzyme, Reaction) = Enzyme
V(Reaction, Enzyme) = Enzyme
V(Reaction, Product) = 2
The flexible simulation uses functions such as:
V(Substance, Reaction) = 1 * Enzyme
V(Reaction, Product) = 2 * Enzyme
Gene regulation
With regard to gene regulation processes, we have to distinguish positive and negative control. Both regulation processes can be either inducible or repressible. Regarding the positive control, we know that the activator will initiate the transcription process.
Regarding the inducible positive control, an effector
element (inducible enzyme) will enable the transcription process. However, the effector element will activate the activator element. Both elements will build a protein complex which enables the transcription process. The formal description of the positive control is: P = {activator inactive, effector, activator active, protein} T = { conformation, transcription } F = { (activator inactive, conformation), (effector, conformation), (conformation, activator active), (activator active, transcription), (transcription, protein) } With regard to repressive positive control, the effector element will repress the transcription process. The catalytic element is inducible for the transcription process until the effector element will appear. The effector element will inactivate the activator. P = { activator active, effector, activator inactive, protein } T = { conformation, transcription } F = { (activator active, conformation), (effector, conformation), (conformation, activator inactive), (activator active, transcription), (transcription, protein) } The negative control will be controlled by a repressor element which will repress the transcription process. Regarding the inducible negative control, an active repressor will suppress the transcription process. The enzyme complex of the active repressor and the operator will prevent the transcription process. The effector element inactivates the repressor that will enable the transcription process. 
P = { repressor active, repressor/operon, inductor, repressor inactive, active operon, protein } T = { connection, docking/RNA polymerase, transcription } F = { (repressor active, connection), (connection, repressor/operon), (repressor/operon, docking/RNA polymerase), (inductor, docking/RNA polymerase), (docking/RNA polymerase, repressor inactive), (docking/RNA polymerase, active operon), (active operon, transcription), (transcription, protein) } The last type of gene regulation processes discussed here is the repressible negative control. The main feature of this control mechanism is that the appearance of the inductor will suppress the transcription process. Without the inductor the operon is active. The repressor is inactive and will be activated by the inductor element. The activated repressor will deactivate the operon, the transcription process will be blocked up. P = { repressor inactive, inductor, repressor active, repressor/active operon, active operon, protein} T = { conformation, docking repressor, transcription} F = { (inductor, conformation), (repressor inactive, conformation), (conformation, repressor active), (repressor active, docking repressor), (docking repressor, repressor/active operon), (active operon, transcription), (transcription, protein) } Cell communication Regarding cell communication processes, which are based on exocytose and endocytose or cellular gaps, the formalization by using Petrinets is simple. Uptake of metabolites by a cell can be formalized using a transition without incoming arrows. On the other hand, if substances leave the cell, we use transitions without outgoing arrows. Regarding specific receptor activities new transitions have to be added.
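The two cell communication patterns described above, a transition without incoming arrows for uptake and a transition without outgoing arrows for export, can be sketched as follows. The place names and the helper `fire` are hypothetical and serve only as an illustration with unit arrow weights.

```python
def fire(marking, pre, post):
    """Fire a transition given by its pre-set and post-set arrow weights."""
    if any(marking[p] < w for p, w in pre.items()):
        raise ValueError("transition not enabled")
    new = dict(marking)
    for p, w in pre.items():
        new[p] -= w
    for p, w in post.items():
        new[p] += w
    return new


# Uptake: no incoming arrows, so the transition is always enabled.
uptake = ({}, {"glucose_in_cell": 1})
# Export: no outgoing arrows, so the consumed tokens leave the net.
export = ({"product_in_cell": 1}, {})

m = {"glucose_in_cell": 0, "product_in_cell": 2}
m = fire(m, *uptake)
m = fire(m, *export)
```

Because an uptake transition is always enabled, a simulator would normally bound it by a capacity on the receiving place or by a firing schedule. That choice is left open in this sketch.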
Wall chart representation All analyzed biochemical reactions are collected by the Boehringer company [6]. Moreover, the KEGG information system represents the static representation of this biochemical data [7]. Based on that data our model allows the dynamic representation of biochemical pathways. In this chapter we will discuss the representation of the glycolysis using our Petrinet model. This biochemical network is a subset of the Boehringer pathway chart. The glycolysis is an important biochemical process which allows the metabolic production of energy. Most of the biochemical reactions of glycolysis are biochemical reactions, which are controlled by positive (metabolites) and negative components (ADP, Insulin). However, the Petrinet modeling of biochemical processes makes regulation components visible. In our Petrinet representation the inhibitor process of P-enol-pyruvate can be shown directly. Glycolysis is a complex example which consists of eight reactions (transitions); different metabolites are connected. Our Petrinet representation describes the biochemical process in direction of the glycolysis. The effects of the enzymes are shown by bold arcs. The positive and negative influence of substances will be shown, using bi-directional interrupt arcs. CONCLUSION An important task of Molecular Bioinformatics is to develop information systems for the simulation of biochemical networks. Therefore, models have to be defined which are able to simulate biochemical networks based on the static data representation ci7. A lot of different models are presented [2,8], but we are still looking for a useful formalization which will solve this task. Petrinets belong to the class of discrete models, which also allow quantitative analysis. Quantitative and qualitative simulations are important in order to understand the molecular behavior of biochemical reactions. Moreover, kinetic effects can be studied directly using this method. 
The first Petrinet approach for the simulation of metabolic pathways was presented by Reddy et al. 1993 [9]. This approach is based on the condition event net and discusses qualitative aspects. Moreover, positive and negative components are not included, and the dynamic behavior of biochemical reactions is not represented. Using our approach, the modeling of metabolic networks is possible. This formalization can be used, for example, for the dynamic representation of the Boehringer pathway chart [6], which differs between two domains: the genetic pathways and the domain of metabolic reactions. The advantage of our approach is: – the graphical representation is a model which corresponds to biochemical reactions, – the components of our model are substances (places) and reactions (transitions), – the relations between substances will be characterized by directed arcs. Our qualitative model permits a biochemical reaction to consume and produce concentrations. This biochemical behavior is similar to the pre- and post-conditions of Petrinets, and our definition of selfmodification permits the representation of kinetic effects. Moreover, consuming substances depend on the actual concentration of substances, and the kinetic behavior can be discussed in detail using functions as specific arrow weights. Quantitative changes can also be modeled by the modification of the structure of the Petrinet which allows the discussion of the influence of specific substances. Moreover, the rates of the places can be modified. By means of modifying the actual arrows, new reactions can be defined. Our formalization is a parallel and discrete model which allows the quantitative simulation of metabolic processes. In the research field of biotechnology and molecular medicine the quantitative simulation
Fig. 6. Petrinet modeling of the glycolysis pathway.
of metabolic pathways is important. The detection of genetic defects which modify metabolism (metabolic defects) is one approach. The detection of metabolic defects is important, because these defects cause metabolic diseases. Therefore, the detection of metabolic bottlenecks in the corresponding Petrinet is necessary. The capacity component of our formalization allows the detection of metabolic bottlenecks. Moreover, we can identify and discuss the causal reason for this effect.
ACKNOWLEDGEMENT
This work was supported by the Ministry of Science and Art of the Government of Rheinland-Pfalz.
REFERENCES
[1] von Heijne, G. (1987). Sequence Analysis in Molecular Biology. Academic Press, San Diego.
[2] Hofestädt, R., Lengauer, T., Löffler, M. and Schomburg, D. (1997). Bioinformatics. LNCS 1278, Springer-Verlag, Heidelberg.
[3] Hofestädt, R., Collado-Vides, J., Löffler, M. and Mavrovouniotis, M. (1996). Modelling and Simulation of Metabolic Pathways, Gene Regulation and Cell Differentiation. BioEssays 18, 333-335.
[4] Bailey, J. (1991). Toward a Science of Metabolic Engineering. Science 252, 1668-1674.
[5] Mavrovouniotis, M., Stephanopoulos, G. and Stephanopoulos, G. (1990). Computer-Aided Synthesis of Biochemical Pathways. Biotechnol. Bioeng. 36, 1119-1131.
[6] Michal, G. (1993). Biochemical Pathways. Boehringer Mannheim, Penzberg.
[7] Kanehisa, M. and Goto, S. (1997). A Systematic Analysis of Gene Functions by the Metabolic Pathway database. In: Suhai, S. (ed.), Theoretical and Computational Methods in Genome Research, Plenum Press, New York, pp. 41-56.
[8] Collado-Vides, J., Hofestädt, R., Löffler, M. and Mavrovouniotis, M. (1996). Modeling and Simulation of Gene and Cell Regulation. Dagstuhl-Seminar-Report 130.
[9] Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri Net Representation in Metabolic Pathways. In: Hunter, L. et al. (eds.), Proceedings First International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, pp. 328-336.
[10] Hofestädt, R. (1994). A Petri Net Application of Metabolic Processes. Journal of System Analysis, Modelling and Simulation 16, 113-122.
[11] Gardner, E., Simmons, E. and Snustad, D. (1991). Principles of Genetics. John Wiley and Sons, New York.
[12] Hofestädt, R. and Meineke, F. (1995). Interactive Modelling and Simulation of Biochemical Networks. Comput. Biol. Med. 25, 321-334.
[13] Jacob, F. and Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318-356.
[14] Höfer, T. and Heinrich, R. (1993). A Second-order Approach to Metabolic Control Analysis. J. Theor. Biol. 164, 85-102.
[15] Waser, M., Garfinkel, L., Kohn, C. and Garfinkel, K. (1983). Computer Modeling of Muscle Phosphofructokinase Kinetics. J. Theor. Biol. 103, 295-312.
[16] Franco, R. and Canela, E. (1984). Computer simulation of purine metabolism. Eur. J. Biochem. 144, 305-315.
[17] Selkov, E., Basmanova, S., Gaasterland, T., Goryanin, I., Gretchkin, Y., Maltsev, N., Nenashev, V., Overbeek, R., Panyushkina, E., Pronevitch, L., Selkov jr., E. and Yunus, I. (1996). The Metabolic Pathway Collection from EMP: the Enzymes and Metabolic Pathways Database. Nucleic Acids Research 24, 26-28.
[18] Kohn, M. and Letzkus, W. (1982). A Graph-theoretical Analysis of Metabolic Regulation. J. Theor. Biol. 100, 293-304.
[19] Brutlag, D., Galper, D. and Millis, D. (1991). Knowledge-based simulation of DNA metabolism: prediction of enzyme action. Comput. Appl. Biosci. 7, 9-19.
[20] Stoffers, H. et al. (1992). METASIM: object-oriented modeling of cell regulation. Comput. Appl. Biosci. 8, 443-449.
[21] Collado-Vides, J. (1991). A Syntactic Representation of Units of Genetic Information – A Syntax of Units of Genetic Information. J. Theor. Biol. 148, 401-429.
[22] Baumgarten, B. (1992). Petri Netze. BI Verlag, Mannheim.
[23] Valk, R. (1978). Self-Modifying Nets: A Natural Extension of Petrinets. LNCS 62, 464-476.
[24] Reisig, W. (1986). Petri Netze. Springer-Verlag, Heidelberg.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2003, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-17
Topological Analysis of Metabolic Networks Based on Petri Net Theory
Ionela Zevedei-Oancea and Stefan Schuster∗
Max Delbrück Center for Molecular Medicine, Department of Bioinformatics, Berlin-Buch, Germany
ABSTRACT: Petri net concepts provide additional tools for the modelling of metabolic networks. Here, the correspondences between concepts of traditional biochemical modelling and their counterparts in Petri net theory are discussed. For example, the stoichiometry matrix of a metabolic network corresponds to the incidence matrix of the Petri net. Flux modes have T-invariants as counterparts, and conservation relations have P-invariants. We reveal the biological meaning of some notions specific to the Petri net framework (traps, siphons, deadlocks, liveness). We focus on topological analysis rather than on the analysis of dynamic behaviour. The treatment of external metabolites is discussed. Some simple theoretical examples are presented for illustration. In addition, Petri nets corresponding to some biochemical networks are built to support our results. For example, the role of triose phosphate isomerase (TPI) in Trypanosoma brucei metabolism is evaluated by detecting siphons and traps. All Petri net properties treated in this contribution are exemplified on a system extracted from nucleotide metabolism.
KEYWORDS: Petri nets, elementary flux mode, metabolic networks, P-invariant, T-invariant, incidence matrix
INTRODUCTION
The aim of theoretical approaches is to build models that facilitate the study of real systems. A huge variety of models has been developed in theoretical biology. For example, biochemical reaction networks are the subject of extensive modelling studies [Fell and Wagner, 2000; Heinrich and Schuster, 1996; Jeong et al., 2000; Peleg et al., 2002; Schuster et al., 2002a, 2002b; Teusink et al., 1998]. This modelling has important applications in biotechnology [Klamt and Stelling, 2003; Liao et al., 1996; Schuster et al., 2000; Van Dien and Lidstrom, 2002] and genome research [Dandekar et al., 1999; Price et al., 2002; Förster et al., 2002]. Specific features of biochemical systems are that most reactions are catalyzed by enzymes and that many reactions utilize more than one substrate (reactant) and/or generate more than one product. Reaction systems that only involve isomerizations (that is, reactions with one substrate and one product) can be depicted as graphs, and their properties can be studied from the point of view of graph theory. However, real biochemical networks cannot be represented as ordinary graphs because of bi- or multimolecular reactions, which would require arcs linking three or more nodes. One, quite complicated, approach to coping with this situation is to consider the groups of substances on each side of reaction arrows as nodes (so-called complexes) [Clarke, 1980; Horn and Jackson, 1972].
∗ Corresponding author: Stefan Schuster, Max Delbrück Center for Molecular Medicine, Department of Bioinformatics, Robert-Rössle-Str. 10, 13092 Berlin-Buch, Germany. Tel.: +49 30 94063125; Fax: +49 30 94062834; E-mail: [email protected].
I. Zevedei-Oancea and S. Schuster / Topological Analysis of Metabolic Networks Based on Petri Net Theory
Fig. 1. Components of a Petri net. Their counterparts in metabolic networks are as follows: a) A → B (isomerization); b) A → B + C; c) A + B → C; d) a product that is not further consumed; e) a substrate that is not produced; f) a metabolite produced and then consumed; g) a metabolite produced in one reaction and then consumed in two or n reactions; h) the situation opposite to g); i) inhibition phenomena.
A simpler solution is offered by Petri net theory [Reisig, 1985; Starke, 1990]. Two kinds of nodes are considered: places and transitions. The nodes and the arcs between them represent the static structure, while additional elements such as tokens, which indicate time-dependent weights of places, are used to describe the dynamics. Besides place/transition nets, condition/event Petri nets have also been proposed in the literature. Here, we only deal with the first type. Petri nets can be employed for the graphical description of processes. They allow us to understand more intuitively the temporal evolution of systems by considering flows of tokens through the nets. They also offer an appropriate formalism for the analysis of biochemical networks, as has been pointed out earlier by several authors [Genrich et al., 2001; Heiner et al., 2000; Heiner et al., 2001; Hofestädt, 1994; Küffner et al., 2000; Oliveira et al., 2003; Peleg et al., 2002; Reddy et al., 1996], while the above-mentioned modelling approaches [Fell and Wagner, 2000; Heinrich and Schuster, 1996; Jeong et al., 2000; Leiser and Blum, 1987; Schuster et al., 2002a, 2002b; Teusink et al., 1998] are independent of Petri net theory. In the present paper, we show the correspondence between concepts in Petri net theory and in traditional metabolic network analysis. Some examples will help the reader to see the similarities and to exploit them. We focus on the use of Petri nets for the topological analysis of biochemical networks rather than for the analysis of dynamic behaviour. In particular, we deal with various invariants and other features of these nets, such as boundedness and liveness, and reveal their biochemical meaning. Moreover, we discuss the appropriate treatment of source and sink metabolites.
SIMILARITIES BETWEEN PETRI NET THEORY AND TRADITIONAL BIOCHEMICAL MODELLING
In graphical representations of Petri nets, circles are used for places, while rectangles stand for transitions (Fig. 1). The correspondence between place and substance (in biochemistry often called metabolite) and between transition and reaction/enzyme is obvious. Metabolic networks have a static level, the stoichiometry, and a dynamic one, characterized by fluxes. The stoichiometric coefficients indicate how many molecules of a substance have to react to produce how many molecules of product. In a Petri net, the stoichiometric coefficients are described by arc weights. Thus, the stoichiometry matrix containing these coefficients corresponds to the incidence matrix of a Petri net (see below).
A further object, the token, was introduced to describe the dynamics of a Petri net. It is denoted by a solid dot (•) inside the circles representing places. In ordinary Petri nets, the tokens are indistinguishable. They indicate the presence or absence of a condition, a signal, or a resource. In our case, the number of tokens in a place stands for the number of molecules of that metabolite existing at a given moment. Alternatively, tokens may correspond to any predefined unit measuring the amount of substance, such as mole, millimole etc. This, however, requires that non-integer token numbers be admitted, which leads to continuous Petri nets, which are currently being developed [Alla and David, 1998; Matsuno et al., 2000].
The tokens that exist in the system at a given time describe the state of the system. This is called the marking, M [Reisig, 1985; Starke, 1990]. The system state changes when a transition fires. This can happen only if the transition is active/enabled, which means that every place in the set of input places (Fig. 2 and Table 1) of the considered transition has at least as many tokens as the weight of the corresponding arc. The sets of input and output places of a transition t are denoted by •t and t•, respectively. These sets correspond to the metabolites that act as reactants and products, respectively, in the reaction t. The new state is obtained by subtracting from each input place of the considered transition a number of tokens equal to the weight of the corresponding arc and adding to each output place a number of tokens equal to the weight of the corresponding arc (Fig. 2 and Table 1).

Table 1. Definition of the terms preset and postset
Name            Notation   Definition
Preset of t     •t         $\{p \mid p \in P,\ \mathrm{pre}(p,t) \neq 0\}$
Postset of t    t•         $\{p \mid p \in P,\ \mathrm{post}(t,p) \neq 0\}$
Preset of p     •p         $\{t \mid t \in T,\ \mathrm{post}(t,p) \neq 0\}$
Postset of p    p•         $\{t \mid t \in T,\ \mathrm{pre}(p,t) \neq 0\}$

Fig. 2. Marking and firing. The mapping $M: P \to \mathbb{N}$ is called a marking. For each place $p \in P$, $M(p)$ represents the number of tokens that exist in $p$. $M(p)$ gives the local state, while the vector $M$ gives the state of the system and is called the state vector. A transition $t$ is enabled/activated if $M(p) \geq \mathrm{pre}(p,t)$ for all $p \in P$ and $K(p) \geq M(p) - \mathrm{pre}(p,t) + \mathrm{post}(t,p)$ for all $p \in P$. The mapping $K: P \to \mathbb{N}$ represents the maximal capacity of a place, if the number of tokens is limited. After the enabled transition fires, the new state of the system is $M': P \to \mathbb{N}$, with $M'(p) = M(p) - \mathrm{pre}(p,t) + \mathrm{post}(t,p)$ for all $p \in P$. In the example, the marking $M' = [0, 1, 1, 4, 1]$ is obtained from the marking $M = [2, 5, 2, 1, 0]$ after transition $t$ fires. The formal description is $M\,[t\rangle\,M'$.
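The enabling condition and the firing rule described above can be sketched in a few lines of Python. This is only an illustration, not code from the chapter: the function names `is_enabled` and `fire`, the place names `p1` to `p5`, and the arc weights are assumptions. The weights are inferred from the two markings quoted in the caption of Fig. 2, under the additional assumption that the transition is not part of a self-loop.

```python
# Hypothetical helper names; arc weights are inferred, not taken from Fig. 2.

def is_enabled(marking, pre, post, t, capacity=None):
    """Check whether transition t may fire at the given marking.

    marking:  dict place -> number of tokens (must list every place)
    pre[t]:   dict place -> weight of the arc place -> t
    post[t]:  dict place -> weight of the arc t -> place
    capacity: optional dict place -> maximal token number K(p)
    """
    for p, tokens in marking.items():
        consumed = pre[t].get(p, 0)
        produced = post[t].get(p, 0)
        if tokens < consumed:                      # M(p) >= pre(p, t) violated
            return False
        if capacity is not None and p in capacity:
            if tokens - consumed + produced > capacity[p]:
                return False                       # K(p) would be exceeded
    return True


def fire(marking, pre, post, t, capacity=None):
    """Return the marking M' reached by firing the enabled transition t."""
    if not is_enabled(marking, pre, post, t, capacity):
        raise ValueError(f"transition {t!r} is not enabled")
    return {p: tokens - pre[t].get(p, 0) + post[t].get(p, 0)
            for p, tokens in marking.items()}


# Assumed arc weights: differences between M and M' in the caption of Fig. 2.
pre = {"t": {"p1": 2, "p2": 4, "p3": 1}}
post = {"t": {"p4": 3, "p5": 1}}
M = {"p1": 2, "p2": 5, "p3": 2, "p4": 1, "p5": 0}
M_next = fire(M, pre, post, "t")
```

With these assumed weights, `M_next` reproduces the marking [0, 1, 1, 4, 1], and the transition is no longer enabled afterwards because `p1` is empty.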
Formally, we can also speak about the sets of input and output transitions of a place p, denoted by •p and p•, respectively, which contain all the transitions that produce, respectively consume, the metabolite p. It is useful to know between which places (P) and transitions (T) arcs exist. For this purpose, two mappings describing arc weights were introduced (see also Table 1): $\mathrm{pre}: P \times T \to \mathbb{N}$ and $\mathrm{post}: T \times P \to \mathbb{N}$ (with $\mathbb{N}$ denoting the set of natural numbers). One can also think of them as matrices. The rows in pre correspond to places and the columns to transitions, while in the matrix post, the roles of rows and columns are transposed. The entries of these matrices have a nonzero value (equal to the weight of the arc) if an arc exists, and zero otherwise. Further, the topological structure of a Petri net can be represented by an integer matrix, C, called the incidence or flow matrix. C is an $n \times m$ matrix whose m columns correspond to the transitions and whose n rows correspond to the places of the net. The following relation holds true: $C = \mathrm{post}^{T} - \mathrm{pre}$. For a net without self-loops, the mappings pre and post can be reconstructed from the matrix C in the following simple way: $\mathrm{post}(t_j, p_i) = \max\{C_{ij}, 0\}$, $\mathrm{pre}(p_i, t_j) = -\min\{C_{ij}, 0\}$.
It is worth finding out whether another state can be reached from a given state. This is related to the property of reachability. In metabolic networks, we can search for all possible subsequent states, knowing the initial state of resources. Another interesting problem is to deduce all appropriate initial states from a desired later state.

Fig. 3. Simple example of capacity limitation in a metabolic system.

Places can hold an arbitrary number of tokens or can be restricted to a given number (capacitated places). In Fig. 3, the unnamed places are considered external (with inexhaustible numbers of tokens). Transitions T1 and T2 are activated if there is at least one token in the place ATP (the currency of metabolic energy), respectively ADP. If the initial marking is [c, c, c, c, 1, 0], transition T1 can fire and produce the marking [c, c, c, c, 0, 1]. Now, transition T2 is enabled. It fires and we obtain the initial marking again. In this example, ATP + ADP = 1, independent of the system state. This is a conservation relation, which bounds the number of tokens in all the internal places. Usually the places of biological systems are not considered to be limited, because the limitation due to the finite size of living cells is not critical to most biochemical processes. There are only cases where the limitation comes from a conservation relation, as in the above case. Another situation important in biology is the presence of inhibitors.
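As a small worked illustration of the relation between pre, post and the incidence matrix, the following Python sketch builds C for the two internal places of Fig. 3 and recovers pre and post from it. It assumes that only ATP and ADP are modelled (the unnamed external places are left out) and that all arc weights equal one; the function names `incidence` and `split_incidence` are made up for this example.

```python
# Hypothetical example restricted to the internal places ATP and ADP of Fig. 3.
places = ["ATP", "ADP"]
transitions = ["T1", "T2"]          # T1: ATP -> ADP, T2: ADP -> ATP (assumed)

# pre[i][j]: weight of the arc from place i to transition j
pre = [[1, 0],
       [0, 1]]
# post[j][i]: weight of the arc from transition j to place i
post = [[0, 1],
        [1, 0]]


def incidence(pre, post):
    """Incidence matrix C = post^T - pre (rows: places, columns: transitions)."""
    n, m = len(pre), len(pre[0])
    return [[post[j][i] - pre[i][j] for j in range(m)] for i in range(n)]


def split_incidence(C):
    """Recover pre and post from C; only valid for nets without self-loops."""
    n, m = len(C), len(C[0])
    pre = [[-min(C[i][j], 0) for j in range(m)] for i in range(n)]
    post = [[max(C[i][j], 0) for i in range(n)] for j in range(m)]
    return pre, post


C = incidence(pre, post)
pre_again, post_again = split_incidence(C)
# Each column of C sums to zero, which reflects ATP + ADP = const.
column_sums = [sum(C[i][j] for i in range(len(C))) for j in range(len(C[0]))]
```

Here `C` equals [[-1, 1], [1, -1]], and the round trip through `split_incidence` returns the original matrices because this small net is pure.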
The corresponding Petri net model can be extended by a special element, called an inhibitory arc (Fig. 1i). The inhibitor is represented by a place. If there is a token at that place, the transition is not enabled, so it does not fire.
Note that the incidence matrix corresponds to the stoichiometric matrix [Érdi and Tóth, 1989; Heinrich and Schuster, 1996] for metabolic networks if the Petri nets are pure. That means that the networks do not involve self-loops (Fig. 4), because self-loops cannot be represented in the incidence matrix: a coefficient of 1 and a coefficient of −1 cancel each other to yield zero in the matrix, thus losing track of the self-loop. Thus, we should identify the situations that produce self-loops and the way to treat them without losing the biological meaning. One such situation occurs when enzymes are considered as normal substrates. There are two ways of eliminating such self-loops. First, one can treat the enzyme as a parameter (as usually done in biochemical modelling, see Heinrich and Schuster, 1996) rather than explicitly as a substrate. Second, for each loop, one may introduce another place and another transition. If all weights equal unity, the number of tokens in the new place is the difference between the capacity of the old place and the number of tokens that this old place contains already (Fig. 4b). By this construction, the new place corresponds to the enzyme-substrate complex, and the so-called overall reaction (old transition) has been decomposed into half-reactions (the new transitions). This construction can also be applied when the arcs have a multiplicity larger than 1, but care should be taken that the new arcs might also have such multiplicities. If $p'$ and $t'$ stand for the new place and the new transition, respectively, the new arcs have to respect the formulae $\mathrm{pre}(p', t) = \mathrm{post}(t', p') = \mathrm{pre}(p, t') := \mathrm{pre}(p, t)$, while the old arcs keep the same multiplicity. This situation can be nicely illustrated by the biochemical example of autocatalysis: A + B gives 2B (Fig. 5). This reaction cannot simply be reduced to A gives B, because a small quantity of the product B is needed to start the reaction.

Fig. 4. Self-loops. Left: the two types of self-loops. The condition that a place and a transition are in a self-loop is $\mathrm{pre}(p,t) \cdot \mathrm{post}(t,p) \neq 0$. In model b), the place marked by E represents an enzyme, which is regenerated after the reaction. The self-loop can be eliminated by decomposing the reaction into half-reactions (the two reactions that are depicted in the second part of b) and considering the enzyme-substrate complex (ES). The new model represents a pure Petri net, which satisfies the relation $\mathrm{pre}(p,t) \cdot \mathrm{post}(t,p) = 0$ for all $p \in P$ and all $t \in T$. The number of tokens in the place ES is the difference between the capacity of the old place (the total amount of enzyme, which we consider equal to 1) and the number of tokens that place E contains already.

Fig. 5. Petri net representation of autocatalysis.

A second situation arises with reversible reactions, which might be wrongly interpreted as self-loops (Fig. 6). So, for each reversible reaction, we should consider only one flux direction and introduce another transition for the opposite one. In this way, metabolic networks can be transformed into pure ones. Alternatively,
one might think of defining reversible transitions. This has, however, not been dealt with so far in the Petri net literature. Generally, Petri nets can be designed in two different ways, called top-down and bottom-up. The first method starts with a very generalized form of the system and then details it as much as possible, until the basic units are reached. The second one starts with the “atoms”, building modules which are then joined to model the real system. Sometimes it is advantageous to combine both approaches. The importance of modularity is expressed by the ancient saying “divide et impera”.

Fig. 6. Treatment of reversible reactions. To model a reversible reaction A + B ←→ C + D, one usually introduces a transition for each direction.

MODELLING OF EXTERNAL METABOLITES
In metabolic networks, one needs to differentiate between internal and external metabolites. The former are both produced and consumed within the given network, while the external metabolites represent sources or sinks [Heinrich and Schuster, 1996]. Their amount is usually assumed to be constant, due to availability in large excess or well-tuned biological regulation. If one considers the given net as a part of a larger system, the external metabolites are a kind of boundary, or connection points with the remaining part, in which pathways producing or consuming these metabolites exist. An extension of the system so as to include those pathways is not useful, as the following example illustrates. Glycolysis (the well-known pathway of sugar degradation [Stryer, 1995]) contains a sequence of reactions that transforms glucose into pyruvate, producing ATP. Glucose, pyruvate, ATP and also some other metabolites are usually considered “external” for this pathway. We might include, in the model, a reaction or pathway that consumes pyruvate, for example, for producing alanine. However, alanine would then be an external metabolite. The model needs to be delimited somewhere. Algebraically, the external metabolites can usually be identified also in the incidence matrix.
Provided that each internal metabolite is both produced and consumed within the net, the external metabolites correspond to those rows in which all the nonzero coefficients have the same sign. The modelling of external metabolites can be done in different ways. One of them is to fill all initial places with an inexhaustible number of tokens (modelled by infinity). The sink places can be allowed to accumulate tokens, but care then has to be taken in computing T-invariants (see below). If it is preferred to use finite token numbers for the initial places, one could redefine the firing rule for the transitions that have initial places in their preset (see Table 1) or final places in their postset, in such a
way as not to “consume” the input-place tokens and not to “produce” final-place tokens:
$$M'(p) = M(p) \quad \forall p \in I \cup F \tag{1}$$
where I is the set of initial places and F is the set of final/terminal places. Note that the firing rule for the internal metabolites is
$$M'(p) = M(p) - \mathrm{pre}(p,t) + \mathrm{post}(t,p) \quad \forall p \in P \setminus (I \cup F). \tag{2}$$

Fig. 7. Part of nucleotide metabolism. The symbols stand for abbreviations of metabolites and enzymes usual in biochemistry. External metabolites are written in brackets. If the reactions indicated by dashed arrows are absent, the conservation relation ATP + ADP = const. holds. We do not consider R5P to be part of the system in this case. (a) Traditional biochemical representation. (b) Petri net representation. The external metabolites were modelled with self-loops, depicted by dotted arrows.
Another possibility is to connect sink places with source places by additional transitions so that a circular flow occurs [Heiner et al., 2000]. However, it is difficult to find out exactly which places have to be connected with each other, because such transitions could impose unrealistic constraints on the flow ratios. For example, one cannot regenerate carbon atoms from outgoing nitrogen atoms. A solution can be to use coloured Petri nets, in which different atom groups can be modelled by tokens of different colour. Starke [1990] proposed, as another way of description, not to include the initial and final places in the net. Thus, the boundary is made up of transitions without presets or without postsets. The initial transitions do not need any tokens to fire. In the traditional modelling of metabolic networks, a similar description is indeed sometimes used for external metabolites that are of minor importance, such as inorganic phosphate, water, protons etc. However, applying this technique to all external metabolites has the drawback that they are not made explicit, so that overall molar yields cannot be computed. Here, we propose an alternative method. For each initial place, we add an arc feeding from the transition back to this place (Fig. 7) and use the firing rule (2) both for internal and external metabolites. This guarantees that the number of tokens in the initial places remains unaltered. For each final place, we add an arc feeding from this place back to the transition producing it. To guarantee that the transition can always fire, at least one token should be put in the final place at the beginning. However, one should be aware that this generates self-loops, so that the Petri net is no longer pure. Thus, as far as the external metabolites are concerned, the incidence matrix does not equal the stoichiometry matrix. This is no problem since the external metabolites are not usually included in the stoichiometry matrix [Heinrich and Schuster, 1996].
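The self-loop construction proposed above can be tried out on a toy net. The sketch below is an assumption-laden illustration rather than the authors' implementation: the net (one source place `X_in`, one internal metabolite `S`, one sink place `X_out`), the unit arc weights, and the function names are all hypothetical. It adds the return arcs and then applies the ordinary firing rule (2) to every place.

```python
# Hypothetical toy net: X_in -> t1 -> S -> t2 -> X_out, all arc weights 1.

def add_external_self_loops(pre, post, initial, final):
    """Return copies of pre/post extended by the return arcs for external places.

    For an initial place p consumed by t, an arc t -> p of equal weight is added.
    For a final place p produced by t, an arc p -> t of equal weight is added.
    """
    pre2 = {t: dict(arcs) for t, arcs in pre.items()}
    post2 = {t: dict(arcs) for t, arcs in post.items()}
    for t in pre:
        for p in initial:
            if p in pre[t]:
                post2[t][p] = post2[t].get(p, 0) + pre[t][p]
        for p in final:
            if p in post[t]:
                pre2[t][p] = pre2[t].get(p, 0) + post[t][p]
    return pre2, post2


def fire(marking, pre, post, t):
    """Ordinary firing rule: M'(p) = M(p) - pre(p, t) + post(t, p)."""
    if any(marking[p] < w for p, w in pre[t].items()):
        raise ValueError(f"transition {t!r} is not enabled")
    new = dict(marking)
    for p, w in pre[t].items():
        new[p] -= w
    for p, w in post[t].items():
        new[p] += w
    return new


pre = {"t1": {"X_in": 1}, "t2": {"S": 1}}
post = {"t1": {"S": 1}, "t2": {"X_out": 1}}
pre_sl, post_sl = add_external_self_loops(pre, post, {"X_in"}, {"X_out"})

# One token in the final place, so that t2 can always fire.
M0 = {"X_in": 1, "S": 0, "X_out": 1}
M1 = fire(M0, pre_sl, post_sl, "t1")   # S is produced, X_in is unchanged
M2 = fire(M1, pre_sl, post_sl, "t2")   # S is consumed, X_out is unchanged
```

In this toy case the token numbers in `X_in` and `X_out` never change, and firing `t1` followed by `t2` returns the net to its initial marking.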
INVARIANTS IN PETRI NETS
When studying a system, it is always appropriate to begin with the study of its structural invariants. They help in analysing the system’s behaviour and checking its logical properties. The same is true for Petri nets describing biochemical networks, because the structural invariants do not depend on kinetic enzyme parameters, which vary due to external influences and internal fluctuations. Basically, there are two types of invariants in Petri nets: P-invariants and T-invariants [Reisig, 1985; Starke, 1990]. P-invariants (place invariants) are vectors, Y, with the property that multiplication of these vectors with any place marking reachable from a given initial marking yields the same result. If $M_0$ is the initial marking and $M$ is some arbitrary reachable marking, the relation $Y^{T} M = Y^{T} M_0$ describes a P-invariant and is called the relation of marking conservation. Taking into account consecutive markings (obtained by the firing of only one transition), it follows that $Y^{T} \cdot \mathrm{col}_t(C) = 0$ for each transition t, where C is the incidence matrix and $\mathrm{col}_t(C)$ is its column belonging to t. That means that, algebraically, these vectors are solutions of the equation
$$Y^{T} C = 0. \tag{3}$$
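Equation (3) can be solved by ordinary linear algebra. As a minimal sketch, assuming a small integer incidence matrix and using exact rational arithmetic from the Python standard library, the function below returns a basis of all vectors Y with $Y^{T} C = 0$. The function names are hypothetical, and the example matrix is the one assumed for the internal places ATP and ADP of Fig. 3. Note that a plain null-space basis is not guaranteed to consist of non-negative or minimal invariants.

```python
from fractions import Fraction


def left_null_space(C):
    """Basis of {Y : Y^T C = 0}, computed by Gauss-Jordan elimination on C^T."""
    n, m = len(C), len(C[0])                  # n places, m transitions
    # Rows of A are the columns of C, so A Y = 0 is equivalent to Y^T C = 0.
    A = [[Fraction(C[i][j]) for i in range(n)] for j in range(m)]
    pivots = []
    row = 0
    for col in range(n):
        if row == m:
            break
        pr = next((r for r in range(row, m) if A[r][col] != 0), None)
        if pr is None:
            continue                          # no pivot: col is a free variable
        A[row], A[pr] = A[pr], A[row]
        piv = A[row][col]
        A[row] = [x / piv for x in A[row]]
        for r in range(m):
            if r != row and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[row])]
        pivots.append(col)
        row += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        y = [Fraction(0)] * n
        y[free] = Fraction(1)
        for r, pc in enumerate(pivots):
            y[pc] = -A[r][free]
        basis.append(y)
    return basis


def weighted_sum(Y, M):
    """Y^T M, the quantity that a P-invariant keeps constant."""
    return sum(y * m for y, m in zip(Y, M))


C = [[-1, 1],      # ATP: consumed by T1, produced by T2 (assumed net)
     [1, -1]]      # ADP: produced by T1, consumed by T2
invariants = left_null_space(C)
Y = invariants[0]
conserved = weighted_sum(Y, [1, 0]) == weighted_sum(Y, [0, 1])
```

For this matrix the only basis vector is (1, 1), that is, the sum of the tokens in ATP and ADP, and its value is the same for the markings [1, 0] and [0, 1].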
Invariants in Petri nets correspond to basic concepts in traditional biochemical modelling. In particular, P-invariants express conservation relations for metabolites, as becomes clear in the scheme shown in Fig. 3. This net has the P-invariant ATP + ADP = const. In general, Eq. (3) is known, for metabolic systems, as the general form of conservation relations [Horn and Jackson, 1972; Clarke, 1980; Heinrich and Schuster, 1996]. In most cases, these relations express the conservation of atom groups [Schuster and Hilgetag, 1995; Schuster and Höfer, 1991]. In the example in Fig. 3, the adenosine moiety is conserved. In algebraic terms, invariants form a linear vector space. This implies that if $I_1$ and $I_2$ are invariants, then $c_1 I_1 + c_2 I_2$, with $c_1$, $c_2$ being real numbers, is also an invariant of the net [Reisig, 1985]. For example, if a biochemical net involves the P-invariants ATP + ADP = const. and NAD + NADH = const. [Stryer, 1995], then ATP + ADP + 2 NAD + 2 NADH = const. is also a P-invariant. Normally, one chooses invariants with the smallest integer coefficients and tries to decompose the invariants into the minimal terms (such as ATP + ADP = const.). This leads to the concept of minimal invariants (see below). In order that conservation relations reflect the conservation of atom groups, the coefficients in these relations have to be non-negative. This leads to non-negative conservation relations [Schuster and Hilgetag, 1995; Schuster and Höfer, 1991]. They correspond to semi-positive P-invariants in Petri nets [Colom and Silva, 1990]. If all substances are involved in such conservation relations, the system is called conservative [Érdi and Tóth, 1989; Horn and Jackson, 1972]. This implies that a positive linear combination of all concentrations (token numbers in Petri nets) is constant in time,
$$\sum_i g_i z_i(\tau) = u, \quad g_i > 0 \ \text{for any } i \tag{4}$$
where $\tau$ denotes time. $u$ can be, for example, the number of some sort of atoms. In a closed system, that is, a system without external metabolites, there is always one relation of the type Eq. (4) in which $u$ represents total mass. In addition, there may be further relations of the type Eq. (4). If there is a positive linear combination that increases in time, the system is called superconservative [Érdi and Tóth, 1989]:
$$\sum_i g_i z_i(\tau) > \sum_i g_i z_i(\tau'), \quad \tau > \tau', \quad g_i > 0 \ \text{for any } i \tag{5}$$
If the sum in Eq. (5) decreases in time until it reaches zero, the system is subconservative. The terms conservative, superconservative and subconservative have also been coined for Petri nets. The program
INA developed by Starke and coworkers [www.informatik.hu-berlin.de/∼starke/ina.html] determines whether a Petri net has one of these properties. Note that these three cases do not cover all networks. In fact, biochemical networks are usually open systems with a throughput of mass, as described by a flux between external metabolites. Therefore they may, depending on conditions, have a positive or negative mass balance, so that they usually belong to none of these classes. P-invariants are useful in checking the property of mutual exclusion. Two transitions are said to be in mutual exclusion if there is no reachable marking that allows the two transitions to fire simultaneously. The first step is to identify the set of markings that would allow the simultaneous activation of the specified transitions. Such a marking is reachable only if it satisfies the conservation relations given by the P-invariants. For example, if four places need to have at least one token each to enable two transitions to fire simultaneously, while the conservation sum is three, mutual exclusion occurs. Such a case is irrelevant for metabolic networks provided that the molecule numbers are large enough. Alternatively, if the token numbers represent millimoles or the like, token numbers need not be integer, so that mutual exclusion is no problem either. Often, two transitions are in mutual exclusion when they compete for the same set of input places. If the tokens are indivisible, then once one transition has taken the existing tokens, the other transition cannot fire. If the tokens needed to reactivate the competing transitions are simultaneously regenerated for each conflict case, the net is called persistent. In this case, the two transitions do not deactivate each other. At first sight, some metabolic networks seem to be non-persistent because different enzyme reactions often compete for the same substance.
However, in real metabolic nets, even if the quantity of the common resource is very small, the concurrent reactions share it, possibly in different proportions according to the various reaction rates. Until now, there are no techniques based on Petri nets that can model this behaviour accurately. Biochemical networks often reach, after some initial transient, a stationary state. More concretely, this is the case when the kinetic properties of the networks are such that the stationary state is asymptotically stable, as is often the case [Clarke, 1980; Heinrich and Schuster, 1996]. At steady state, the following equation holds:
$$C V = 0 \tag{6}$$
where V stands for the vector of net fluxes. They correspond to the flow of tokens per time in Petri nets. Special attention has to be paid to the involvement of external places in matrix C. If we connect outputs with inputs by additional transitions as explained in the previous section, or if additional arcs are added to create self-loops next to initial and final places, C can contain the coefficients both for internal and external places. Otherwise, it should only contain the coefficients for the internal places in order for Eq. (6) to hold true. A T-invariant (transition invariant) is a vector with the property that if each transition fires as many times as the value of the corresponding component of the vector indicates, the original marking is restored. Algebraically, these vectors are the solutions of Eq. (6). Therefore, T-invariants correspond to flux distributions in steady state. As Petri nets usually involve irreversible transitions only, all components of a T-invariant must be non-negative. T-invariants with this property are called true T-invariants. Frequently, the net direction of all biochemical reactions in a network is known, for example, because they are irreversible or have a defined biochemical function. In this case, the orientation of reactions can be chosen in such a way that all (net) fluxes are non-negative. Then, only steady-state flux distributions corresponding to true T-invariants are relevant.
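The algebraic meaning of a true T-invariant can be checked directly. The following sketch uses hypothetical function names and the incidence matrix assumed for the internal places ATP and ADP of Fig. 3. It tests whether a firing-count vector V is non-negative, non-zero and satisfies Eq. (6), and evaluates the marking $M_0 + C V$ that results when every transition fires as often as V prescribes. Whether a firing order exists in which each transition is actually enabled is a separate question that this sketch does not address.

```python
def is_true_t_invariant(V, C):
    """True if V is non-negative, non-zero and satisfies C V = 0."""
    non_negative = all(v >= 0 for v in V) and any(v > 0 for v in V)
    balanced = all(sum(c * v for c, v in zip(row, V)) == 0 for row in C)
    return non_negative and balanced


def marking_after(M0, C, V):
    """Marking M0 + C V obtained when transition j fires V[j] times."""
    return [m + sum(c * v for c, v in zip(row, V)) for m, row in zip(M0, C)]


C = [[-1, 1],      # ATP (assumed net: T1 consumes ATP, T2 regenerates it)
     [1, -1]]      # ADP
M0 = [1, 0]

cycle = [1, 1]     # fire T1 once and T2 once
half = [1, 0]      # fire T1 only

restored = marking_after(M0, C, cycle)
shifted = marking_after(M0, C, half)
```

Firing T1 and T2 once each restores the marking [1, 0], so (1, 1) is a true T-invariant, whereas firing T1 alone moves the token from ATP to ADP.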
A central concept in metabolic network analysis is that of elementary flux modes [Schuster and Hilgetag, 1994; Schuster et al., 2002a]. They stand for minimal sets of enzymes that could operate at steady state and are uniquely determined. Minimal means that no other flux mode at steady state uses a proper subset of the enzymes of an elementary mode. For a better understanding of the behaviour of biochemical systems, they can be decomposed into such simplest relevant routes. This has recently been demonstrated for sugar cane metabolism [Rohwer and Botha, 2001] and bacterial metabolism [Van Dien and Lidstrom, 2002]. In Petri net theory, elementary modes have, as counterpart, the minimal T-invariants [Starke, 1990]. The concept of elementary modes is, however, more general because reversible reactions are allowed. Colom and Silva [1990] developed an algorithm for computing minimal P-invariants. It can, by transposition of the incidence matrix, also be used for computing minimal T-invariants. It is based on row operations on the incidence matrix augmented with an identity matrix. In the course of the calculation, care has to be taken to eliminate non-minimal and duplicate T-invariants. Colom and Silva [1990] propose two alternative tests to do so. A method for computing elementary flux modes based on convex analysis was proposed in [Pfeiffer et al., 1999; Schuster and Hilgetag, 1994; Schuster et al., 2000]. Although the latter algorithm was developed with different goals (Colom and Silva [1990] did not deal with metabolic networks) and completely independently of Petri net theory, the two algorithms show some similarities. However, they differ in that elementary modes can involve reversible reactions. This is taken into account by partitioning the stoichiometry matrix into “reversible” and “irreversible” submatrices. Moreover, the test for eliminating non-minimal and duplicate T-invariants (flux modes) is slightly different.
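To make the idea of row operations on an augmented matrix concrete, here is a minimal Python sketch of a textbook-style elimination scheme for minimal semi-positive T-invariants. It is not the algorithm of Colom and Silva [1990] nor the elementary-mode algorithm cited above: it assumes that all transitions are irreversible, uses a simple support-based test to discard non-minimal and duplicate rows, and makes no attempt at efficiency. All function names and the example net (one metabolite produced by t1 and consumed by either t2 or t3) are hypothetical.

```python
from functools import reduce
from math import gcd


def support(v):
    """Indices of the non-zero components of v."""
    return frozenset(k for k, x in enumerate(v) if x != 0)


def drop_non_minimal(rows):
    """Remove duplicate rows and rows whose support strictly contains another's."""
    unique = []
    for r in rows:
        if r not in unique:
            unique.append(r)
    supports = [support(right) for _, right in unique]
    return [r for r, s in zip(unique, supports)
            if not any(other < s for other in supports)]


def minimal_t_invariants(C):
    """Minimal semi-positive solutions V of C V = 0 (irreversible transitions)."""
    n, m = len(C), len(C[0])
    # One tableau row per transition: (its column of C, a unit vector).
    rows = [([C[i][j] for i in range(n)],
             [1 if k == j else 0 for k in range(m)]) for j in range(m)]
    for i in range(n):                         # make the entry for place i zero
        keep = [r for r in rows if r[0][i] == 0]
        positive = [r for r in rows if r[0][i] > 0]
        negative = [r for r in rows if r[0][i] < 0]
        for a, x in positive:
            for b, y in negative:
                ca, cb = -b[i], a[i]           # positive multipliers
                left = [ca * u + cb * v for u, v in zip(a, b)]
                right = [ca * u + cb * v for u, v in zip(x, y)]
                g = reduce(gcd, left + right)  # scale to smallest integers
                keep.append(([u // g for u in left], [v // g for v in right]))
        rows = drop_non_minimal(keep)
    return [right for _, right in rows]


# Hypothetical net: t1 produces S, while t2 and t3 each consume S.
branch = [[1, -1, -1]]
modes = minimal_t_invariants(branch)
```

For the branching example, `modes` contains the two routes (1, 1, 0) and (1, 0, 1), that is, t1 followed by t2 and t1 followed by t3.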
For a more detailed comparison of the algorithms, see [Schuster et al., 2002a]. The T-invariants are helpful in studying several properties of Petri nets, such as consistency. This property means that there exists an initial marking and a corresponding firing sequence that regenerates the initial state and contains each transition at least once. As can be seen in the system shown in Fig. 8 with either reactions 3 and 4 completely inhibited or reactions 5 and 6 completely inhibited, not every metabolic system is consistent according to this definition. However, reactions 5 and 6 in the former case and reactions 3 and 4 in the latter case are not covered by true T-invariants. If we consider only a subnet that is covered by true T-invariants, such as transitions t1 and t2, it is consistent. This is because, once the minimal T-invariants (elementary flux modes) are identified, appropriate initial markings enabling these invariants to operate can be linearly combined and a new initial marking is obtained. The system can fire each above-mentioned T-invariant consecutively (the necessary resources exist due to the “construction” of the initial marking), each transition is used at least once and the initial marking is always regenerated. A further property studied for Petri nets, reversibility, means that for every marking M that can be reached from M0, M0 can also be reached from M. It holds for metabolic networks if some constraints are fulfilled. One constraint is that the network is covered by true T-invariants. The second constraint is that all external metabolites have enough tokens to operate all true T-invariants. The argument reads as follows: Let us denote by wi the number of times the transitions ti have to fire in order to reach a marking M from M0. The numbers wi are gathered in a vector W. Note that W need not fulfil Eq. (6). As the net is covered by true T-invariants, we can find a vector V that does satisfy Eq.
(6) and a sufficiently large natural number such that V − W involves positive components only. This vector indicates how many times the transitions need to be fired to reach the initial marking again.

SIPHONS, TRAPS, DEADLOCKS AND LIVENESS

In Petri nets, special sets of places can be identified, for example, siphons (also called structural deadlocks) and traps [Reisig, 1985]. A siphon is a set of places that, once it is unmarked, remains
I. Zevedei-Oancea and S. Schuster / Topological Analysis of Metabolic Networks Based on Petri Net Theory
Fig. 8. Traps and deadlocks. Two situations are considered: either transitions t3 and t4 are operative (dashed arcs) or the transitions t5 and t6 (dash-dotted arcs). P1 and P2 , external metabolites; Si , internal metabolites.
so. A trap is a set of places that, once it is sufficiently marked, can never lose all its tokens. (If only some places of the trap are marked, with fewer tokens than a certain limit, the trap may still lose all its tokens.) Clearly, any semi-positive P-invariant implies a trap because the total number of tokens is constant and can, hence, not reach zero. Moreover, superconservative subnets
form traps, while subconservative subnets form siphons. The algorithms calculating siphons and traps [Schmidt, 1996a, 1996b; Yamauchi and Watanabe, 1999; Yamauchi et al., 1996; Tanimoto et al., 1996] in nets with specific properties are based on the following alternative definitions: A siphon is a set of places whose set of input transitions is contained in its set of output transitions. A trap is a set of places whose set of output transitions is contained in its set of input transitions.

A Petri net N with initial marking m0 is said to be deadlock-free if, for any reachable marking m, there is an enabled transition. A Petri net N with current marking m is in deadlock if no transition is enabled at m. Preventing deadlocks in an efficient way is an intensely researched field [Kemper, 1993; Moody and Antsaklis, 1998; Huang et al., 2001; Chu and Xie, 1997; Varpaaniemi, 1993; Iordache et al., 2001].

Traps and structural deadlocks are interesting for biochemical modelling. Many biochemical networks have the function to produce storage substances in certain periods and to consume these substances in other periods. For example, the potato plant produces starch and accumulates it in the potato tubers during growth, while starch is consumed after the tubers are deposited after the harvest. The starch and several of its precursors then form traps in the reaction net during growth, while starch and possible intermediates of degradation form siphons after the harvest.

Consider the simple reaction system shown in Fig. 8. Transition t1 is always activated while t2 only fires if at least one token exists in S1. Importantly, we consider t3 and t4 to be inoperative if t5 and t6 are operative in the system and vice versa. For example, the system could describe the production and degradation of starch. The internal metabolites then would be: S1, glucose-1-phosphate; S2, UDP-glucose; S3, starch [Stryer, 1995]. 
In the starch example, it is not necessary to consider an intermediate S4, while for other storage metabolites, it may be. In most cells containing starch, either the branch producing starch or the branch degrading it is functional. This is realized by complete inhibition of the appropriate enzymes. It can easily be seen that S2 and S3 form a trap when reactions 3 and 4 are operative. Once a token arrives in S3, no transition able to fire exists in the system to consume this token, so it remains there independently of the later evolution of the system. In the other case, once the last tokens have been extracted from S3 and S4, no transition able to generate a new token in these places exists, so they remain empty. This means that S3 and S4 form a siphon.

Current computer programs for simulating metabolic networks deal only partially with siphons and traps. For example, the program GEPASI developed by Pedro Mendes [1997] (http://gepasi.dbs.aber.ac.uk/softw/gepasi.html) detects all reactions that are at equilibrium in any steady state. For the example system shown in Fig. 8 with transitions 5 and 6 blocked, GEPASI would detect reactions 3 and 4 to have this status. Here, reactions 3 and 4 are irreversible, so that no steady state can be reached. However, if the reactions were reversible, they would indeed attain thermodynamic equilibrium in any steady state. The program METATOOL [Pfeiffer et al., 1999] (http://www.bioinf.mdcberlin.de/projects/metabolic/metatool/) indicates metabolite S3 as a “non-balanced internal metabolite”, while it does not say anything about metabolites S2 or S4. It is worth including, in future refinements of simulation packages, routines for detecting all metabolites involved in traps or siphons.

Another important concept is liveness [Reisig, 1985]. A transition t is said to be live if, for any marking m reachable from m0, there is a marking m′ reachable from m such that t is enabled by m′. 
A transition t is dead at marking m if no marking m′ reachable from m enables t. A Petri net (N; m0) is said to be live if every transition is live. Importantly, deadlock-freeness and liveness are two different notions. Liveness means that all the system’s transitions may be repeated infinitely often, while deadlock-freeness only implies that at least a subset of these transitions may be repeated, but not necessarily all of them. It is interesting how the
concepts of liveness, deadlocks, siphons and traps are connected with each other. A net satisfies the deadlock-trap property if each non-empty siphon includes a trap and the maximal trap in each minimal deadlock is sufficiently marked. In this case, no dead marking is reachable, so the net is deadlock-free. A further special class of Petri nets is made up of the free-choice nets. In these nets, for any place p, either p has at most one output transition or the input places of the output transitions of p consist only of p. A free-choice net is live if and only if every non-empty siphon includes an initially marked trap. This property is also known as Commoner’s theorem [Commoner, 1972].

Siphon and trap are dual notions. A siphon in a Petri net N is a trap of the net N′ obtained by reversing the direction of all edges of N. Therefore, the properties satisfied by siphons have counterparts for traps. Liveness and deadlock-freeness are structural properties, in which the initial marking plays, however, an essential role.

An example of a net that is in deadlock is the above example A + B gives 2 B (Fig. 5) if the initial token number of B is zero. The input and output transition sets of B coincide. Therefore, {B} is simultaneously a siphon and a trap. Here, •(B•) = {A, B} ≠ {B} (where •(B•) denotes the input places of the output transitions of B, that is, the co-substrates of B), but |A•| = |{t}| = 1 and |B•| = |{t}| = 1, so “A + B gives 2 B” is a free-choice net. By Commoner’s theorem, this net cannot be live if the token number of A is not infinite (that is, if A is not what we called an external metabolite in Section 2) and the trap {B} included in the siphon {B} does not hold at least one token. Without tokens in B, this autocatalytic reaction does not start proceeding. In chemistry, such a situation is known as false equilibrium [Othmer, 1981]. A larger example is glycolysis, which requires 2 moles of ATP in its upper part and produces 4 moles of ATP in its lower part. 
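The statements made above about the net “A + B gives 2 B” can be checked mechanically from the set-based definitions. The following Python sketch is only an illustration: the dictionary encoding of the net and all function names are assumptions introduced here, and A is treated as an ordinary (internal) place.

```python
# Each transition maps to (tokens consumed, tokens produced) per place.
NET = {"t": ({"A": 1, "B": 1}, {"B": 2})}  # A + B gives 2 B


def input_transitions(places, net):
    """Transitions that put tokens into at least one place of the set."""
    return {t for t, (_, post) in net.items() if any(p in places for p in post)}


def output_transitions(places, net):
    """Transitions that take tokens from at least one place of the set."""
    return {t for t, (pre, _) in net.items() if any(p in places for p in pre)}


def is_siphon(places, net):
    return input_transitions(places, net) <= output_transitions(places, net)


def is_trap(places, net):
    return output_transitions(places, net) <= input_transitions(places, net)


def is_free_choice(net):
    """Each place has at most one output transition, or it is the only
    input place of all its output transitions."""
    places = {p for pre, post in net.values() for p in (*pre, *post)}
    for p in places:
        out = [t for t, (pre, _) in net.items() if p in pre]
        co_inputs = {q for t in out for q in net[t][0]}
        if len(out) > 1 and co_inputs != {p}:
            return False
    return True


def is_dead(marking, net):
    """True if no transition is enabled at the given marking."""
    return not any(all(marking.get(p, 0) >= n for p, n in pre.items())
                   for pre, _ in net.values())


print(is_siphon({"B"}, NET), is_trap({"B"}, NET))  # True True
print(is_free_choice(NET))                         # True
print(is_dead({"A": 5, "B": 0}, NET))              # True
print(is_dead({"A": 5, "B": 1}, NET))              # False
```

With no token in B the single transition is not enabled, which corresponds to the false equilibrium mentioned above; a single token in B is enough to start the autocatalytic reaction.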
If no ATP is present at the beginning, the glycolytic pathway cannot proceed. Therefore, this pathway has been said to have a turbo design [Teusink, 1998]. A test for liveness and deadlock-freeness of the net can thus help us decide whether the metabolic system can attain a situation where it is blocked. The detection of siphons and traps is instrumental for this purpose.

EVALUATING THE ROLE OF TPI IN TRYPANOSOMA BRUCEI METABOLISM BY DETECTING SIPHONS AND TRAPS

T. brucei is a unicellular, extracellular, eukaryotic parasite of the blood and tissue fluids of mammals. It is transmitted by tsetse flies and causes sleeping sickness in humans. The infections are lethal unless treated, but the few existing drugs have severe side-effects. Many studies [e.g. Bakker et al., 1999; Helfert et al., 2001] have focused on the carbon and free energy metabolism of this organism, which depends entirely on glycolysis. According to Scheme 1 in Helfert et al. [2001], glucose is imported into the glycosome and then converted into F-1,6-P. The two consecutive enzymes (HXK and PFK) are contracted into one step, which consumes two ATP and produces two ADP. ALD converts F-1,6-P into DHAP and GA3P. These two substances are isomerised into each other by a reversible enzyme, TPI (triose-phosphate isomerase). DHAP is transformed into Gly3P by GPDH, with consumption of NADH and production of NAD+. GAPDH uses NAD+ to transform GA3P into BPGA and NADH. Gly3P can be either converted into glycerol by GLYK with consumption of ADP and production of ATP, or transported into the cytosol, where GPO oxidizes it to DHAP and H2O, DHAP being transported back to the glycosome. PGK uses an ADP molecule to convert BPGA into 3-PGA and ATP. 3-PGA is transported into the cytosol and converted, via 2-PGA, into PEP, which gives pyruvate and ATP, consuming one ADP. A similar system was treated by Overkamp et al. [2002]. They maximized the yield of glycerol in Saccharomyces cerevisiae using metabolic engineering. 
At the beginning, Helfert et al. [2001] supposed that glycolysis could proceed without TPI, producing glycerol and pyruvate in the same amount. This corresponds to an elementary mode described in Schuster
et al. [2000]. However, TPI knockout mutants turned out to be inviable. Thus, they built a kinetic model to explain the unexpected result that all system fluxes (PYR, GLYCEROL) decrease. We now give a structural explanation, thus ignoring the kinetics. In Fig. 9, the Petri net model corresponding to Scheme 1 in Helfert et al. [2001] is given. It should be noted that this network is not a free-choice net because, for example, |GLY3P•| = |{t2, GLYK}| = 2 and •(GLY3P•) = {GLY3P, ADP} ≠ {GLY3P}. The first property is known in Petri net theory as conflict, because two transitions {t2, GLYK} compete for the same resources (the tokens in place GLY3P). But in metabolic networks, the token number is large enough that the competing transitions will simply “agree” on the distribution of tokens depending on their reaction rates. Taking this aspect into account, we do not need to know the reaction rates; we only assume that the flux through transition t2 in Fig. 9 is always greater than zero. Another important piece of knowledge that we use is that ALD is reversible and therefore inhibited by its products [Stryer, 1995].

Let us consider the case when TPI is knocked out. T1 = {NADH, NAD+} forms a siphon and a trap at the same time: its set of input transitions {GPDH, GAPDH} coincides with its set of output transitions. This means that once this set of places is sufficiently marked, it keeps its tokens. Moreover, {NADH, NAD+} also forms a P-invariant, the sum of its tokens remaining constant during the whole process. Another trap (T2) consists of {DHAPc, DHAPg, GLY3Pc, GLY3Pg, Gly} because its input transitions set {ALD, GPDH, GPO, GLYK, t1, t2} includes its output transitions set {GPDH, GPO, GLYK, t1, t2}. If the flux through t2 were equal to zero, Gly would accumulate, but because the flux through t2 is greater than zero, GLY3P is partially transformed back into DHAPg. Let us start proceeding with the marking m0. 
Following the firing sequence {e3, ALD, GPDH, GAPDH, t2, GPO, t1, GPDH, t2, GPO, t1, PGK, t3, e1, e2}, as Table 2 illustrates, the network reaches a dead marking m1, because DHAP accumulates, no NADH being available for converting it further. This is because GPO is draining the flux, consuming NADH faster than GAPDH can produce it. This continues until the product inhibition of ALD is so strong that ALD ceases operating. Therefore, the whole system is dead, no more Gly and Pyr being produced. For deriving this result, it is informative that after deleting TPI, the three transitions t1, t2 and GPO are no longer involved in any elementary mode (minimal T-invariant).

The importance of TPI can be seen if its corresponding transitions are added to the model. TPI being a reversible enzyme, two transitions, one acting forwards and one backwards, have to be added. Of course, the T-invariant {TPI1, TPI2} has to be ignored, having no biological significance. In this new context, T2 is no longer a trap. Another minimal trap occurs in T3 = {DHAPc, DHAPg, GLY3Pc, GLY3Pg, GLY, GA3P, BPGA, 3PGA, 2PGA, PEP, PYR}, corresponding to the accumulation of GLY and PYR. Whenever DHAPg tends to accumulate due to a lack of NADH, TPI1 converts part of the DHAPg tokens into GA3P. Due to the sufficient amount of NAD+, GAPDH fires with production of the necessary NADH, which gives GPDH the possibility to fire. Likewise, if NAD+ is deficient but GA3P is sufficient, TPI2 converts GA3P into DHAPg, GPDH fires and produces the required NAD+.

Taking into account almost only structural properties of the given network, especially the presence of traps, we could evaluate the role of TPI in glycolysis and glycerol production. In the next section, an example taken from nucleotide metabolism will be used to illustrate the notions presented above. The program INA (www.informatik.hu-berlin.de/~starke/ina.html) is utilized to facilitate the calculations.
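The token game leading to the dead marking can be replayed with a short Python sketch. It rests on assumptions that should be kept in mind: the stoichiometries are read off the consecutive markings in Table 2 and the pathway description above, the transitions are fired in the order of the rows of Table 2, the labels e1, e2, e3 and t3 are assigned to the steps in that order, and the last step to pyruvate is modelled without ATP formation because the table rows show none. The encoding and the function names are hypothetical.

```python
import math

# Assumed TPI-knockout net: transition -> (tokens consumed, tokens produced).
NET = {
    "e1":    ({"Glu": 1, "ATP": 2}, {"F16P": 1, "ADP": 2}),
    "ALD":   ({"F16P": 1}, {"DHAPg": 1, "GA3P": 1}),
    "GPDH":  ({"DHAPg": 1, "NADH": 1}, {"GLY3Pg": 1, "NAD+": 1}),
    "GAPDH": ({"GA3P": 1, "NAD+": 1}, {"BPGA": 1, "NADH": 1}),
    "PGK":   ({"BPGA": 1, "ADP": 1}, {"3PGA": 1, "ATP": 1}),
    "t3":    ({"3PGA": 1}, {"2PGA": 1}),
    "e2":    ({"2PGA": 1}, {"PEP": 1}),
    "e3":    ({"PEP": 1}, {"Pyr": 1}),
    "t2":    ({"GLY3Pg": 1}, {"GLY3Pc": 1}),
    "GPO":   ({"GLY3Pc": 1}, {"DHAPc": 1}),
    "t1":    ({"DHAPc": 1}, {"DHAPg": 1}),
    "GLYK":  ({"GLY3Pg": 1, "ADP": 1}, {"Gly": 1, "ATP": 1}),
}


def enabled(net, marking):
    """Names of all transitions whose input places hold enough tokens."""
    return [t for t, (pre, _) in net.items()
            if all(marking.get(p, 0) >= n for p, n in pre.items())]


def fire(net, marking, t):
    """Return the marking reached by firing transition t once."""
    if t not in enabled(net, marking):
        raise ValueError(f"{t} is not enabled")
    pre, post = net[t]
    new = dict(marking)
    for p, n in pre.items():
        new[p] = new.get(p, 0) - n
    for p, n in post.items():
        new[p] = new.get(p, 0) + n
    return new


# Initial marking m0 of Table 2; glucose is external (unlimited supply).
marking = {"ATP": 2, "Glu": math.inf, "NADH": 1}
for t in ["e1", "ALD", "GPDH", "GAPDH", "PGK", "t3", "e2", "e3",
          "t2", "GPO", "t1", "GPDH", "t2", "GPO", "t1"]:
    marking = fire(NET, marking, t)

print({p: n for p, n in marking.items() if n})
print(enabled(NET, marking))        # [] : dead marking

# Adding the two TPI transitions re-enables the net at this marking.
with_tpi = dict(NET, TPI1=({"DHAPg": 1}, {"GA3P": 1}),
                TPI2=({"GA3P": 1}, {"DHAPg": 1}))
print(enabled(with_tpi, marking))   # ['TPI1']
```

In the final marking one token sits in DHAPg while NADH is empty, so GPDH cannot fire, and only one ATP is left, so the lumped HXK/PFK step cannot fire either. With TPI1 present, the DHAPg token can be converted into GA3P, which lets GAPDH regenerate NADH.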
ILLUSTRATION OF PETRI NET PROPERTIES ON A SYSTEM EXTRACTED FROM NUCLEOTIDE METABOLISM

Let us now consider the biochemical system depicted in Fig. 7A. It represents part of nucleotide metabolism, as it occurs, for example, in human liver [Stryer, 1995]. We have translated it in terms of a
Fig. 9. Petri net representation of the glycolysis metabolism of Trypanosoma brucei. T1 = {NADH, NAD+} is simultaneously a P-invariant, a trap and a siphon. In the absence of TPI1 and TPI2, T2 = {DHAPg, DHAPc, GLY3Pg, GLY3Pc, Gly} is a trap. If TPI1 and TPI2 act, T2 is no longer a trap, but T3 = {DHAPc, DHAPg, GLY3Pc, GLY3Pg, Gly, GA3P, BPGA, 3PGA, 2PGA, PEP, Pyr} forms a trap. We consider the case where the flux through t2 is positive and ALD is product-inhibited by DHAPg.
Columns: places. Rows: marking reached after firing the transition named in the first column (m0: initial marking).

Firing   ATP  ADP  Glu  F16P  DHAPg  DHAPc  GA3P  NADH  NAD+  GLY3Pg  GLY3Pc  BPGA  3PGA  2PGA  PEP  Pyr  Gly
m0         2    0    ∞     0      0      0     0     1     0       0       0     0     0     0    0    0    0
e1         0    2    ∞     1      0      0     0     1     0       0       0     0     0     0    0    0    0
ALD        0    2    ∞     0      1      0     1     1     0       0       0     0     0     0    0    0    0
GPDH       0    2    ∞     0      0      0     1     0     1       1       0     0     0     0    0    0    0
GAPDH      0    2    ∞     0      0      0     0     1     0       1       0     1     0     0    0    0    0
PGK        1    1    ∞     0      0      0     0     1     0       1       0     0     1     0    0    0    0
t3         1    1    ∞     0      0      0     0     1     0       1       0     0     0     1    0    0    0
e2         1    1    ∞     0      0      0     0     1     0       1       0     0     0     0    1    0    0
e3         1    1    ∞     0      0      0     0     1     0       1       0     0     0     0    0    1    0
t2         1    1    ∞     0      0      0     0     1     0       0       1     0     0     0    0    1    0
GPO        1    1    ∞     0      0      1     0     1     0       0       0     0     0     0    0    1    0
t1         1    1    ∞     0      1      0     0     1     0       0       0     0     0     0    0    1    0
GPDH       1    1    ∞     0      0      0     0     0     1       1       0     0     0     0    0    1    0
t2         1    1    ∞     0      0      0     0     0     1       0       1     0     0     0    0    1    0
GPO        1    1    ∞     0      0      1     0     0     1       0       0     0     0     0    0    1    0
t1         1    1    ∞     0      1      0     0     0     1       0       0     0     0     0    0    1    0  (dead marking)
Table 2 Markings obtained during a firing sequence leading to a dead marking in the energy metabolism of T. brucei
Petri net (Fig. 7B) and then analysed it using the program INA. For modelling the external metabolites, we have chosen to introduce a self-loop for each source and each sink and to keep the same firing rule (2) independently of the metabolite’s type. If we do not impose capacities for the internal metabolites, the net is unbounded because uridine, for example, can accumulate more and more tokens if Cdd keeps firing while Urk1 does not. Accordingly, the reachability tree is infinite. As INA has reported, the net is strongly connected, not pure (which is obviously due to the introduced self-loops), and not (sub-)conservative. There is no P-invariant, except the external metabolites on their own. Again due to the self-loops next to the external metabolites, the number of tokens in each of these places remains constant. INA reports four minimal semi-positive T-invariants. We give them here by indicating the transitions with positive components in the vectors representing these T-invariants:

1. Urk2, Kcy2
2. Cdd, Urk1, Kcy1
3. 2 Kcy1, KPR, 2 UPP, APT
4. Kcy1, KAD, KPR, UPP.
Note that some transitions (such as Kcy1 and UPP in the third invariant) have to fire twice, but not necessarily successively. One can see that firing all the activated transitions that belong to an invariant regenerates the initial marking. As each enzyme occurs in at least one minimal T-invariant, the net is covered by these invariants. Therefore, the net is persistent and live. For simplicity’s sake, although in biological organisms the reactions KAD and KPR are reversible, we considered them irreversible. If they are treated as reversible, care has to be taken because extra, irrelevant T-invariants result, containing only KAD and KAD’, and KPR and KPR’, respectively (where the primed symbols denote the reverse reactions). They have to be discarded.

The biochemical meaning of the minimal T-invariants can be explained as follows: (1) production of cytidine-diphosphate (CDP) from cytidine, (2) production of uridine-diphosphate (UDP) from cytidine, (3, 4) two invariants producing uridine-diphosphate (UDP) from uracil in different ways. In invariant (3), one mole of adenine per two moles of UDP produced is formed as a by-product. This is because ATP is used as a source of the ribose moiety, which is necessary for forming UDP from uracil. Note that this invariant is not easy to determine by inspection. Moreover, it can be seen that the molar yield with respect to ATP is different for the pathways (3) and (4). While invariant (4) consumes 3 moles of ATP per mole of UDP produced, invariant (3) uses 3 moles of ATP per two moles of UDP [Schuster et al., 2002a]. All of these T-invariants correspond to the so-called salvage pathways, which serve to save nucleotides from leaving the cell and redirect them to nucleotide phosphates [Stryer, 1995].

Let us now assume that ATP and ADP are internal metabolites and that the two enzymes KPR and KAD are not expressed in a certain cell type. 
If we modify the network by eliminating the transitions that stand for these enzymes (and also the arcs that connect them with their neighbouring places), we do obtain a P-invariant. It can be translated into a conservation relation: ATP + ADP = const. The constant would be 2 if we define the initial token numbers of ATP and ADP to be 1 each. With any conservation sum less than four, the four remaining transitions consuming ATP (Urk1, Urk2, Kcy1, and Kcy2) are in mutual exclusion. Since the net does not include transitions producing ATP or (in the second case) AMP, these substances eventually run out, so that the places standing for ATP and AMP are deadlocks, while in the complete net, there is neither a trap nor a deadlock. To maintain a steady state, nucleotide metabolism requires permanent production of ATP, for example, by glycolysis.
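The conservation relation ATP + ADP = const. can be verified directly from an incidence matrix. The sketch below does so for a hypothetical two-transition fragment (Urk1: uridine + ATP gives UMP + ADP; Kcy1: UMP + ATP gives UDP + ADP). These stoichiometries, the fragment itself and the function names are assumptions made for illustration and are not taken from Fig. 7.

```python
def is_p_invariant(y, C):
    """True if the place weights y satisfy y^T C = 0 (C: places x transitions)."""
    return all(sum(y[p] * C[p][t] for p in range(len(C))) == 0
               for t in range(len(C[0])))


def token_bound(y, m0, place):
    """Upper bound on the tokens of `place` implied by a semi-positive P-invariant."""
    if y[place] <= 0:
        raise ValueError("place is not covered by the invariant")
    conserved_sum = sum(w * m for w, m in zip(y, m0))
    return conserved_sum // y[place]


PLACES = ["ATP", "ADP", "Urd", "UMP", "UDP"]
#     Urk1  Kcy1
C = [[-1,  -1],   # ATP
     [ 1,   1],   # ADP
     [-1,   0],   # Urd
     [ 1,  -1],   # UMP
     [ 0,   1]]   # UDP

y = [1, 1, 0, 0, 0]    # candidate invariant: ATP + ADP
m0 = [1, 1, 1, 0, 0]   # one token each in ATP, ADP and Urd

print(is_p_invariant(y, C))                       # True
print(sum(w * m for w, m in zip(y, m0)))          # 2, the conserved sum
print(token_bound(y, m0, PLACES.index("ATP")))    # 2
```

The bound follows because the weighted token sum never changes, so no covered place can hold more than the conserved sum divided by its weight.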
DISCUSSION

Petri nets provide a special formalism to describe processes in networks. In particular, they are suitable to model biochemical networks. Here, we have shown that several concepts from Petri net theory are significant for this modelling. However, there are alternative formalisms, and it is difficult to decide which formalism is best suited. To implement the calculations on a computer, one usually translates Petri nets into matrices. So one may argue that the networks could be modelled by matrices from the very beginning. Indeed, Petri nets have the advantage of providing a means of visualisation. On the other hand, biochemists have used a special way of visualisation for decades [Stryer, 1995; Kanehisa and Goto, 2000] (and chemists already for centuries). Multimolecular reactions such as “A + B gives C + D + E” are represented by an arrow that has two upper ends and three lower ends. This arrow can be represented, in a formal language, as a pair of n-tuples: ((A,B), (C,D,E)). In contrast, in a normal graph as used in graph theory, edges correspond to simple pairs of nodes. In Petri nets, the representation is “disentangled” by introducing additional nodes and arcs. The above reaction would then be represented by five place nodes and one transition node (T) linked by five arcs. The arcs correspond to the following pairs of nodes: (A,T), (B,T), (T,C), (T,D) and (T,E). It is a matter of taste which representation is preferred: one pair of n-tuples or several pairs.

Many concepts from Petri net theory have counterparts in traditional biochemical modelling, for example, P-invariants (conservation relations), T-invariants (flux modes), and minimal T-invariants (elementary flux modes). In metabolism, minimal T-invariants can be interpreted as biochemical pathways. Detection of these in complex networks is often not straightforward. It is helpful in determining maximal conversion yields [Rohwer and Botha, 2001; Schuster et al., 2000; Van Dien and Lidstrom, 2002]. 
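The two representations of the reaction “A + B gives C + D + E” discussed above are easily converted into each other. As a small illustration (the function name `reaction_to_arcs` is hypothetical), the pair of n-tuples can be expanded into the arcs of the Petri net as follows:

```python
def reaction_to_arcs(transition, substrates, products):
    """Expand a pair of n-tuples into place/transition arcs of a Petri net."""
    return ([(s, transition) for s in substrates] +
            [(transition, p) for p in products])


reaction = (("A", "B"), ("C", "D", "E"))   # pair of n-tuples
arcs = reaction_to_arcs("T", *reaction)
print(arcs)
# [('A', 'T'), ('B', 'T'), ('T', 'C'), ('T', 'D'), ('T', 'E')]
```

The result contains five arcs joining five place nodes to the single transition node T, in agreement with the count given in the text.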
The concepts of trap, siphon, deadlock, and liveness, among others, have not been considered in biochemical modelling so far. Here, we have shown that these are helpful to characterize special properties of metabolic networks. For example, the test for deadlock-freeness helps to determine whether a biochemical pathway can attain a false equilibrium, where it is blocked. From another point of view, this situation has been referred to as the danger of a turbo design of pathways [Teusink, 1998]. The liveness of a system indicates that all transitions are able to fire infinitely often, and the processes are not eventually restricted to a subsystem. Traps can correspond to storage metabolites that are produced during growth of an organism and steadily increase in their concentrations. We have here analysed the example of the energy metabolism in T. brucei. If the accumulations in the trap exceed a certain amount, this can cause product inhibition of some transitions (the aldolase reaction in the example), forcing the system to stop working. This result is of interest for elementary-modes analysis. It has been argued that this analysis can help assess the effects of enzyme deficiencies and knockout mutations [Klamt and Stelling, 2003; Schuster et al., 2000]. The example analysed here shows that in a deficient system, the remaining elementary modes may not be functional because of the occurrence of a trap. Therefore, pathway analysis should be refined by considering traps, siphons, deadlock-freeness and liveness. Siphons can correspond to storage substances when they are gradually depleted during starvation. An analysis of traps and siphons appears to be promising in studying diseases such as obesity and hypercholesterolemia, which are related to over-accumulation of storage substances. It will be worth including the analysis of traps, siphons, deadlocks, and liveness in metabolic simulation packages. 
So far, in Petri net theory, transitions are always considered to be unidirectional. However, many biochemical reactions such as all isomerases are known to be reversible in that their net flow can change sign depending on the physiological state. If such a reaction is described by two oppositely directed
transitions, meaningless T-invariants arise. For example, in the scheme shown in Fig. 6, the T-invariant {t1, t2} occurs. In order to avoid having to cancel such T-invariants after their computation, it will be worthwhile extending Petri net theory by allowing for reversible transitions.

A property that can be checked for Petri nets is boundedness. As biochemical networks are open systems, they are not usually covered by semi-positive P-invariants; that is, they are not conservative. Nevertheless, subnets are often covered by such invariants and are, therefore, bounded. For example, if the conservation relation ATP + ADP = const. holds, one can deduce that the energy currency metabolite ATP cannot exceed a certain limit. If negative coefficients exist in the conservation relation, boundedness cannot be guaranteed even for the corresponding subnet. Beside conservative subnets, there may be superconservative subnets. Obviously, they imply unboundedness. First, there may be metabolites that are only produced by irreversible reactions but not consumed by any reaction (Fig. 8). Second, if consuming reactions exist, the catalysing enzymes may have such a low maximal velocity (saturation level) that the rate of production is higher than the rate of consumption.

Here, we have focussed on topological analysis, which deals with the properties that arise from the static construction of the network. For many biological applications, such as the assignment of the metabolic function to an enzyme gene (functional genomics) [Dandekar et al., 1999; Förster et al., 2002; Selkov et al., 1997], it is sufficient to analyse these properties rather than the dynamics. The structural properties are the most representative features that one should look for. Compared to kinetic parameters of enzymes, they are constant in time and often much better known. Thus, reaction stoichiometries are easier to get from databases [Kanehisa and Goto, 2000; Selkov et al., 1996]. 
Topological analysis (in particular, the computation of invariants) constitutes the basis for the simulation of the dynamics of the system.

ACKNOWLEDGEMENTS

We would like to thank Dr. Barbara Bakker (Amsterdam) for drawing our attention to the special properties of TPI mutants in T. brucei and Drs. I. Koch and P. H. Starke (Berlin) for helpful discussions. Financial support by the DFG to both authors is gratefully acknowledged.

REFERENCES

• Alla, H. and David, R. (1998). Continuous and hybrid Petri nets. J. Circ. Syst. Comp. 8, 159-188.
• Bakker, B., Walsh, M. C., ter Kuile, B. H., Mensonides, F. I. C., Michels, P. A. M., Opperdoes, F. R. and Westerhoff, H. V. (1999). Contribution of glucose transport to the control of the glycolytic flux in Trypanosoma brucei. Proc. Natl. Acad. Sci. USA, 96, 10098-10103.
• Chu, F. and Xie, X.-L. (1997). Deadlock analysis of Petri nets using siphons and mathematical programming. IEEE Trans. on Robotics and Automation, 13, 793-804.
• Clarke, B. L. (1980). Stability of complex reaction networks. Adv. Chem. Phys. 43, 1-216.
• Colom, J. M. and Silva, M. (1990). Convex geometry and semiflows in P/T nets. A comparative study of algorithms for computation of minimal P-semiflows. In: Advances in Petri Nets, Rozenberg, G. (ed.), Springer, Berlin, pp. 79-112.
• Commoner, F. (1972). Deadlocks in Petri nets. Technical report, Applied data research Inc. Wakefield, Massachusetts.
• Dandekar, T., Schuster, S., Snel, B., Huynen, M. and Bork, P. (1999). Pathway alignment: Application to the comparative analysis of glycolytic enzymes. Biochem. J. 343, 115-124.
• Érdi, P. and Tóth, J. (1989). Mathematical Models of Chemical Reactions. Manchester University Press, Manchester.
• Fell, D. A. and Wagner, A. (2000). The small world of metabolism. Nat. Biotechnol. 18, 1121-1122.
• Förster, J., Gombert, A. K. and Nielsen, J. (2002). A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol Bioeng. 79, 703-712.
36
I. Zevedei-Oancea and S. Schuster / Topological Analysis of Metabolic Networks Based on Petri Net Theory
• Genrich, H., Küffner, R. and Voss, K. (2001). Executable Petri net models for the analysis of metabolic pathways. Int. J. STTT 3, 394-404.
• Heiner, M., Koch, I. and Schuster, S. (2000). Using time-dependent Petri nets for the analysis of metabolic networks. In: Modellierung und Simulation Metabolischer Netzwerke, Preprint No. 10, Hofestädt, R., Lautenbach, K. and Lange, M. (eds), Faculty of Computer Science, University of Magdeburg, pp. 15-21.
• Heiner, M., Koch, I. and Voss, K. (2001). Analysis and simulation of steady states in metabolic pathways with Petri nets. In: CPN '01 – Third Workshop and Tutorial on Practical Use of Coloured Petri Nets and the CPN Tools, Jensen, K. (ed), University of Aarhus, Denmark, pp. 15-34.
• Heinrich, R. and Schuster, S. (1996). The Regulation of Cellular Systems. Chapman and Hall, New York.
• Helfert, S., Estevez, A. M., Bakker, B., Michels, P. and Clayton, C. (2001). Roles of triosephosphate isomerase and aerobic metabolism in Trypanosoma brucei. Biochem. J. 357, 117-125.
• Hofestädt, R. (1994). A Petri net application to model metabolic processes. Syst. Anal. Mod. Simul. 16, 113-122.
• Horn, F. and Jackson, R. (1972). General mass action kinetics. Arch. Rational Mech. Anal. 47, 81-116.
• Huang, Y., Jeng, M. D., Xie, Z. and Chung, S. (2001). Deadlock prevention policy based on Petri nets and siphons. International Journal of Production Research 39, 283-305.
• Iordache, M. V., Moody, J. O. and Antsaklis, P. J. (2000). Automated synthesis of liveness enforcing supervisors using Petri nets. Technical Report isis-00-004, Dept. of Electrical Engr., Univ. of Notre Dame.
• Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabási, A. L. (2000). The large-scale organization of metabolic networks. Nature 407, 651-654.
• Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30.
• Kemper, P. (1993). Linear time algorithm to find a minimal deadlock in a strongly connected free-choice net. In: Proc. 14th International Conference Application and Theory of Petri Nets, Chicago, M. Ajmone-Marsan (ed.), Springer, LNCS 691, 319-338.
• Klamt, S. and Stelling, J. (2003). Two approaches for metabolic pathway analysis? Trends Biotechn. 21, 64-69.
• Küffner, R., Zimmer, R. and Lengauer, T. (2000). Pathway analysis in metabolic databases via differential metabolic display (DMD). Bioinformatics 16, 825-836.
• Leiser, J. and Blum, J. J. (1987). On the analysis of substrate cycles in large metabolic systems. Cell Biophys. 11, 123-138.
• Liao, J. C., Hou, S.-Y. and Chao, Y.-P. (1996). Pathway analysis, engineering and physiological considerations for redirecting central metabolism. Biotechnol. Bioeng. 52, 129-140.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 341-352.
• Mendes, P. (1997). Biochemistry by numbers: simulation of biochemical pathways with Gepasi 3. Trends Biochem. Sci. 22, 361-363.
• Moody, J. O. and Antsaklis, P. J. (1998). Deadlock avoidance in Petri nets with uncontrollable transitions. Proceedings of 1998 American Control Conference, Philadelphia, pp. 24-26.
• Oliveira, J. S., Bailey, C. G., Jones-Oliveira, J. B., Dixon, D. A., Gull, D. W. and Chandler, M. L. (2003). A computational model for the identification of biochemical pathways in the Krebs cycle. J. Comput. Biol. 10, 57-82.
• Othmer, H. G. (1981). The interaction of structure and dynamics in chemical reaction networks. In: Modelling of Chemical Reaction Systems, Ebert, K. H., Deuflhard, P. and Jäger, W. (eds), Springer, Berlin, p. 2.
• Overkamp, K. M., Bakker, B. M., Kötter, P., Luttik, M. A. H., van Dijken, J. P. and Pronk, J. T. (2002). Metabolic engineering of glycerol production in Saccharomyces cerevisiae. Appl. Environm. Microbiol. 68, 2814-2821.
• Peleg, M., Yeh, I. and Altman, R. (2002). Modeling biological processes using workflow and Petri net models. Bioinformatics 18, 825-837.
• Pfeiffer, T., Sánchez-Valdenebro, I., Nuño, J. C., Montero, F. and Schuster, S. (1999). METATOOL: For studying metabolic networks. Bioinformatics 15, 251-257.
• Price, N. D., Papin, J. A. and Palsson, B. O. (2002). Determination of redundancy and systems properties of the metabolic network of Helicobacter pylori using genome-scale extreme pathway analysis. Genome Res. 12, 760-769.
• Reddy, V. N., Liebmann, M. N. and Mavrovouniotis, M. L. (1996). Qualitative analysis of biochemical reaction systems. Comput. Biol. Med. 26, 9-24.
• Reisig, W. (1985). Petri Nets: An Introduction. Springer, Berlin.
• Rohwer, J. M. and Botha, F. C. (2001). Analysis of sucrose accumulation in the sugar cane culm on the basis of in vitro kinetic data. Biochem. J. 358, 437-445.
• Schmidt, K. (1996a). How to calculate symbolically siphons and traps of algebraic Petri nets. Technical Report, Helsinki University of Technology, A39, 1-40.
• Schmidt, K. (1996b). Siphons and traps for algebraic Petri nets. Proc. Workshop CS&P, Berlin, 157-168.
• Schuster, S. and Hilgetag, C. (1994). On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165-182.
• Schuster, S. and Hilgetag, C. (1995). What information about the conserved-moiety structure of chemical reaction systems can be derived from their stoichiometry? J. Phys. Chem. 99, 8017-8023.
• Schuster, S. and Höfer, T. (1991). Determining all extreme semi-positive conservation relations in chemical reaction systems. A test criterion for conservativity. J. Chem. Soc. Faraday Trans. 87, 2561-2566.
• Schuster, S., Fell, D. A. and Dandekar, T. (2000). A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat. Biotechnol. 18, 326-332.
• Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. and Dandekar, T. (2002a). Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae. Bioinformatics 18, 351-361.
• Schuster, S., Hilgetag, C., Woods, J. H. and Fell, D. A. (2002b). Elementary flux modes in biochemical reaction systems: Algebraic properties, validated calculation procedure and example from nucleotide metabolism. J. Math. Biol. 45, 153-181.
• Selkov, E., Basmanova, S., Gaasterland, T., Goryanin, I., Gretchkin, Y., Maltsev, N., Nenashev, V., Overbeek, R., Panyushkina, E., Pronevitch, L., Selkov, E. Jr. and Yunus, I. (1996). The metabolic pathway collection from EMP: The enzymes and metabolic pathways database. Nucleic Acids Res. 24, 26-28.
• Selkov, E., Maltsev, N., Olsen, G. J., Overbeek, R. and Whitman, W. B. (1997). A reconstruction of the metabolism of Methanococcus jannaschii from sequence data. Gene 197, GC11-GC26.
• Starke, P. H. (1990). Analyse von Petri-Netz-Modellen. B. G. Teubner, Stuttgart.
• Stryer, L. (1995). Biochemistry. Freeman, New York.
• Tanimoto, S., Yamauchi, M. and Watanabe, T. (1996). Finding minimal siphons in general Petri nets. IEICE Trans. on Fundamentals in Electronics, Communications and Computer Science, E79-A, 1817-1824.
• Teusink, B., Walsh, M. C., van Dam, K. and Westerhoff, H. V. (1998). The danger of metabolic pathways with turbo design. Trends Biochem. Sci. 23, 162-169.
• Van Dien, S. J. and Lidstrom, M. E. (2002). Stoichiometric model for evaluating the metabolic capabilities of the facultative methylotroph Methylobacterium extorquens AM1, with application to reconstruction of C(3) and C(4) metabolism. Biotechnol. Bioeng. 78, 296-312.
• Varpaaniemi, K. (1993). Efficient detection of deadlock in Petri nets. Licentiate's thesis, Helsinki University of Technology, Department of Computer Science and Engineering, Digital Systems Laboratory.
• Yamauchi, M. and Watanabe, T. (1999). Time complexity analysis of the minimal siphon extraction problem of Petri nets. IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, E82-A, 2558-2565.
• Yamauchi, M., Tanimoto, S. and Watanabe, T. (1996). Finding a minimal siphon containing specified places in a general Petri net. IEICE Trans. on Fundamentals in Electronics, Communications and Computer Science, E79-A, 1825-1828.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2003, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-38
Quantitative Petri Net Model of Gene Regulated Metabolic Networks in the Cell
Ming Chen∗ and Ralf Hofestädt
Bioinformatics/Medical Informatics, Technical Faculty, University of Bielefeld, Bielefeld, Germany
ABSTRACT: A method to exploit hybrid Petri nets (HPN) for quantitatively modeling and simulating gene regulated metabolic networks is demonstrated. A global kinetic modeling strategy and a Petri net modeling algorithm are applied to represent bioprocess functioning and to analyze the model. With the model, the interrelations between pathway analysis and metabolic control mechanisms are outlined. The dynamics of metabolites are simulated and displayed graphically using an HPN tool, Visual Object Net++. An explanation of the observed behavior of the urea cycle is proposed to indicate possibilities for metabolic engineering and medical care. Finally, the perspective of Petri nets on modeling and simulation of metabolic networks is discussed.
KEYWORDS: Metabolic network, gene regulation, Petri nets, quantitative model, urea cycle, modeling and simulation
∗ Corresponding author: Ming Chen, Bioinformatics/Medical Informatics, Technical Faculty, University of Bielefeld, Postfach 10 01 31, D-33501 Bielefeld, Germany. E-mail: [email protected].

INTRODUCTION
With the success of the Human Genome Project, we have experienced increasing floods of data, both in terms of volume and in terms of new databases and new types of data. More and more experimental data on both the genetic and the cellular level are systematically collected and stored in specific databases that are also available to the public via the Internet [Baxevanis, 2003]. Some well-known databases cover gene sequences (e.g. GenBank, EMBL, DDBJ), proteins (e.g. SWISS-Prot, PIR, BRENDA), biochemical reactions (e.g. KEGG, WIT/MPW), transcription factors (TRANSFAC) and signal induction reactions (e.g. CSNDB, TRANSPATH, GeneNet). In the post-genomic era, the focus is shifting "from the sequence to the function", i.e., in addition to completing genome sequences, we are learning about gene expression patterns and protein interactions on the genomic scale. Undoubtedly, the analysis of metabolic networks is becoming a promising field, which requires new algorithms and tools.

The study of gene regulated metabolic networks plays an important role in the detection of genetic/metabolic defects as well as in drug research. Genetic/metabolic defects often lead to metabolic blockades, resulting in metabolic diseases. Many inborn errors of metabolism result from a deficiency of an enzyme encoded by a single gene. Regarding drug research, it is necessary to first understand the reaction pathways that are affected by the drug, directly and indirectly, and to know the effect of the modification
of specific reaction steps on reaction networks. For instance, hyperammonemia is known to be a hereditary disease of the urea cycle that results from an enzyme deficiency. Ornithine transcarbamylase (OTC) deficiency, an X-linked disorder of urea synthesis, leads to a disease whose clinical manifestations include lethargy, coma, and cerebral edema. Identifying the sites in a metabolic pathway that give rise to such diseases, caused by inborn errors of metabolism, would be useful so that the relevant enzyme or metabolite could be substituted or suitably modified. In addition, simulating the related complex metabolic networks will help to understand the impact of various factors (e.g. enzyme insufficiency, metabolic blockade, drug effects, etc.) on metabolic systems. This is particularly useful in the pharmaceutical industry for designing site-directed drugs to target mutant enzymes.

In order to understand the molecular logic of the cell, methods of modeling and simulation are important. Different models are available in the literature, commonly falling into two categories: descriptive models (discrete approaches) and analytical models (differential equations). Traditionally, metabolic pathways are regarded as coherent sets of enzymatic reactions and can be interpreted as relational graphs. Each node represents a metabolite and each edge represents a biochemical reaction that is catalyzed by a specific enzyme. Kohn and Letzkus [1983] extended graph theory with a specific function that allowed the modeling of dynamic processes. The first application of Petri nets to modeling metabolic pathways was then introduced by Reddy et al. [1993]. In contrast to naive graphs, a Petri net is a graphically oriented language for the design, specification, simulation and verification of systems.
It offers a formal way to represent the structure of a discrete event system, simulate its behavior, and draw certain types of general conclusions about the properties of the system. Ordinary Petri net models do not cover quantitative aspects, so several extensions of Petri nets have been developed that support dynamic change, task migration, superimposition of various levels of activities and the notion of modes of operation. Various extensions of PNs, such as (Stochastic) Timed PNs [Wang, 1998; Wang, 1999], Colored PNs [Kurt, 1997], Predicate/Transition Nets [Genrich, 1987] and Hybrid PNs [David and Alla, 1992], allow for qualitative and/or quantitative analyses of resource utilization, effect of failures, and throughput rate. Hofestaedt and Thelen [1998] also presented an extended formalism, the self-modified Petri net, which allows the quantitative modeling of regulatory biochemical networks. In recent years further papers have appeared [Genrich et al., 2001; Goss and Peccoud, 1999; Hofestaedt et al., 2000a; Matsuno et al., 2000; Matsuno et al., 2001], indicating that the Petri net methodology is useful for modeling and simulating metabolism. We are motivated to exploit the Petri net methodology to model gene regulated metabolic networks in the cell, explain the importance of sustaining core research, and identify promising opportunities for future research.

HYBRID PETRI NETS
We assume that readers have some background knowledge of Petri nets; otherwise, they are referred to W. Reisig's Petri Nets: An Introduction [Reisig, 1985] or to the basic references at the Petri Nets World site at http://www.daimi.au.dk/PetriNets/. A large body of work on Petri nets has been compiled in the literature, and various applications have chosen Petri nets as their control models because of their intuitively understandable graphical notation.
A brief description of hybrid Petri nets follows.

Definition 1: A hybrid Petri net is a six-tuple $Q = (P, T, Pre, Post, h, M)$ such that:
$P = \{P_1, P_2, \ldots, P_n\}$ is a finite, non-empty set of places;
$T = \{T_1, T_2, \ldots, T_m\}$ is a finite, non-empty set of transitions;
$P \cap T = \emptyset$, i.e. the sets $P$ and $T$ are disjoint;
$h : P \cup T \to \{D, C\}$, called the "hybrid function", indicates for every node whether it is a discrete node (sets $P^D$ and $T^D$) or a continuous node (sets $P^C$ and $T^C$);
$Pre : P \times T \to \mathbb{R}^+$ or $\mathbb{N}$ is the input incidence mapping ($\mathbb{R}^+$ denotes the set of positive real numbers, including zero, and $\mathbb{N}$ denotes the set of natural numbers);
$Post : P \times T \to \mathbb{R}^+$ or $\mathbb{N}$ is the output incidence mapping;
$M : P \to \mathbb{R}^+$ or $\mathbb{N}$ is the initial marking.

We denote by $M(t) = (m^t_1, m^t_2, \ldots, m^t_n)$ the vector which associates with each place of $P$ its marking at the instant $t$. $M_0 = M(t_0) = (m^0_1, m^0_2, \ldots, m^0_n)$ is the initial marking. At any time the present marking $M$ is the sum of two markings $M^r$ and $M^n$, where $M^r$ is the reserved marking and $M^n$ is the non-reserved marking. If $h(P_i) = D$ or $C$, then $m_i(t) = m^r_i(t) + m^n_i(t)$.

When a variable $d_{T_j}$ (called the delay time of $T_j$) is assigned to each discrete transition $T_j$ ($h(T_j) = D$), $T_j$ is fired at time $t + d_{T_j}$ ⇒
$\forall P_i \in {}^{\circ}T_j$ (${}^{\circ}T_j$ denotes the set of input places of transition $T_j$): $m_i(t) \geq Pre(P_i, T_j)$, $m_i(t + d_{T_j}) = m_i(t) - Pre(P_i, T_j)$;
$\forall P_i \in T_j^{\circ}$ ($T_j^{\circ}$ denotes the set of output places of transition $T_j$): $m_i(t + d_{T_j}) = m_i(t) + Post(P_i, T_j)$.

When a variable $v_{T_j}$ (called the speed of $T_j$) is assigned to each continuous transition $T_j$ ($h(T_j) = C$), $T_j$ is fired at time $t$ during a delay $dt$ ⇒
$\forall P_i \in {}^{\circ}T_j$: $m^n_i(t) > Pre(P_i, T_j)$, $m_i(t + dt) = m_i(t) - v_j(t)\,Pre(P_i, T_j)\,dt$;
$\forall P_i \in T_j^{\circ}$: $m_i(t + dt) = m_i(t) + v_j(t)\,Post(P_i, T_j)\,dt$,
where $v_j(t)$ is the instantaneous firing flow of $T_j$ at time $t$.

The hybrid Petri net defined above can be extended with inhibitor arcs: an inhibitor arc of weight $r$ from a place $P_i$ to a transition $T_j$ allows the firing of $T_j$ only if the marking of $P_i$ is less than $r$.
If the inhibitor arc has its origin at a discrete place and has a weight $r = 1$, the corresponding transition can be fired only if $m_i < 1$, that is, only if $m_i = 0$, since $m_i$ is an integer. If the origin place is continuous, a conventional value $0^+$ is introduced to represent a weight that is infinitely small but not nil. The definition of an extended hybrid Petri net is similar to the definition of a hybrid Petri net (Def. 1), except that: one can have, in addition, inhibitor arcs; the weight of an arc (inhibitor or ordinary) whose origin is a continuous place takes its value in $\mathbb{R}^+ \cup \{0^+\}$ instead of $\mathbb{R}^+$; and the marking of a continuous place takes its value in $\mathbb{R}^+ \cup \{0^+\}$ instead of $\mathbb{R}^+$.

The hybrid Petri net defined in this way is a flexible modeling formalism that is well suited to biological processes, because places can hold actual concentrations and transitions can use functions. In this paper, we exploit a hybrid Petri net tool named Visual Object Net++ (VON++) to model and simulate gene regulated networks. VON++ is a small, quick, uncomplicated and intuitive Petri net tool that supports both discrete event and timed event/condition PN. Documentation for this tool can be downloaded from its web site at http://www.systemtechnik.tu-ilmenau.de/~drath/visual.htm. Figure 1 shows the basic elements of VON++: discrete place, continuous transition, continuous place and discrete transition, connected with test arc, normal arc and inhibitor arc, respectively.
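The firing rules of Definition 1 can be illustrated with a minimal Python sketch. It is not part of VON++; the names `Arc`, `discrete_enabled`, `fire_discrete` and `step_continuous` are hypothetical, delay times and reserved markings are left out, and the enabling test for continuous transitions is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """Hypothetical helper: an arc between a place and a transition."""
    place: str
    weight: float


def discrete_enabled(marking, inputs, inhibitors=()):
    """A discrete transition is enabled when every input place holds at least
    Pre(P_i, T_j) tokens and every inhibitor place holds fewer than its weight r."""
    return (all(marking[a.place] >= a.weight for a in inputs)
            and all(marking[a.place] < a.weight for a in inhibitors))


def fire_discrete(marking, inputs, outputs):
    """One firing: m_i - Pre on input places, m_i + Post on output places."""
    new = dict(marking)
    for a in inputs:
        new[a.place] -= a.weight
    for a in outputs:
        new[a.place] += a.weight
    return new


def step_continuous(marking, inputs, outputs, speed, dt):
    """Advance a continuous transition with firing speed v over a short interval dt:
    inputs lose v * Pre * dt, outputs gain v * Post * dt (no enabling test here)."""
    new = dict(marking)
    for a in inputs:
        new[a.place] -= speed * a.weight * dt
    for a in outputs:
        new[a.place] += speed * a.weight * dt
    return new


# Discrete example: P1 -> T -> P2, with an inhibitor arc of weight 1 from P2 to T.
m0 = {"P1": 2, "P2": 0}
pre, post, inhibit = [Arc("P1", 1)], [Arc("P2", 1)], [Arc("P2", 1)]
m1 = fire_discrete(m0, pre, post) if discrete_enabled(m0, pre, inhibit) else m0
```

After the first firing, P2 holds one token, so the inhibitor arc of weight 1 blocks any further firing, which matches the rule that the transition can fire only if the marking of the inhibiting place is zero.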
Fig. 1. Elements of VON++.
Fig. 2. Presentation of transition rate in continuous systems.
Fig. 3. Presentation of intermediate reaction rate.
The discrete transition is the active element in discrete event Petri nets. A transition can fire if all places connected to it by input arcs contain at least as many tokens as the input arcs specify. It can be assigned a delay time. The continuous transition differs from the traditional discrete transition; its activity is not comparable with the abrupt firing of a discrete transition. The firing speed assigned to a continuous transition describes its firing behavior. It can be a constant or a function, i.e. tokens are transported according to $v(t)$; in Fig. 1, $v(t) = 1$.

The rates of bioprocesses are not defined within a Petri net; they should be specified separately. In automated control systems represented by Petri nets, execution of transitions usually depends on the presence of a specific number of tokens in all starting places. However, in most chemical and biological systems the rate of a process (transition) is defined by the mass action law: the change of tokens (or concentration) is proportional to the number of tokens (or concentration) in all starting places, as expressed in Fig. 2. Here $v$ is the firing rate of the transition; $k$ is a constant (called a rate coefficient in chemical kinetics); $m_3$ and $m_4$ are the concentrations of place $S_1$ and place $S_2$. The coefficient $k$ varies with temperature, pressure, solvent, and other factors. As a result, $v$ becomes a function of several variables. Figure 3 indicates how the token number (or concentration) of the reaction intermediate $P_1$ changes.

Traditionally, kinetics has been taught in biochemistry courses in terms of steady-state kinetics. This corresponds to a detailed study of the local properties of individual enzymes. However, one can go further and create kinetic models of the whole pathway. Such models are composed of coupled ordinary differential (for time courses) or algebraic (for steady states) equations. These equations are non-linear
Table 1
A comparison of metabolic simulators with Petri nets approach

Tools                                          Gepasi(a)   Jarnac(b)   DBsolve(c)      E-Cell(d)
Stoichiometry matrix presentation              +           +           +               +
Core algorithm and method                      MCA         MCA         MCA             SRM, MCA
Pathway DB retrievable                         −           −           WIT/MPW, EMP    KEGG, EcoCyc
Pathways graphic editor                        −           +           +               −
Kinetic types                                  +++         ++          ++              +
Virtual cell model                             −           −           −               +
Simulation graphic display                     +++         ++          +               +
Mathematical model accessible and modifiable   +           +           ++              +
Data XML export                                ∗           SBML∗       ∗               ∗
User interface                                 ++          +           ++              +
Programming language                           C++         Delphi 5    C++             C++

(a) Gepasi (http://www.gepasi.org/); (b) Jarnac (http://members.lycos.co.uk/sauro/biotech.htm); (c) DBsolve (http://homepage.ntlworld.com/igor.goryanin/); (d) E-Cell (http://www.e-cell.org/).
∗ SBML (Systems Biology Markup Language) (http://www.sbw-sbml.org/) is a description language for simulations in systems biology. It is oriented towards representing biochemical networks that are common in research on a number of topics, including cell signaling pathways, metabolic pathways, biochemical reactions, gene regulation, and many others. SBML is the product of close collaboration between the teams developing BioSpice (http://biospice.lbl.gov/), Gepasi, DBSolve, E-Cell, Jarnac, StochSim (http://www.zoo.cam.ac.uk/comp-cell/StochSim.html) and Virtual Cell (http://www.nrcam.uchc.edu/).
and most often have no analytical solutions. This means that they can only be studied through numerical algorithms, such as the Newton method for solving non-linear equations and numerical integrators. After many years of development, Petri nets now have mature mathematical algorithms and can handle non-linear algebraic equations (NAEs), ordinary differential equations (ODEs) and stoichiometric matrices. But biochemical systems are also rich in time scales and thus require sophisticated methods for the numerical solution of the differential equations that describe them.

MATHEMATICAL METHODS AND ALGORITHMS
Kinetic models of metabolic networks are becoming imperative, not only because knowledge of more and more metabolic pathways has been acquired, but also because the complexity of metabolic pathways requires an analytical and quantitative solution. Kinetic models offer approaches on the spatiotemporal scale and serve to check the consistency of metabolic theories with observed behaviors.

Related simulation environments
Many attempts have been made to simulate molecular processes in both cellular and viral systems. Several software packages for quantitative simulation of biochemical/metabolic pathways, based on the numerical integration of rate equations, have been developed. Table 1 shows a comparison of the most well-known metabolic simulation systems. Each tool has some prominent features that are weak or absent in the others. After a decade of development, Gepasi is widely used for both research and education to simulate biochemical systems, owing to its powerful simulation engine and user-friendly interface. Jarnac, as a replacement of SCAMP, has a nice pathway graphic editor, called JDesigner, enabling users to interactively draw a biochemical network. It has an SBW (Systems Biology Workbench) interface, providing simulation capabilities for alternative clients. DBsolve is good at model analysis and optimization. By using
numerical procedures for the integration of ODEs and/or NAEs to describe the dynamics of these models, DBsolve offers an explicit solver, an implicit solver and a bifurcation analyzer. The primary focus of E-Cell is to develop a framework for constructing simulatable cell models based on the gene sets that are derived from completed genomes. In contrast to other computer models that are being developed to reproduce individual cellular processes in detail, E-Cell is designed to "paint a broad-brush picture of the cell as a whole". Another program, DynaFit (http://www.biokin.com/dynafit/), is also useful in the analysis of complex reaction mechanisms.

In predicting cell behavior, the simulation of a single pathway or a few interconnected pathways can be useful when the pathway being studied is relatively isolated from other biochemical processes. However, in reality, even the simplest and best studied pathway, such as glycolysis, can exhibit complex behavior due to the high connectivity of metabolites. In fact, the more interconnections exist among different parts of a system, the harder it gets to predict how the system will react. Moreover, simulations of metabolic pathways alone cannot account for the longer time-scale effects of processes such as gene regulation, the cell division cycle and signal transduction. When the system reaches a certain size, it becomes unmanageable and incomprehensible unless it is decomposed into modules (hierarchical models) and/or presented graphically. In this respect, the tools mentioned above are limited. In comparison, Petri nets capture the basic aspects of concurrent systems of metabolism both conceptually and mathematically.
The major advantages of Petri nets are a graphical modeling representation with a sound mathematical background, which makes it possible to analyze and validate the qualitative characteristics and quantitative behavior of a concurrent system, and the ability to describe the system on different levels of abstraction (hierarchical models). In addition, the development of computer technology gives Petri net tools friendlier interfaces and support for standard data import/export. We are motivated to exploit the Petri net methodology to model and simulate gene regulated metabolic networks.

The ordinary discrete system is easy to understand, so we concentrate here on the continuous one, which proves useful for dynamic systems. We will describe some mathematical formulations that occur frequently in biological models. A general differential equation for a single state variable is $dx/dt = flow_{in} - flow_{out}$, while the expressions for $flow_{in}$ and $flow_{out}$ can be quite complex, as every bioprocess gives rise to its own system of differential equations involving many dependent variables (species concentrations) and many free parameters (reaction rate constants). The mass action law assumes that particles move incessantly. However, cellular metabolites are not like gas molecules. A metabolic reaction is very complex; interaction delays or saturation effects often exist in a metabolic system. In these cases, the mass action law is violated and should be replaced by equations that better describe the metabolic interaction, while the rest of the algorithm remains the same.

PN model of metabolic reactions
In biochemistry, the most commonly used expression that relates the enzyme catalyzed formation rate of the product to the substrate concentration is the Michaelis-Menten equation, which is given as $v = v_{max} \cdot S / (K_m + S)$. An example of its Petri net model and simulation result is shown in Fig. 4.
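To make the balance equation and the two rate laws concrete, the following Python sketch integrates a single conversion S → P with a simple Euler scheme. It is an illustration only, not the VON++ model: the function names, the parameter values ($v_{max} = 1.0$, $K_m = 0.5$, $S_0 = 2.0$) and the fixed step size are assumptions chosen for the example.

```python
def mass_action(k, *concentrations):
    """Mass-action rate: k times the product of the concentrations of all starting places."""
    rate = k
    for c in concentrations:
        rate *= c
    return rate


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate v = vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def simulate(rate, s0, p0=0.0, dt=0.01, steps=1000):
    """Euler integration of S -> P, i.e. dS/dt = -v(S) and dP/dt = +v(S)."""
    s, p = s0, p0
    for _ in range(steps):
        v = rate(s)  # flow out of S equals flow into P
        s, p = s - v * dt, p + v * dt
    return s, p


# Illustrative parameter values (assumed, not taken from the paper).
s_end, p_end = simulate(lambda s: michaelis_menten(s, vmax=1.0, km=0.5), s0=2.0)
```

Replacing the lambda by `lambda s: mass_action(0.5, s)` switches the same transition to first-order mass-action kinetics, which mirrors the remark that only the rate expression changes while the rest of the algorithm stays the same.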
Fig. 4. Petri net model of a simple enzyme catalyzed biochemical reaction (Michaelis-Menten reaction).

It is clear that such an enzymatic reaction is characterized by two parameters, $v_{max}$ and $K_m$, and biochemists are interested in determining these parameters from experiments. Fortunately, several biochemical reaction databases that provide enzymatic reaction kinetics, such as BRENDA, are publicly available. However, those kinetic parameters are complete only for a subset of the well-known pathways. Moreover, an enzymatic reaction can be affected by the presence of other compounds, i.e., the simplest form of the Michaelis-Menten equation does not account for the higher than first order substrate concentration dependence found in many allosteric enzymes. In the first case, we can introduce a general function $v = K_{app} \cdot S$ to compensate for the unknown parameters, where $K_{app}$ is the apparent rate constant. As we know, the Michaelis-Menten equation is only valid when the concentrations of substrate and enzyme meet the precondition that [E] is not less than 0.001[S]. When the enzyme is sensitively regulated, i.e., the enzyme concentration is a variable of the model, the effect of enzyme concentration on the reaction rate can be taken into account by writing the Michaelis-Menten equation as

$$v = \frac{k_{cat} \cdot E \cdot S}{K_m + S} = \frac{v_{max} \cdot S}{K_m + S},$$

where $k_{cat}$ is known as the turnover number. When more than one substrate is involved in an enzymatic reaction and its kinetic type is unknown, the process is more complicated than those discussed above. As the Michaelis-Menten equation is obviously invalid in this case, we simply apply the following function:

$$v = v_{max} \cdot \prod_{i=1}^{n} \frac{S_i}{K_{mi} + S_i}.$$

For instance, given a two-substrate biochemical reaction,

$$v = \frac{v_{max} \cdot S_1 \cdot S_2}{(K_{m1} + S_1) \cdot (K_{m2} + S_2)}.$$
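The two rate functions above translate directly into code. The sketch below is a hedged illustration with hypothetical function names; it assumes Python 3.8 or later for `math.prod` and treats all concentrations and constants as plain floats.

```python
from math import prod


def enzyme_rate(e, s, kcat, km):
    """Michaelis-Menten rate with the enzyme concentration as a model variable:
    v = kcat * E * S / (Km + S), so that vmax = kcat * E."""
    return kcat * e * s / (km + s)


def multi_substrate_rate(vmax, substrates, kms):
    """Generic rate for n substrates of unknown kinetic type:
    v = vmax * product over i of S_i / (Km_i + S_i)."""
    if len(substrates) != len(kms):
        raise ValueError("one Km value is needed per substrate")
    return vmax * prod(s / (km + s) for s, km in zip(substrates, kms))


# Two-substrate case with illustrative (assumed) values: each factor is 1/2.
v_two = multi_substrate_rate(1.0, [1.0, 1.0], [1.0, 1.0])
```

With a single substrate the product collapses to the ordinary Michaelis-Menten expression, so the generic function can serve as a default rate law for any transition whose mechanism has not been characterized.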
Fortunately, if a reaction with two or more substrates has already been classified as one of the known kinetic types, such as competitive kinetics or Ping-Pong kinetics, and the corresponding rate function is available in the literature, that function should be employed.

Model of Genetic regulatory networks
Although metabolic reactions determine anabolism and catabolism, the regulation of metabolism is mainly based on the regulation of gene expression, because this determines whether a protein is present to carry out its particular metabolic reaction. Gene regulatory networks are the on-off switches and rheostats of a cell operating at the gene level. Based on interactions between genes and proteins and on reactions of genes and proteins, they dynamically orchestrate the level of expression for each gene in the genome by controlling whether and how vigorously that gene will be transcribed into RNA. Each RNA transcript then functions as the template for synthesis of a specific protein by the process of translation.
Regulation in gene regulatory networks is not restricted to the level of transcription, but may also be carried out at the levels of translation [Pyronnet et al., 1996], splicing [Yao et al., 1996], posttranslational protein degradation [Hochstrasser, 1996], active membrane transport [Weissmuller and Bisch, 1993], and other processes. In addition, such networks often include dynamic feedback loops that provide for further regulation of network architecture and output. Building complete kinetic models of genetic regulatory systems requires detailed knowledge of the reaction mechanisms. Often, the following steps are considered ad hoc:
1. The gene (DNA) is transcribed into RNA by the enzyme RNA polymerase.
2. RNA transcripts are subjected to post-transcriptional modification and control: rRNA transcripts are cut into appropriate size classes and initially assembled in the nucleolar organizer; tRNA transcripts fold into shape; mRNA transcripts are modified and noncoding sequences (introns) are removed from the interior of the transcript; in eukaryotes, all RNA types must move to the cytoplasm via the nuclear membrane pores.
3. mRNA molecules are then translated by ribosomes (rRNA + ribosomal proteins) that match the 3-base codons of the mRNA to the 3-base anticodons of the appropriate tRNA molecules.
4. Finally, newly synthesized proteins are often modified after translation (post-translation) before carrying out their function, which may be transporting oxygen, catalyzing reactions or responding to extracellular signals, or even directly or indirectly binding to DNA to perform transcriptional regulation, thus forming a closed feedback loop of gene regulation.
However, at present, information on the bioprocesses leading from genes to the gene-encoded products is still unclear or unavailable.
So far, we can regard the unknown part as a black box of transition (one transition that can be visualized as the representation of a part of Petri net) and simplify the whole procedure as a higher level of abstraction (Fig. 5): This modification does not involve changing the structure of the complete net and any modification to this subnet is reflected in the behavior of the original transition. Therefore, Petri net models are extensible and ways of plug-in concept, they can be extended without significant deviation from the existing structure. As to model gene regulatory networks quantitatively, the state equations of the following form are used to model bioprocesses such as activation of proteins, binding of proteins to genes, binding of RNA polymerase and so on. dstate If state[i](condition), then dt [i] = state[i] (consequence) For example, the concentration of the gene product is state [i] . The condition contains regulatory terms for this gene and describes whether the gene is being expressed or not. It depends on the state of the cell, and may contain models for promoters, enhancers, other proteins, nucleic acid, etc. The consequence then describes the result of condition changing, here, the rate of gene expression. So the differential mass balances describing the concentration of mRNA and of the encoded protein can be given as: If (∃ (Gene, transcriptional factor(s), RNA nucleotides, binding of RNA polymerase, etc.) not (Repressors, etc.)) ]] = [mRNA](GPC, mRNA) = Then (transcription is initiated and mRNA is produced, d[mRNA dt kis [GPC] − kd [mRNA]) If (Modified mRNA, tRNA, initiation factor(s), amino acid, binding of ribosome, etc.) [P] = [P](P, mRNA) = k [mRNA]−k [P ]−k [P ]) Then (the gene-encoded protein is synthesized, d[dt r tl d Where kts and ktl are the rates of transcription and translation respectively, k d is the rate of degradation
46
M. Chen and R. Hofest¨adt / Quantitative Petri Net Model of Gene Regulated Metabolic Networks in the Cell
Fig. 5. Petri net model simplification.
and $k_r$ is the rate of consumption by biochemical reactions. GPC denotes the concentration of the binding complex of gene, TFs, RNA polymerase, etc. DNA is a stable molecule, but mRNA and proteins are constantly degraded by the cellular machinery and then recycled. Specifically, mRNA is degraded by a ribonuclease (RNase), which competes with ribosomes for binding to the mRNA: if a ribosome binds, the mRNA is translated; if the RNase binds, the mRNA is degraded. Proteins are degraded by cellular machinery including proteasomes, signaled by ubiquitin tagging. Protein degradation is regulated by a variety of more specific enzymes (which may differ from one protein target to another). In practice, the first-order rate constant of degradation $k_d$ is often replaced by a half-life $H$, and the degradation rate is expressed as $\frac{dC}{dt} = -\frac{0.693}{H}\,C$, where $H = 0.693/k_d$. mRNAs have specific half-lives ranging from hours to days.

Binding procedures are also common phenomena in signal transduction, for example:
– converting inactive proteins into active proteins, and vice versa
– binding of proteins to genes or proteins
– binding of RNA polymerase to genes and gene-protein complexes
– binding of receptors to transcription factors
For systems in which one subject $A_i$ binds with other subjects, a general model $[\mathrm{Complex}] = K_b \cdot \prod_{i=1}^{n} [A_i]$ is used, where $K_b$ is the binding constant.
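The mRNA and protein balances given above, together with the half-life relation $k_d = 0.693/H$, can be illustrated with a short numerical sketch. It is not the authors' implementation: the function names, the explicit Euler scheme and all parameter values are assumptions made for illustration, and separate degradation constants are used for mRNA and protein although the text writes a single $k_d$.

```python
import math

def rate_from_half_life(half_life):
    """First-order degradation constant k_d = ln(2) / H (ln 2 is about 0.693)."""
    return math.log(2) / half_life

def simulate_expression(gpc, k_ts, k_tl, kd_mrna, kd_prot, k_r, t_end, dt=0.01):
    """Explicit Euler integration of the mRNA and protein balances."""
    mrna, prot = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_mrna = k_ts * gpc - kd_mrna * mrna                # transcription - degradation
        d_prot = k_tl * mrna - kd_prot * prot - k_r * prot  # translation - degradation - consumption
        mrna += dt * d_mrna
        prot += dt * d_prot
    return mrna, prot

# Hypothetical parameter values, chosen only for illustration.
kd_mrna = rate_from_half_life(2.0)    # mRNA half-life of 2 time units
kd_prot = rate_from_half_life(10.0)   # protein half-life of 10 time units
gpc, k_ts, k_tl, k_r = 1.0, 2.0, 5.0, 0.05

mrna_end, prot_end = simulate_expression(
    gpc, k_ts, k_tl, kd_mrna, kd_prot, k_r, t_end=200.0
)

# Analytical steady state obtained by setting both derivatives to zero.
mrna_ss = k_ts * gpc / kd_mrna
prot_ss = k_tl * mrna_ss / (kd_prot + k_r)
```

After a sufficiently long simulated time, `mrna_end` and `prot_end` approach `mrna_ss` and `prot_ss`, which gives a quick consistency check on the chosen rate constants.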
In many situations, information on the gene regulatory pathway and its mechanism is not available, and one has to resort to more approximate models. In such cases, the discrete model is preferable.

Diffusion transportation
Most models deal with the amount of metabolites in a cell. In the simplest case, we may assume that the cell is a “well-mixed pool”, i.e., that the amount of metabolites, enzymes, etc. is uniform across the cell. In many situations, however, concentration gradients exist that affect the local rate of biochemical reactions. In particular, for large systems and different compartments we must explicitly consider the effect of diffusion or transportation. In general, if concentration gradients exist within the spatial scale of interest, it is highly likely that diffusion will have an impact on the modeling results, unless the gradients change so slowly that they can be considered stationary on the timescale of interest. A growing number of modeling studies [Markram et al., 1998; Naraghi and Neher, 1997] have emphasized the important effects of diffusion on molecular interactions. Moreover, many bioprocesses take place in different compartments of a cell. For instance, glycolysis takes place in the cytoplasm, whereas the TCA cycle takes place in the mitochondria. Membranes play an important role in separating these bioprocesses while maintaining the normal transportation of metabolites across them. In addition, signal transduction also occurs across membranes. Therefore, in order to model a metabolic network, not only the effects of metabolites and the reaction behaviors but also the different compartments should be considered. Diffusion, facilitated diffusion and active transport can be very important physical effects in the models. We will focus on membrane transportation.
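To make the role of compartments concrete, here is a minimal sketch of two well-mixed compartments that exchange one metabolite across a membrane. It assumes passive exchange with a flux proportional to the concentration difference (the Fick-type law); the function name, the Euler scheme and every numerical value are hypothetical.

```python
def simulate_exchange(s_out, s_in, permeability, area, v_out, v_in, t_end, dt=0.001):
    """Euler integration of passive exchange between two well-mixed compartments.

    flux = permeability * area * (s_out - s_in) is an amount per unit time
    moving from the outer to the inner compartment.
    """
    for _ in range(int(round(t_end / dt))):
        flux = permeability * area * (s_out - s_in)
        s_out -= dt * flux / v_out   # concentration change scales with 1/volume
        s_in += dt * flux / v_in
    return s_out, s_in

# Hypothetical values: a large outer compartment and a small inner one.
v_out, v_in = 10.0, 1.0
s_out_end, s_in_end = simulate_exchange(
    s_out=1.0, s_in=0.0, permeability=0.5, area=2.0,
    v_out=v_out, v_in=v_in, t_end=20.0,
)
```

The two concentrations relax towards a common value while the total amount `s_out * v_out + s_in * v_in` stays constant, which is the behavior a compartmental Petri net model would need to reproduce.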
The rate of penetration of a metabolite across a membrane is related to the concentration gradient by Fick’s law of diffusion:

$$J = D \cdot A \cdot \beta \cdot \frac{d[S]}{dx} = \frac{D\beta}{\Delta x} \cdot A \cdot ([S]_{out} - [S]_{in})$$

where $J$ is the rate of penetration; $[S]_{out}$ and $[S]_{in}$ are the concentrations of the metabolite outside and inside the membrane, respectively; $D$ denotes the diffusion coefficient ($D$ decreases with the size of the metabolite); $A$ is the area of the membrane (the greater it is, the more metabolite can pass); $\beta$ is the partition coefficient ($\beta$ increases with increasing solubility); and $\Delta x$ is the membrane thickness (the greater the thickness, the slower the rate). Usually, $D\beta/\Delta x$ is called the permeability constant, a constant for a given substance moving through a given membrane. In carrier-mediated systems, the carriers exhibit saturation kinetics, so that the Michaelis-Menten equation can be used to describe such a process: a low $K_m$ means high affinity and a high transport rate, and a high $K_m$ means low affinity and a low transport rate. Some metabolites and/or signals (hormones) may modify carriers and change $K_m$. $V_{max}$ is related to the “carrier mobility” and the total number of carriers present.

Petri net modeling algorithm
Modeling and analysis of hybrid Petri nets can be carried out by the following procedure:

Draft network construction
Normally, a Petri net model is built manually by drawing places, transitions and arcs with the mouse. Fortunately, an XML-based Petri net interchange format, which consists of a Petri net markup language (PNML) and a set of document type definitions (DTD) or XML Schemas, is being standardized and is intended for general use. Several Petri net tools such as PNK, Renew, Design/CPN
and GON are currently being equipped with an XML-based file exchange format. We have developed an environment that extracts metabolic network data from KEGG, BRENDA and RegulonDB and transforms them into XML-based files that can be used by PNK and Renew to display the Petri net models automatically [Chen, 2002; Chen et al., 2002b].

Data preparation
The main feature of metabolic processes is that the concentrations of metabolites influence the reaction activity of bioprocesses. Therefore, the actual concentration of each metabolite is an important component of the model. Although some data are nowadays publicly available via the Internet and other sources, other data may be incomplete and require a time-consuming search of the literature. If concentration values are unavailable, additional experiments might be required; computational predictions and empirical data might also help. The initial values of places are assigned after data gathering.

Determination of the kinetics
A series of predefined kinetic types that are frequently used in biochemical reaction models has been collected by SBML [Hucka et al., 2001]. However, there are circumstances in which the kinetic type is not yet defined. In that case, a kinetic modeling strategy is applied to outline the kinetic properties of each reaction. All relationships and influences of metabolites are fixed by introducing the corresponding variables into self-defined functions.

Stoichiometry check
Many metabolic pathway schemes contain mass conservation relations that must be taken into account in order to carry out the simulation. To check the mass conservation relations of a model, we can go back to the original reaction data from databases or the literature, or calculate them manually. In fact, we construct the model with the reaction stoichiometry identified.
Otherwise, information is lost when the simulation is carried out, because in continuous Petri nets arc weights are disabled, so that all components involved in a reaction change at the same rate, which is defined by the transition's function. In the reaction 2A + B → C, however, the change of A should be twice the reaction rate. In VON++, unfortunately, we have to add a further transition from A with the same function in order to obey the mass conservation law.

Parameter tuning and simulation
Building a precise model requires as many as possible of the variables and parameters involved in the metabolic network. The values of variables and parameters are either determined experimentally or deduced from other related values. However, it is impossible, and sometimes unnecessary, to put all variables and parameters into a model. The model is plausible when the main influences are included. On the other hand, because of different purposes and situations, most laboratory data do not fit the model very well, and vice versa. We have to compare the differences and tune the parameters in order to find suitable values. Then the effects of the various parameters on the gene regulated metabolic network and their relations can be understood. The key enzymes/proteins and intermediates of the metabolic pathway can be determined, which provides the information necessary to resolve metabolic bottlenecks.
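The stoichiometry problem described for the reaction 2A + B → C can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the behavior of any particular tool: the mass-action rate law, the rate constant and the initial concentrations are hypothetical. It integrates the reaction once with stoichiometric arc weights and once with every arc weight set to 1, and the conserved quantity [A] + 2[C] survives only in the first case.

```python
def simulate(weights, k=0.5, dt=0.01, steps=1000):
    """Integrate 2A + B -> C with a single transition rate v and per-arc weights."""
    conc = {"A": 2.0, "B": 1.0, "C": 0.0}   # hypothetical initial concentrations
    for _ in range(steps):
        v = k * conc["A"] ** 2 * conc["B"]  # hypothetical mass-action rate law
        for species, w in weights.items():
            conc[species] += dt * w * v     # each arc scales the common rate v
    return conc

weighted = simulate({"A": -2, "B": -1, "C": +1})    # stoichiometric arc weights
unweighted = simulate({"A": -1, "B": -1, "C": +1})  # every arc uses weight 1

# [A] + 2[C] is conserved by the true stoichiometry (initial value 2.0).
invariant_weighted = weighted["A"] + 2 * weighted["C"]
invariant_unweighted = unweighted["A"] + 2 * unweighted["C"]
```

Adding a second transition from A with the same rate function, as the text describes for VON++, has the same effect as the weight of 2 on the arc from A.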
Fig. 6. Key enzymes in regulation of urea cycle in cells. CPS1: Carbamyl phosphate synthetase, EC 6.3.4.16; OTC: Ornithine transcarbamylase, EC 2.1.3.3; ASS: Argininosuccinate synthetase, EC 6.3.4.5; ASL: Argininosuccinate lyase, EC 4.3.2.1; ARG: Arginase, EC 3.5.3.1.
CASE STUDY

Inborn errors of metabolism
Inborn errors of metabolism are characterized by a block in a metabolic pathway, a deficiency of a transport protein or a defect in a storage mechanism caused by a gene defect. The defective gene leads to absent or faulty production of essential proteins, especially enzymes that enable, disable or catalyze the biochemical reactions of metabolic networks. These metabolic disorders thus result in a threatening deficiency or accumulation of intermediate metabolites in the human organism and in the corresponding symptoms. For inborn errors of metabolism, a large amount of data is available in different databases accessible via the Internet. Most inborn errors of metabolism are included in OMIM (Online Mendelian Inheritance in Man, http://www3.ncbi.nlm.nih.gov/Omim). Aside from the major molecular biological databases, e.g., GenBank, EMBL, TRANSFAC, KEGG and BRENDA, our group’s Metabolic Diseases Data Base (MD-Cave) has been developed to simplify the collection and persistent storage of knowledge about inborn metabolic diseases [Hofestaedt et al., 2000b; Kauert et al., 2001]. Although the amount of electronically available knowledge about genes, enzymes, metabolic pathways and metabolic diseases is increasing rapidly, these sources give only highly specialized views of the biological systems. Clearly, the next task is to integrate all this knowledge and make it biotechnologically and medically applicable. MD-Cave is being developed as such a bioinformatics system for representing, modeling and simulating genetic effects on gene regulation and metabolic processes in human cells. In the following section, we present a case study that focuses on the modeling and simulation of the gene regulated urea cycle network using a hybrid Petri net.

Urea cycle and its regulation
In human cells, excess nitrogen is removed either by excretion of NH4+ (which happens only to a small extent) or by excretion of urea.
Urea is largely produced in the liver by the urea cycle, a series of biochemical reactions that are distributed between the mitochondrial matrix and the cytosol (Fig. 6). The cycle centers around the formation of carbamyl phosphate in hepatocyte mitochondria to pick up NH4+ and incorporate
Fig. 7. Schematic diagram of urea cycle metabolic network. Data sources: Metabolic pathway (enzyme reactions) from KEGG and BRENDA; Gene regulatory: TRANSFAC; Drug information: Metabolic Diseases Drug Database (MDDrugDB) (http://edradour.cs.uni-magdeburg.de/∼rkauert/MDDrugDB/Main.htm; drawing by Ralf Kauert).
it into ornithine to make citrulline. Citrulline is then transported to the cytosol, where aspartate is added. When urea is split off, the remaining molecule is converted back to ornithine, which returns to the mitochondria to start the cycle over. Deficiencies in the urea cycle enzymes lead to excessive NH4+ and accumulation of the cycle intermediates, resulting in neurological disorders. Any of the five enzymes of the urea cycle may be deficient, leading to carbamyl phosphate synthetase (CPS) deficiency, ornithine transcarbamylase (OTC) deficiency, citrullinemia, argininosuccinic aciduria and argininemia, respectively. Although the urea cycle was discovered by Hans A. Krebs in the early 1930s, the modeling and simulation of the urea cycle have so far never been systematically explored. This case study therefore attempts to model and simulate it. The model shows the interrelations of the main metabolites involved in the urea cycle. The simulation is used to test the physico-chemical limitations and feasibility of certain proposed reactions as well as disease occurrences. Figure 7 shows a graphical representation of the urea cycle metabolic network using the objects presented above for describing entities and interactions. It shows an intricate network that links entities and interactions. This network includes not only the succession of biochemical reactions that lead to the transformation of CO2 and NH4+ into urea, but also the regulation of gene expression and enzymatic activities. It furthermore displays the links to other pathways (e.g. via aspartate and fumarate), which are not detailed in the graph to preserve clarity.
Table 2. Some kinetic parameters of enzyme reactions in human cells

Enzyme | Substrate (mean concentration, mM) | Compartment | Km (mM) | Kcat E0 | Reference
CPS1 | HCO3− (0.05), NH4+ (0.025) | mitochondria | HCO3−, 6.7; NH4+, 0.8; MgATP, 1.1 | 1.5 | [Pierson and Brien, 1980]
OTC | carbamyl phosphate (0.001), L-ornithine (0.05) | mitochondria | CP, 0.16; L-ornithine, 0.40 | 2 | [Charles et al., 1997]
ASS | L-citrulline (0.02), L-aspartate (0.325) | cytoplasm | L-citrulline, 0.03; L-aspartate, 0.03 | 0.15 | [Charles et al., 1997]
ASL | argininosuccinate (0.034) | cytoplasm | argininosuccinate, 0.017 | 0.2 | [Charles et al., 1997; Pierson and Brien, 1980]
ARG | L-arginine (0.06) | cytoplasm | L-arginine, 10 | 1.7 | [Charles et al., 1997]
Fig. 8. Petri net model of the gene regulated urea cycle and its dynamic simulation result.
PN model
Based on the proposed modeling strategy, a hybrid Petri net model of the gene regulated urea cycle metabolic network is presented (Fig. 8). The model of the intracellular urea cycle is composed of gene regulatory networks and metabolic pathways. It comprises 155 Petri net elements, 14 kinetic blocks, 39 dynamic variables, and 22 reaction constants. Experimental data, partially listed in Table 2, are used for the initial evaluation of certain parameters of enzymatic reactions within the system.
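As an illustration of how the values in Table 2 can seed the rate function of a continuous transition, the sketch below evaluates a single-substrate Michaelis-Menten rate for the two single-substrate enzymes ASL and ARG. This is an assumption made for illustration: the text does not state which rate law is attached to each transition, the product Kcat E0 is interpreted here as $V_{max}$, and its units are not given in the table.

```python
def michaelis_menten(vmax, km, substrate):
    """Rate v = Vmax * [S] / (Km + [S]); Vmax is taken as Kcat * E0 (assumption)."""
    return vmax * substrate / (km + substrate)

# (Kcat E0, Km in mM, mean substrate concentration in mM) as listed in Table 2.
TABLE2 = {
    "ASL": (0.2, 0.017, 0.034),  # substrate: argininosuccinate
    "ARG": (1.7, 10.0, 0.06),    # substrate: L-arginine
}

# Initial rate estimate for each enzyme at its mean substrate concentration.
rates = {name: michaelis_menten(*params) for name, params in TABLE2.items()}
```

With these numbers ASL operates at two thirds of its maximal rate, whereas ARG, whose substrate concentration lies far below its $K_m$, operates in the nearly linear low-saturation regime.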
Table 3. Interfering tests on the urea cycle Petri net model (value of metabolites)

Interfering test | NH4+ | Citrulline (plasma) | Argininosuccinate (plasma) | Ornithine (plasma) | Arginine | Urea cycle defect
CPS1 blockade | ↑↑ | ↓ | ↓ | ↓ | ↓ | Carbamylphosphate synthase deficiency
OTC blockade | ↑ | ↓ | ↓ | ↑ | ↑ | Ornithine transcarbamylase deficiency
ASS blockade | ↑ | ↑↑ | ↓ | ↓ | ↓ | Argininosuccinate synthase deficiency
ASL blockade | ↑ | ↑ | ↑↑ | ↑ | ↑ | Argininosuccinase deficiency
ARG blockade | ↑ | ↑ | ↓ | ↓ | ↑↑ | Arginase deficiency
Membrane transportation blockade | ↑ | ↑ | − | ↑↑ | ↑ | HHH syndrome
The values of the model parameters lacking in the literature are verified through numerical experiments or modified from several references. The dynamic behavior of the model system, such as the metabolite fluxes, NH4+ input and urea output, is well described with continuous elements, while the control of gene expression is outlined with discrete ones owing to the lack of explicit expression data. Nevertheless, when explicit knowledge about the expression levels of the enzymes becomes available, our model of the gene regulatory network can be used to handle realistic gene expression data with the state equations. The initial values of the variables were assigned and tuned so that the behavior of the model system complies as closely as possible with the available experimental data on the dynamic characteristics of the system, based on the following considerations: Ammonia or amino acids (denoted as NH3) are taken up continuously from plasma into the mitochondria at a constant rate, i.e. changes in ammonia concentration due to the rate of protein metabolism are not taken into account. The concentration of excreted nitrogen (urea) in plasma ranges from 3 mmol/L to 8 mmol/L, and urea is thus regarded as being discharged at a certain rate. The degradation rate of an enzyme is 0.001 times its concentration. The places of the main metabolites are directly linked to the transitions. The reaction rates assigned to these transitions are expressed as differential equations. In reality, however, these transitions involve more than one variable in their differential functions. To capture these relationships, several test arcs are used, e.g. the test arc between aspartate and the ASS transition. No real input or output flows along these arcs, but the linked places are used in the firing speed of the transition.
In the model, inhibitor arcs are also used to represent the negative effects of repressors and/or inhibitors on gene expression. At the level of biochemical reactions, negative effects of metabolites are expressed as enzyme inhibition, which includes competitive inhibition, noncompetitive inhibition, irreversible inhibition and feedback inhibition. Consequently, the activities of the urea cycle enzymes can be regulated in two ways. First, gene expression, which is regulated by activators and inhibitors, controls enzyme synthesis, while enzyme synthesis and degradation together determine the amount of the enzyme. Second, the activity of the enzyme can be altered during metabolic catalysis. The formalization of the urea cycle model allows the quantitative simulation of metabolic pathways. The dynamics of the main components of the model regulating the urea cycle are also shown in Fig. 8. Moreover, several tests in which the fluxes are intentionally interfered with were conducted, and the results were recorded (Table 3).
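The interplay of inhibitor arcs, test arcs and continuous transitions can be sketched with a deliberately small hybrid net. This is a toy model under stated assumptions, not the 155-element urea cycle model: the place names, the rate constants `k_syn`, `k_cat` and `km`, and the saturable conversion law are hypothetical. Only the enzyme degradation rate of 0.001 times the concentration is taken from the text.

```python
def simulate_hybrid_net(repressor_tokens, t_end=100.0, dt=0.1):
    """Toy hybrid Petri net: one discrete place, three continuous places."""
    marking = {
        "repressor": repressor_tokens,  # discrete place (token count)
        "enzyme": 0.0,                  # continuous places (concentrations)
        "substrate": 10.0,
        "product": 0.0,
    }
    k_syn, k_deg, k_cat, km = 0.05, 0.001, 0.2, 0.5  # k_deg = 0.001 as in the text
    for _ in range(int(round(t_end / dt))):
        # Inhibitor arc: enzyme synthesis is enabled only if the repressor place is empty.
        gene_active = marking["repressor"] == 0
        synthesis = k_syn if gene_active else 0.0
        degradation = k_deg * marking["enzyme"]
        # Test arc: the enzyme place sets the firing speed but is not consumed.
        s = marking["substrate"]
        conversion = k_cat * marking["enzyme"] * s / (km + s)
        marking["enzyme"] += dt * (synthesis - degradation)
        marking["substrate"] -= dt * conversion
        marking["product"] += dt * conversion
    return marking

expressed = simulate_hybrid_net(repressor_tokens=0)  # gene is expressed
blocked = simulate_hybrid_net(repressor_tokens=1)    # gene is repressed
```

Setting a token on the repressor place plays the role of a blockade in the interfering tests: no enzyme is synthesized, so the substrate is never converted.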
The urea cycle operates only to eliminate excess nitrogen. A high concentration of ammonia in the cell results in hyperammonemia, which can be fatal; coma and death have been reported. Laboratory studies can reveal elevated arginine levels, mild hyperammonemia, and a mild increase in urine orotic acid. The diagnosis can now be confirmed by enzymatic analysis in the model. On high-protein diets or in a state of starvation, proteins are degraded and amino acid carbon skeletons are used to provide energy, which increases the quantity of nitrogen; the amino nitrogen must be excreted. To facilitate this process, the enzymes of the urea cycle are controlled at the gene level so that their concentrations are increased. As the urea cycle takes place in both the mitochondria and the cytoplasm, membrane transportation also has an effect. Some mitochondrial membrane diseases, e.g. ornithine transporter deficiency, affect the transportation of ornithine into the matrix and result in a high concentration of ornithine accumulating in plasma, which feeds back on the conversion of arginine into urea and finally leads to hyperammonemia. From the model we learn that the treatment for defects of the urea cycle enzymes could be either limiting the input of ammonia (limiting protein intake) or replacing the missing intermediates of the cycle (supplementing with arginine or citrulline). Patients with OTC deficiency benefit from citrulline supplementation because citrulline can accept ammonia to form arginine.

DISCUSSION
The case study shows that the Petri net allows easy incorporation of qualitative insights into a purely mathematical model, as well as adaptive identification and optimization of key parameters to fit the system behaviors observed in gene regulated metabolic networks.
The advantages of applying hybrid Petri nets (HPN) to modeling and simulation are: (i) The HPN model has a user-friendly graphical interface that allows easy design, simulation and visualization. (ii) With discrete and continuous events, HPN can easily handle gene regulation and metabolic reactions. (iii) The inhibitor arc is useful for mechanistic studies of how enzymes interact with their substrates and of the role of inhibitors in enzyme regulation and gene expression. Moreover, powered by mathematical equations, the simulation is executable and the dynamic results are visible. A cell usually contains hundreds of interconnected metabolic pathways and gene regulatory networks, and their control presents more complex features. It is feasible to extend the Petri net model in a plug-in way. A large complex network model can be handled with the same set of structural and behavioral properties. When HPN models are applied to such a large network, the hierarchical concept makes it possible to develop a generalized variant of HPN at a global level. On the other hand, a subnet of the Petri net model provides a basic model whose inner behavior and functions we already know. We can then construct a system by plugging sub-models together in order to understand the higher-level system and to predict its behavior. With the rapid development of PN, many important extensions to the general Petri net classification above have appeared. In the past few years, a number of Petri net tools have been used to model and simulate metabolic pathways and gene regulatory networks, e.g. UltraSAN, Design/CPN and VON++. More tools can be found at http://www.daimi.aau.dk/PetriNets/tools/quick.html. However, the tools differ in their characteristics, and no single tool embeds all the required functions. Almost all Petri net tools were intended to model manufacturing, distribution and communication systems rather than biological systems.
Constructing a specific Petri net tool that contains all necessary features requires the collaboration of biologists and Petri net researchers. Fortunately, Matsuno et al. [2001] adapted VON++ into the GON program and have made significant progress.
With rich information about biochemical reactions and gene regulation available from various biological databases, building an integrative model of the whole cell (virtual cell modelling) that incorporates gene regulation, metabolic reactions and signal transduction is becoming a promising field in the post-genomic era. Several projects have been established. The challenge for Petri nets is to understand how all the cellular proteins work collectively as a living system. Using powerful Petri nets and computer techniques, data on metabolic pathways, gene regulatory networks and signalling pathways could be converted for Petri net applications. Thus, a Petri net based virtual cell model could be implemented, and the attempt to understand the logic of the cell could be accomplished.

ACKNOWLEDGEMENTS
The work was partly supported by the Ministry of Science and Art of the Government of Sachsen-Anhalt, and by the German Research Foundation (DFG) graduate program “Bioinformatics” at the University of Bielefeld.

REFERENCES
• Baxevanis, D. A. (2003). The Molecular Biology Database Collection: 2003 update. Nucleic Acids Res. 31, 1-12.
• Charles, R. S. et al. (1997). The Metabolic and Molecular Bases of Inherited Disease. McGraw-Hill Companies, Inc.
• Chen, M. (2002). Modelling and Simulation of Metabolic Networks: Petri Nets Approach and Perspective. In: Modelling and Simulation 2002: Proceedings of ESM2002, Amorski K. et al. (eds.), Darmstadt, pp. 441-444.
• Chen, M., Freier, A., Koehler, J. and Rueegg, A. (2002). The Biology Petri Net Markup Language. In: Lecture Notes in Informatics: Proceedings of Promise2002, Desel J. et al. (eds.), Potsdam, Vol. 21, pp. 150-161.
• David, R. and Alla, H. (1992). Petri Nets and Grafcet – Tools for Modeling Discrete Event Systems. Prentice Hall.
• Genrich, H. J. (1987). Predicate/Transition Nets. In: Lecture Notes in Computer Science, Vol. 254: Petri Nets: Central Models and Their Properties, Advances in Petri Nets 1986, Part I, Proceedings of an Advanced Course, Brauer, W., Reisig, W., Rozenberg, G. (eds.), Springer-Verlag, pp. 207-247.
• Genrich, H., Kueffner, R. and Voss, K. (2001). Executable Petri Net Models for the Analysis of Metabolic Pathways. International Journal on Software Tools for Technology Transfer 3, 394-404.
• Goss, P. J. E. and Peccoud, J. (1999). Analysis of the stabilizing effect of Rom on the genetic network controlling ColE1 plasmid replication. Pac. Symp. Biocomput. 4, 65-76.
• Hochstrasser, M. (1996). Protein degradation or regulation: Ub the judge. Cell 84, 813-815.
• Hofestaedt, R. and Thelen, S. (1998). Quantitative modeling of Biochemical Networks. In Silico Biol. 1, 39-53.
• Hofestaedt, R., Lautenbach, K. and Lange, M. (2000a). Modellierung und Simulation Metabolischer Netzwerke: DFG-Workshop im Rahmen des DFG-Schwerpunktes Informatikmethoden zur Analyse und Interpretation großer genomischer Datenmengen, Magdeburg, Mai 2000.
• Hofestaedt, R., Mischke, U. and Scholz, U. (2000b). Knowledge acquisition, management and representation for the diagnostic support in human inborn errors of metabolism. In: Medical Infobahn for Europe: Proceedings of MIE2000 and GMDS2000, Hasman, A. et al. (eds.), Amsterdam, IOS Press, pp. 857-862.
• Hucka, M., Finney, A., Sauro, H. and Bolouri, H. (2001). Systems Biology Markup Language (SBML) Level 1: Structures and Facilities for Basic Model Definitions, pp. 30-33.
• Kauert, R., Toepel, T., Scholz, U. and Hofestaedt, R. (2001). Information System for the Support of Research, Diagnosis and Therapy of Inborn Metabolic Diseases. In: MEDINFO 2001, V. Patel et al. (eds.), Amsterdam: IOS Press, pp. 353-356.
• Kohn, M. and Letzkus, W. (1983). A Graph-theoretical Analysis of Metabolic Regulation. J. Theor. Biol. 100, 293-304.
• Kurt, J. (1997). Coloured Petri Nets – Basic Concepts, Analysis Methods and Practical Use. In: EATCS Monographs on Theoretical Computer Science. 2nd edition, Berlin: Springer-Verlag.
• Markram, H., Roth, A. and Helmchen, F. (1998). Competitive calcium binding: implications for dendritic calcium signaling. J. Comput. Neurosci. 5, 331-348.
• Matsuno, H., Doi, A., Drath, R. and Miyano, S. (2001). Genomic Object Net: Basic Architecture for Representing and Simulating Biopathways. In: RECOMB 2001, April 2001.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net Representation of Gene Regulatory Network. Pac. Symp. Biocomput. 5, 338-349.
• Naraghi, M. and Neher, E. (1997). Linearized buffered Ca2+ diffusion in microdomains and its implications for calculation of [Ca2+] at the mouth of a calcium channel. J. Neurosci. 17, 6961-6973.
• Pierson, D. L. and Brien, J. M. (1980). Human carbamylphosphate synthetase I. Stabilization, purification, and partial characterization of the enzyme from human liver. J. Biol. Chem. 255, 7891-7895.
• Pyronnet, S., Vagner, S., Bouisson, M., Prats, A. C., Vaysse, N. and Pradayrol, L. (1996). Relief of ornithine decarboxylase messenger RNA translational repression induced by alternative splicing of its 5’ untranslated region. Cancer Res. 56, 1742-1745.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri Net Representation in Metabolic Pathways. In: Proceedings First International Conference on Intelligent Systems for Molecular Biology, Hunter, L. et al. (eds.), AAAI Press, Menlo Park, pp. 328-336.
• Reisig, W. (1985). Petri Nets: An Introduction. Monographs on Theoretical Computer Science, Springer.
• Wang, J. (1998). Timed Petri Nets: Theory and Application. Kluwer Academic Publishers, Boston.
• Wang, J., Jin, C. and Deng, Y. (1999). Performance analysis of traffic networks based on stochastic timed Petri net models. In: Proc. 5th Int. Conf. on Engineering of Complex Computer Systems, October 1999, Las Vegas, pp. 77-85.
• Weissmuller, G. and Bisch, P. M. (1993). Autocatalytic cooperativity and self-regulation of ATPase pumps in membrane active transport. Eur. Biophys. J. 22, 63-70.
• Yao, K. S., Godwin, A. K., Johnson, C. and O’Dwyer, P. J. (1996). Alternative splicing and differential expression of DT-diaphorase transcripts in human colon tumors and in peripheral mononuclear cells in response to mitomycin C treatment. Cancer Res. 56, 1731-1736.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2003, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-56
Petri Nets for Steady State Analysis of Metabolic Systems
Klaus Voss (a), Monika Heiner (b) and Ina Koch (c,∗)
(a) Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI) [GMD], Sankt Augustin, Germany
(b) Computer Science Department, Brandenburg University of Technology at Cottbus, Cottbus, Germany
(c) Molecular Bioinformatics, Institute for Computer Science, Johann Wolfgang Goethe-University Frankfurt a. Main, Germany
ABSTRACT: Computer-assisted analysis and simulation of biochemical pathways can considerably improve the understanding of the structure and the dynamics of cell processes. The construction and quantitative analysis of kinetic models is often impeded by the lack of reliable data. However, as the topological structure of biochemical systems can be regarded as constant in time, a qualitative analysis of a pathway model has been shown to be quite promising, as it can yield a lot of useful knowledge, e.g., about its structural invariants. This paper deals with pathways whose substances have reached a dynamic concentration equilibrium (steady state). It is argued that established tools from biochemistry and also low-level Petri nets can yield only part of the desired results, whereas executable high-level net models lead to a number of valuable additional insights by combining symbolic analysis and simulation.
KEYWORDS: Metabolic pathway, steady state, elementary mode, high-level Petri net, S-invariant, T-invariant
∗ Corresponding author. E-mail: [email protected].

INTRODUCTION
With the rapidly growing amount of new experimental data, the modeling of biological pathways occurring in the cell has regained great interest. To meet this challenge in the biosciences, biologists need theoretical methods and computational tools in order to prove, analyse, compare, and simulate these complex networks for different organisms and tissues. The results are also of major importance for biotechnology and the pharmaceutical industry. “The main focus in the mathematical modeling in biochemistry has traditionally been on the construction of kinetic models. The aim of these models is to predict the system dynamics” [Heinrich and Schuster, 1998]. Their analysis is commonly based on the solution of systems of differential equations. In this way, numerous kinetic models for different metabolic systems and membrane transport processes have been developed (for a review, see Heinrich and Schuster, 1996). A severe restriction, often encountered in the construction of these models, is the imperfect knowledge of the kinetic parameters. On the other hand, a structural analysis of metabolic pathways mainly deals with the topology of how substances are linked by reactions. A central role is played by stoichiometric matrices, which
K. Voss et al. / PN Analysis of Metabolic Systems
indicate how many molecules of each substance are consumed or produced in the individual reactions. Their analysis is based on the solution of algebraic equations and is independent of any kinetic parameter. Of particular interest are biochemical systems persisting in a steady state (see section “Steady state pathways, elementary modes”), i.e., in which the concentrations of their substances have reached an equilibrium. An elementary mode (this term was coined in Schuster and Hilgetag, 1994) can be regarded as a minimal set of reactions (or of the enzymes catalyzing them) that can operate at steady state. Knowledge about the flux rates and the elementary modes of a system allows “to define and comprehensively describe all metabolic routes that are both stoichiometrically and thermodynamically feasible in a given group of enzymes” [Schuster et al., 2000a]. A metabolic system can be modeled as a Petri net in a straightforward way, as has been demonstrated for low-level nets in Reddy et al., 1993, and Hofestaedt, 1994, and for high-level nets in Genrich et al., 2001. The Petri net structure then truly reflects the biochemical topology, and the incidence matrix of the net is identical to the stoichiometric matrix of the modeled metabolic system. Accordingly, the elementary modes mentioned above correspond almost directly to the minimal T-invariants known from Petri net theory. A recent account of the structural analysis of metabolic networks and the analogy to Petri nets is given in Schuster et al., 2000b. The use of Petri nets for modeling quantitative (kinetic) properties of biochemical networks, especially for genetic and cell communication processes, was discussed in Hofestaedt, 1994, and Hofestaedt and Thelen, 1998. Other contributions followed, using various types of Petri nets such as stochastic nets [Goss and Peccoud, 1998; 1999] and hybrid nets [Matsuno et al., 2000].
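The correspondence between the incidence (stoichiometric) matrix and invariants can be made concrete with a small example. The sketch below is an illustration only: the three-reaction cycle and the helper `null_space` are hypothetical, and a rational null-space basis merely spans the invariants. Enumerating the minimal non-negative T-invariants (elementary modes) of a realistic network requires a dedicated algorithm that is not shown here.

```python
from fractions import Fraction

def null_space(matrix):
    """Basis of {x : matrix * x = 0} over the rationals (Gauss-Jordan elimination)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][free]
        basis.append(vec)
    return basis

# Hypothetical cycle: t1: A -> B, t2: B -> C, t3: C -> A.
# Rows are places (A, B, C), columns are transitions (t1, t2, t3).
C = [[-1, 0, 1],
     [1, -1, 0],
     [0, 1, -1]]

t_invariants = null_space(C)                            # x with C x = 0
s_invariants = null_space([list(col) for col in zip(*C)])  # y with y^T C = 0
```

For this cycle the single T-invariant (1, 1, 1) says that firing every reaction once reproduces the marking (a steady-state flux), and the single S-invariant (1, 1, 1) says that the total amount A + B + C is conserved.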
Executable high-level net models of metabolic pathways, and their (almost automated) construction, simulation, and quantitative analysis are described in Genrich et al., 2001. The application of Petri nets to this field began in the nineties with the publications of Reddy et al. [Reddy et al., 1993; 1996]. They present a low-level (place/transition) net to model the structure of the combined glycolytic pathway (GP) and pentose phosphate pathway (PPP) of erythrocytes, and use the well-known algebraic methods to compute S- and T-invariants of the net. A thorough analysis of an extended form of this pathway was performed by Koch et al., 2000, which forms the starting point for this paper. For computing conservation relations (S-invariants) and elementary modes (T-invariants) of metabolic pathways, the software package METATOOL [Pfeiffer et al., 1999] has been developed (by biochemists) and successfully applied in a number of cases. However, it detects only the integer-weighted S-invariants. Moreover, only the overall reaction equations, i.e., the net effects of a pathway execution, can be computed, and any consideration of its dynamics, in particular of the partial order of the reaction occurrences, is missing. The main achievements reported in this paper rely on the use of executable high-level net models and on symbolic analysis. This allows us to consider the following crucial aspects:
(a) It is well known that the detection and interpretation of invariants can substantially improve the understanding of systems. In the context of certain problems, however, the most interesting system properties are not invariant. In these cases, the divergence of these structures from an invariant is very often of major importance, as it indicates a defect or effect of the substructure in question.
We shall introduce and formally define these concepts for high-level Petri nets in section “Steady state pathways, elementary modes” and use them extensively, in section “Defects, effects and invariants”, for the symbolic structural analysis of the quite complex sample pathway in section “Models of the glycolysis and pentose phosphate pathway”.
(b) In high-level nets, the model designer can distinguish tokens via their colors. This is a prerequisite for overcoming the restrictions of low-level nets or METATOOL, i.e., for both detecting hitherto unknown S-invariants of our model and determining its partial order dynamics, as shown in section “Defects, effects and invariants”. The software tool of our choice – for graphical editing, analyzing and executing the net models – is Design/CPN [Design].

STEADY STATE PATHWAYS, ELEMENTARY MODES

First some Petri net notions are recalled that we will apply to metabolic pathways later on. The algebraic analysis of Petri nets mainly relies on their invariants. However, in a great number of systems, like metabolic pathways, deviations from invariants deserve even more attention. Hence the following definitions [see Genrich, 2002] will turn out to be useful. Let N be a colored (high-level) Petri net with the sets S of places and T of transitions. (A transition represents a whole class of transition occurrences, each one determined by a particular binding of the variables in the adjacent arc labels by color values.) The incidence matrix C of N is an |S| × |T| matrix whose elements c_ij are the positive/negative labels of the arcs pointing from/to transition t_j to/from place s_i. An S-vector resp. T-vector is a vector with an entry for each s ∈ S resp. t ∈ T. A marking of N assigns a multi-set of tokens (colors) to each place s ∈ S. At a marking M, usually several transitions are enabled to occur. One transition occurrence leads to a follower marking, enabling other transitions, and thus defines a partial order among them. The initial marking M_0 of N and all markings reachable from M_0 are called states of the net system. The (symbolic) analysis of N is based on multiplying C with vectors of transformations of expressions.
A distribution is a mapping transforming the elements of a color set D into linear combinations (with integer coefficients) of elements of a not necessarily different set D’. A substitution replaces variables by expressions in colors. Let y be an S-vector such that, for every place s, the component y_s is a combination (list) of distributions of s, and all the y_s have the same range. Then the transpose matrix C^T can be multiplied by the vector (one-column matrix) y. A product c_ij · y_si is the application of the distributions of s_i to the arc expression c_ij and yields an integer linear combination of tokens from the range color set. Treating y as a one-column matrix, the product C^T · y is a T-vector whose entries are integer linear combinations of colors denoting the marking differences caused by the individual transitions. It is called the defect of y. A vector consisting only of zero elements 0 (= 0 ∗ arbitrary color) is called null vector and denoted by O. The S-vector y is an S-invariant of N iff C^T · y = O. An S-invariant represents a state quantity of the net system, i.e., a quantity which, starting from the initial state, is maintained during the whole lifetime of the system. It describes a conservation rule, as known from many areas in (natural) science. Such a mandatory S-invariant is a valuable means to detect inconsistencies of a system specification or model. A process is a partially ordered set of transition occurrences leading from a state M_1 to a state M_2, written π: M_1 → M_2. Ignoring the order of occurrences yields a T-vector x of combinations of transition occurrences which is called the action performed by π. To be precise, every entry of such a T-vector is a combination of integer weighted substitutions of color variables by expressions in colors, and all variables have to be substituted by colors of the same set. Each substitution corresponds to a binding of the variables
Fig. 1. A sample reaction and its reverse. The transition (reaction) r3 consumes one R5P and one Xu5P molecule and produces one GAP and one S7P molecule. The arc labels are variables denoting the identities (colors) of the respective molecules. Transition r3’ is the reverse of r3. Its additional guard [x <> D] demands that the color x of the GAP molecule must not be equal to D in order to enable the transition r3’.
around a transition which determines the particular kind of its occurrence. A product c_ij · x_tj consists of simultaneously applying the substitutions in x_tj to the variables in the arc expression c_ij, and yields an integer linear combination of tokens from the color set of s_i. Treating x as a one-column matrix, the state difference Δ = M_2 − M_1 = C · x is an S-vector whose entries are integer linear combinations of colors denoting the marking differences effected by x on the different places. It is called the effect of x. The T-vector x is called a T-invariant of N iff C · x = O. A process π: M_1 → M_2 performing a T-invariant leads from one state to the same state again (M_1 = M_2). It regenerates the state M_1, hence defines a cyclic process.
As mentioned in the introduction, the approach described in this paper concentrates on the mere structure of the pathways, i.e., on the topology of the interconnections of metabolites via enzymatic reactions. Hence, it is structural or qualitative as it does not deal with the kinetics of the reactions. Constructing a Petri net of such a pathway is straightforward, representing metabolites as places, reactions as transitions, and the stoichiometric relations by labelled directed arcs between them. Examples can be found in Reddy et al., 1993, Hofestaedt, 1994, Reddy et al., 1996, Koch et al., 2000, 2005, and 2008 (low-level nets), or in Genrich et al., 2001, and section “Models of the glycolysis and pentose phosphate pathway” of this paper (high-level nets). In the following, such a net is called the Petri net model of the pathway. Figure 1 shows a sample high-level net model of a metabolic reaction. Jensen, 1992–1997, gives an excellent introduction into the theory and application of colored Petri nets. In our metabolic pathways, a distinction is made between external and internal metabolites according to whether or not they are involved in reactions outside the system considered.
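The notions of effect and T-invariant can be illustrated with a minimal sketch. It assumes a low-level (uncolored) abstraction of the reaction pair of Fig. 1, i.e., the colors and the guard of r3’ are ignored; the function names are hypothetical and not part of any tool mentioned in this paper.

```python
# Low-level abstraction of Fig. 1 (assumption: colors and guards are ignored).
PLACES = ["R5P", "Xu5P", "GAP", "S7P"]
TRANSITIONS = ["r3", "r3'"]

# Incidence matrix C (|S| x |T|): tokens produced minus tokens consumed.
C = [
    [-1, +1],  # R5P
    [-1, +1],  # Xu5P
    [+1, -1],  # GAP
    [+1, -1],  # S7P
]


def effect(incidence, x):
    """Return the S-vector C . x for a T-vector x of occurrence counts."""
    return [sum(c * xj for c, xj in zip(row, x)) for row in incidence]


def is_t_invariant(incidence, x):
    """x is a T-invariant iff its effect is the null vector."""
    return all(entry == 0 for entry in effect(incidence, x))


print(effect(C, [1, 0]))          # effect of one occurrence of r3
print(is_t_invariant(C, [1, 1]))  # r3 together with its reverse r3'
```

In this sketch a single occurrence of r3 has a non-null effect, whereas r3 combined with r3’ reproduces the marking and is therefore a (trivial) T-invariant.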
External metabolites are called sources resp. sinks of the pathway if they are produced resp. consumed by those (external) reactions. A metabolic pathway is said to persist in a steady state if the concentrations of all internal substances have reached a dynamic equilibrium: for each internal metabolite, the total rate of its consumption is equal to that of its production. Assuming a constant activity of all enzymes involved in the system, many (but not all) metabolic pathways reach such a dynamic equilibrium after some time. That this happens, and how, has been demonstrated for glycolysis, gluconeogenesis, citric acid cycle (TCA), and combinations of them in Genrich et al., 2001, by simulation runs of quantitative high-level Petri net models. Structural analysis of metabolic systems in steady state aims, among other things, at “elucidating relevant relationships among system variables” [Heinrich and Schuster, 1998] and does not rely on imperfectly known or doubtful kinetic data. A formalization of steady state and related notions is given in Schuster et al., 1996. For our paper, we need the following. The stoichiometric matrix N of a metabolic pathway with n metabolites and r
reactions, is an n × r matrix where the element N_ij denotes the flux from the i-th metabolite to the j-th reaction, i.e., the amount δc_i/δt of the metabolite concentration produced or consumed by that reaction. The stoichiometric matrix of a pathway precisely corresponds to the incidence matrix of its low-level Petri net model. A metabolic pathway is in steady state if and only if the reaction rates fulfill the condition

$$\delta c_i/\delta t = \sum_{j=1}^{r} N_{ij} \cdot v_j = 0, \quad i = 1, \ldots, n,$$

or, in matrix notation, N · v = O, for an integer vector v = (v_1, ..., v_r)^T, called flux vector, where a component v_j is an integer weight factor of the j-th reaction. “Flux modes” constitute the core concept of the algebraic analysis for metabolic pathways that are assumed to reach a steady state. “An elementary flux mode is a minimal set of enzymes that could operate at steady state, with the enzymes weighted by the relative flux they carry. ‘Minimal’ means that if only the enzymes belonging to this set were operating, complete inhibition of one of these enzymes would lead to cessation of any steady-flux in the system” [Schuster et al., 2000a]. Before relating biochemical analysis methods to the corresponding Petri net algorithms, two particular questions have to be discussed.
Firstly, in steady state analysis, those processes are of particular interest that start with the source substances of the investigated pathway and finish with its sink substances. For these external metabolites, constant concentrations have to be assumed to reach a steady state. Using METATOOL, this is achieved by excluding external metabolites from the stoichiometry matrix, but including the reactions affecting them: hence, their concentrations remain unchanged. In contrast, in our Petri net models, we include the external substances and introduce an extra transition StartEnd that closes the pathway to a cycle by supplying the initially needed substrates and consuming the finally produced ones. This measure enables us to also identify those internal metabolites requiring initial markings and to compute their amounts.
Secondly, we have to take into account the reversibility of reactions. An obvious solution is to admit negative factors in the T-vector to denote the occurrences of reversible reaction transitions in the backward direction. This would lead to T-vectors x < 0, not satisfying the standard definition of T-invariants for Petri nets.
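The two ways of treating external metabolites can be compared on a toy example. The following sketch assumes a hypothetical pathway Src → X → Snk with reactions r1 and r2 (these names do not occur in the models of this paper) and checks the steady state condition N · v = O for both variants.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical toy pathway Src -> X -> Snk with reactions r1 and r2.
# Src and Snk are external metabolites, X is internal.

# Variant 1: external metabolites are left out of the matrix.
N_INTERNAL = [
    [+1, -1],  # X: produced by r1, consumed by r2
]

# Variant 2: external metabolites are kept, and a StartEnd transition
# consumes Snk and re-supplies Src, closing the pathway to a cycle.
N_CLOSED = [
    # r1  r2  StartEnd
    [-1, 0, +1],  # Src
    [+1, -1, 0],  # X
    [0, +1, -1],  # Snk
]

print(mat_vec(N_INTERNAL, [1, 1]))   # steady state check, variant 1
print(mat_vec(N_CLOSED, [1, 1, 1]))  # steady state check, variant 2
print(mat_vec(N_CLOSED, [1, 1, 0]))  # without StartEnd: Src and Snk change
```

In the closed variant the flux vector (1, 1, 1) is a T-invariant of the net, while dropping the StartEnd occurrence leaves a non-null effect on the external places.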
However, instead of deviating from this definition, we introduce, for every reversible transition t, an additional complementary (reverse) transition t’ to the net (see Fig. 1). Doing that, we get – for each reversible reaction t – a potentially endless loop (t, t’, t, t’, ...). This slight disadvantage can be turned into an advantage in high-level nets where we can discriminate the directions of certain reactions according to the flux modes to which they belong. Thus we can model the situation in the cell, where the direction of a reaction depends on the need of the cell controlled by the metabolite concentrations.

MODELS OF THE GLYCOLYSIS AND PENTOSE PHOSPHATE PATHWAY

The glycolysis pathway (GP) is a sequence of reactions that converts glucose into pyruvate with the concomitant production of a relatively small amount of ATP. Then, pyruvate can be converted into lactate. The version chosen for this paper is the one for erythrocytes [see Stryer, 2006]. In the Petri net P (Fig. 2), the GP consists of the reactions l1 to l8. The pentose phosphate pathway (PPP), also called hexose monophosphate pathway, again starts with glucose and produces NADPH and ribose-5-phosphate (R5P) which then is transformed into glyceraldehyde-phosphate (GAP) and fructose-phosphate (F6P) and thus flows into the GP. In Fig. 2,
Fig. 2. From the original Porig to the high-level net P .
the PPP consists of the reactions l1, m1 to m3, r1 to r5, and l3 to l8. From now on we use acronyms for most metabolic substances; their full names are listed in the Abbreviations appendix. It shall be noted that molecules like ADP, ATP, NAD + , Pi etc. play a somewhat special role in metabolic networks. They are called ubiquitous because they are found in sufficiently large amounts in the cell. For ease of distinction, the remaining substances from Gluc to Lac shall be named primary. When talking about reactions in the following, the involved ubiquitous molecules commonly are not mentioned because the primary substances are those that characterize the reactions and hence are of particular interest. Whereas the GP generates primarily ATP with glucose as a fuel, the PPP generates NADPH, which serves as electron donor for biosyntheses in cells. The interplay of the glycolytic and pentose phosphate
pathways enables the levels of NADPH, ATP, and building blocks for biosyntheses, such as R5P and Pyr, to be continuously adjusted to meet cellular needs. This interplay is quite complex, even in the somewhat simplified version discussed in this paper. In Reddy et al., 1996, this pathway has already been modeled as a low-level Petri net (place/transition net) and then – qualitatively – analyzed by means of the well-known linear algebraic methods. The analysis in Reddy et al., 1996, is not completely correct, and it yields neither a full S-invariant (i.e., comprising all primary substances from the sources to the sinks) nor a non-trivial T-invariant. Hence, it reveals some deficiencies that we will overcome by switching to high-level (colored) net models. Additionally, our Petri net models can be not only analyzed but also executed (simulated). The construction of the low-level Petri net starts with a modest simplification of the original pathway model. The left branch of Fig. 2 represents the GP. Its second part is a strictly linear path, transforming BPS into Lac via TPG, BPG, PEP, and Pyr (depicted in the small box at the lower right-hand corner of Fig. 2). This path can and shall be reduced to just one super-transition l8 (with the appropriate connections to ADP, ATP, NADH, and NAD+) without altering the crucial properties of the net. In Design/CPN, such a node is called a substitution transition: l8 stands for and equivalently represents the mentioned linear path. With this modification and disregarding for a moment the guards and replacing all arc labels by ‘1’, we get the place/transition net of Fig. 2, which is identical to the net found in Reddy et al., 1996. In organisms, the amount of the molecules is high enough to tolerate transient deviations from the theoretically postulated equilibrium concentrations.
In the long run, a steady state is approximated: according to the kinetic equations, those reactions with higher reactant (lower product) concentrations are preferred to those with lower reactant concentrations (or higher product concentrations, resp.). In contrast, a qualitative analysis is confronted with small, even minimal amounts of molecules for any substrate. The crucial point of using colored instead of low-level Petri nets is the following.¹ Applying higher-level places allows us to discriminate between different molecules of the same metabolite via their identifiers (colors) C, D, F, ....² This enables the designer to separate different branches of the compound pathway and to distinguish among molecules on the same place according to their origin and destination reaction. This distinction increases the potential of the qualitative analysis we have in mind (see section “Steady state pathways, elementary modes”). As will be shown in section “Defects, effects and invariants”, by choosing appropriate token colors we often get valuable information about the completeness and feasibility of the chosen Petri net model. Moreover, it is a prerequisite for executing such models properly, e.g., without running into unexpected deadlocks or the like. We have simulated all models in this paper, clearly not to get new results about their kinetics but mainly to gain confidence in the chosen color specifications.

¹ Legend for the Design/CPN nets in this paper:
– All places have the colorset CS = C, D, F, G, H, G’, H’.
– The underlined inscription 1‘D inside the place StartEnd denotes its initial marking.
– A place name in italics denotes a fusion place. All members of a fusion set are treated as the same place. Their names are numbered consecutively.
– The places for NAD+, NADP+, Pi, CO2, H2O are named NADp, NADPp, Pi, CO2, H2O respectively.
– A term in brackets [ ] is a guard (boolean expression) of the transition. If the value of the guard is true, the transition may be enabled; if it is false, the transition cannot be enabled.
– A (dashed) transition t’ denotes the reverse counterpart of the reversible reaction t (see Fig. 1).

² Note that the (theoretical) distinction by colors applies to chemically identical molecules, e.g., a token C on place G6P is distinguished from a token G on G6P. On the other hand, the substance that a molecule represents is unambiguously determined by the place it belongs to. Hence, a token C on G6P represents a different substance than a C on F6P.
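How colors and guards interact when a transition is enabled can be sketched as follows. This is a simplified illustration, not the enabling rule as implemented in Design/CPN: markings are modeled as multisets of colors per place, the arc variable names x and y for r3’ are assumptions made for this example, and only the guard [x <> D] of Fig. 1 is taken from the paper.

```python
from collections import Counter


def enabled(marking, inputs, binding, guard=lambda b: True):
    """A binding enables a transition if its guard holds and every input
    place carries at least one token of the color bound to its arc variable."""
    if not guard(binding):
        return False
    return all(marking[place][binding[var]] >= 1 for place, var in inputs)


# r3' of Fig. 1 consumes one GAP and one S7P token.
# The variable names on the arcs are assumptions for illustration.
R3_REV_INPUTS = [("GAP", "x"), ("S7P", "y")]


def guard_r3_rev(binding):
    """Guard [x <> D]: the color of the GAP token must differ from D."""
    return binding["x"] != "D"


marking = {
    "GAP": Counter({"D": 1, "G": 1}),
    "S7P": Counter({"G": 1}),
}

print(enabled(marking, R3_REV_INPUTS, {"x": "G", "y": "G"}, guard_r3_rev))
print(enabled(marking, R3_REV_INPUTS, {"x": "D", "y": "G"}, guard_r3_rev))
```

The D-token on GAP is present in the marking, yet the guard keeps r3’ from consuming it, which is the mechanism used to separate the branches of the compound pathway.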
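What is the strategy of attributing colors to the tokens (molecules) along a given pathway model? Starting with a primary source substance of the pathway, we look for conflicts on the way to the sink(s). By definition, p is a conflict place if it has more than one output transition, and all are enabled if p carries a suitable token. If one of these alternative transitions occurs, all remaining transitions are disabled. In our context, a conflict would cause no harm as long as all but one of the alternative paths starting at p ended up again at this p without any lasting marking change. In general, however, this is not the case. When looking at the metabolism in one specific organism, alternative metabolic paths most often result in different metabolic overall reactions. Therefore, they shall be discriminated and must not be combined arbitrarily. This discrimination is performed by attributing different identifiers to the molecules and by additionally blocking certain transitions for particular molecules using guards. This shall be demonstrated by use of the sample net P, treated again as a low-level net by disregarding all arc labels and guards. Starting with the source Gluc, the first conflict is encountered at G6P which can be the reactant of either the reaction l2 or m1. A G6P-molecule with destination l2 gets a color, say C, and that one with destination m1 gets a different color (to be decided upon later). The guard [x <> C] prevents a C-token from being consumed by m1. Proceeding downwards the GP, we examine F6P 1 – a fusion place. As F6P 2 on the right-hand side has no outgoing arc, a conflict does not exist. The next conflict on the way down is found at GAP (the fused GAP 1 and GAP 2), a conflict among the three transitions l6, l7 and r4. The reaction l6 is trivial, as the loop GAP 1 → l6 → DHAP → l5 → GAP 1 returns the token to GAP 1 without affecting any other places. The conflict between l7 and r4 is difficult to discuss at this moment without knowledge about the situation in the PPP at GAP 2. We postpone it to the end of this paragraph. Instead, the path from G6P into the PPP, in the middle and right part of the figure, shall be inspected. Choosing a separate color F for the molecules of the middle part is not mandatory because there is no conflict. The next conflict occurs at Ru5P with the choice to continue via r1 or r2. The colors of these two molecules must be different from each other and from C; so we choose G and H. The last conflict at GAP is the postponed one. However, because the color G has been maintained from Ru5P via R5P until GAP 2 and the tokens on GAP 1 have the (different) color D, their distinction is accomplished already: the G-molecules are removed by reaction r4 and the D-molecules by l7. The resulting model is again P, but now regarded as a colored net by including the arc labels and transition guards of Fig. 2.

DEFECTS, EFFECTS AND INVARIANTS

The following calculations were made by use of an experimental software package SY, written by H. Genrich in Standard ML for the symbolic analysis of colored Petri nets. This package supports symbolic calculations based on the incidence matrix of an executable colored net in Design/CPN. It inspects and adopts the internal tables produced by the Design/CPN simulator for the graphical model and its data base. It allows, among other things, to form symbolic dot products and matrix products, and to apply useful reduction rules and different formats for presenting the results.

Defects and S-invariants

We start with looking for S-invariants in the net P of Fig. 2. Obviously, there are four pairs of ubiquitous substrates which, if produced or consumed by a reaction, are transformed into each other, namely (ADP, ATP), (NADP+, NADPH), (2 GSSG, GSH), and (NAD+, NADH). This is verified by applying the function DEFECT of the package SY to the four S-vectors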
The conflict between l7 and r 4 is difficult to discuss at this moment without knowledge about the situation in the PPP at GAP 2}. We postpone it to the end of this paragraph. Instead, the path from G6P into the PPP, in the middle and right part of the figure shall be inspected. Choosing a separate color F for the molecules of the middle part is not mandatory because there is no conflict. The next conflict occurs at Ru5P with the choice to continue via r 1 or r2. The colors of these two molecules must be different from each other and from C; so we choose G and H. The last conflict at GAP is the postponed one. However, because the color G has been maintained from Ru5P via R5P until GAP 2 and the tokens on GAP 1 have the (different) color D, their distinction is accomplished already: the G-molecules are removed by reaction r 4 and the D-molecules by l7. The resulting model is again P , but now regarded as a colored net by including the arc labels and transition guards of Fig. 2. DEFECTS, EFFECTS AND INVARIANTS The following calculations were made by use of an experimental software package SY, written by H. Genrich in Standard ML for the symbolic analysis of colored Petri nets. This package supports symbolic calculations based on the incidence matrix of an executable colored net in Design/CPN. It inspects and adopts the internal tables produced by the Design/CPN simulator for the graphical model and its data base. It allows, among others, to form symbolic dot products and matrix products, and to apply useful reduction rules and different formats for presenting the results. Defects and S-invariants We start with looking for S-invariants in the net P of Fig. 2. Obviously, there are four pairs of ubiquitous substrates which, if produced or consumed by a reaction, are transformed into each other, namely (ADP, ATP), (NADP+ , NADPH), (2 GSSG, GSH), and (NAD+ , NADH). This is verified by applying the function DEFECT of the package SY to the four S-vectors
[ (ADP, → 1‘D), (ATP, → 1‘D) ], [ (NADP+, → 1‘D), (NADPH, → 1‘D) ], [ (GSSG, → 2‘D), (GSH, → 1‘D) ], and [ (NAD+, → 1‘D), (NADH, → 1‘D) ]. Doing this yields the null defect in all cases, which means that the S-vectors above constitute S-invariants.³ Clearly, it is of much greater importance to deal with the full set of all primary (non-ubiquitous) substances. In steady state, there should exist an S-invariant comprising that full primary set. To find out the weight factors of a full S-vector, i.e., covering this set, we proceed step by step. First we observe that each molecule of FBP is transformed into two GAP-molecules by the reactions l4 and l5. We conclude that in any S-vector that is finally to become an invariant, the place markings (number of molecules) of the glycolysis pathway GP from Gluc up to FBP must get a weight factor twice as high as GAP and the following places down to Lac. Trying to apply the same principle to the pentose phosphate pathway PPP, however, leads to a non-null defect. To be more specific, the simulation of P ∗ and also its T-invariants computed in the next subsection show that, starting with 3 Gluc molecules, the GP produces 6, and the PPP 5 Lac molecules. Because these alternative paths share the metabolites G6P, F6P, FBP, GAP, and BPS, it is not possible to find integer weight factors for these substances to make the full S-vector an invariant. We conclude that it is impossible to find a full S-invariant, in model P rev, with ‘standard’ means like low-level Petri nets or METATOOL [Pfeiffer et al., 1999]. Hence, neither in Reddy et al., 1996, nor in Schuster et al., 1996, has a full S-invariant been reported. The use of high-level nets with individual tokens offers the possibility to distinguish the mentioned metabolites according to the paths along which they are produced and consumed.
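The obstruction can be made explicit with a short calculation. As a simplifying assumption for illustration, let an S-vector assign one weight w_Gluc to every Gluc molecule and one weight w_Lac to every Lac molecule, with no further weighted product (the model P contains no place for any other end product). Conservation along the two overall conversions quoted above then requires:

```latex
\begin{align*}
\text{GP:}  \quad 3\,w_{\mathrm{Gluc}} &= 6\,w_{\mathrm{Lac}} \\
\text{PPP:} \quad 3\,w_{\mathrm{Gluc}} &= 5\,w_{\mathrm{Lac}} \\
\Rightarrow \quad 6\,w_{\mathrm{Lac}} &= 5\,w_{\mathrm{Lac}}
\;\Rightarrow\; w_{\mathrm{Lac}} = 0, \quad w_{\mathrm{Gluc}} = 0 .
\end{align*}
```

Under these assumptions only the trivial weighting satisfies both conditions, which is consistent with the observation that no full S-invariant exists as long as the molecules of the two paths cannot be told apart.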
Constructing the desired S-invariant (not shown), we have to choose the weight 2 for all metabolites from Gluc up to FBP and E4P, irrespective of the path on which they occur. For GAP, a threefold distinction has to be made: GAP-molecules produced by r3 or r4’ (x = G) or by r5 (x = H) get the weight 2, whereas those produced by l4 or l5 (x = D) get the weight 1. And this distinction is kept also for BPS and Lac. The result then is an S-invariant which, however, lacks a sensible biochemical interpretation because it is impossible to distinguish molecules of the same substance in organisms. Anyhow, this invariant lets us conclude that an essential product must be missing in the model of the PPP. A more careful inspection of the PPP shows that this product is the CO2 molecule produced by the reaction complex m1: G6P + 2NADP+ + H2O → Ru5P + 2NADPH + 2H+ + CO2. This means that the model P (and that of Reddy et al., 1996) has to be revised for our purposes. Introducing both CO2 and H2O into the model means to add an input place H2O and an output place CO2 to m1 and an output place H2O to l8. The latter one originates in the H2O produced by the reaction DPG → PEP + H2O. Of course, also places for H+ could be added to the model, but we decided to refrain from this in order to preserve the readability of the model.

³ Adopting the definitions at the beginning of section “Steady state pathways, elementary modes” and the conventions of the package SY in a simplified version, an S-vector is written as a list of pairs (place s_i, distribution of s_i), where the second element – in our case – degenerates to a mapping of a color variable (or the don’t-care symbol “ ” in case of a constant) to a linear combination over the standard colorset CS (cf. footnote 1). The defect of an S-vector σ, computed symbolically by the function DEFECT, is given as a list of members t: lico(CS), in which lico(CS) denotes a linear combination of tokens that has to be added to an input or output place of transition t to make σ an S-invariant. Dealing with the syntactic details of SY is far beyond the scope of this paper.
For this augmented model, a stepwise construction leads to the following full S-invariant (in a drastically simplified notation, just writing i instead of → i‘D):
σ_C = [ (Gluc, 6), (G6P, 6), (F6P, 6), (FBP, 6), (CO2, 1), (Ru5P, 5), (R5P, 5), (Xu5P, 5), (S7P, 7), (E4P, 4), (DHAP, 3), (GAP, 3), (BPS, 3), (Lac, 3) ].
An inspection of σ_C reveals that the integer weight factor of any substance is equal to the number of C-atoms bound in it. Thus, the S-invariant σ_C expresses the conservation rule that the sum of C-atoms bound by all involved substrates is constant. And this clearly represents a sensible biochemical interpretation. Next we compute an S-invariant concerning the number of all O-atoms. In this case we obviously have to include also H2O which, of course, did not appear in σ_C. We get
σ_O = [ (Gluc, 6), (G6P, 6), (F6P, 6), (FBP, 6), (CO2, 2), (H2O, 1), (Ru5P, 5), (R5P, 5), (Xu5P, 5), (S7P, 7), (E4P, 4), (DHAP, 3), (GAP, 3), (Pi, 1), (BPS, 4), (Lac, 3) ].
Interestingly, the weights of the substances reflect only the number of O-atoms outside the P-groups PO3^2−, if any. As Pi = HO-PO3^2− in our case, it contributes one O-atom to the total number and, consequently, appears with the factor 1 in the vector above. Hence, σ_O represents the conservation rule that the number of all O-atoms (outside the P-groups) in the pathway is constant. If looking for the P-atoms (or P-groups) we cannot expect to get a full S-invariant, as some of the primary substances do not contain a P. The S-vector
σ_P = [ (ADP, 2), (ATP, 3), (G6P, 1), (F6P, 1), (FBP, 2), (Ru5P, 1), (R5P, 1), (Xu5P, 1), (S7P, 1), (E4P, 1), (DHAP, 1), (GAP, 1), (Pi, 1), (BPS, 2) ]
is a partial S-invariant saying that the total number of P-atoms is constant. Moreover, σ_P is already reported in Reddy et al., 1996. This is due to the fact that it contains neither CO2 nor H2O which are missing in their net model, as we know.
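The carbon interpretation of the first S-invariant above can be checked on a few reactions with a small sketch. It is a low-level simplification (colors and ubiquitous substances are ignored) and covers only four reactions: the stoichiometries of m1, r3 and l5 follow the text and Fig. 1, whereas the one used for l4 (FBP → DHAP + GAP) is an assumption based on the usual aldolase reaction. The function name `defect` is chosen for illustration and is unrelated to the DEFECT function of SY.

```python
# Weights of the carbon S-invariant as listed in the text.
SIGMA_C = {
    "Gluc": 6, "G6P": 6, "F6P": 6, "FBP": 6, "CO2": 1, "Ru5P": 5, "R5P": 5,
    "Xu5P": 5, "S7P": 7, "E4P": 4, "DHAP": 3, "GAP": 3, "BPS": 3, "Lac": 3,
}

# Columns of the incidence matrix, primary substances only.
REACTIONS = {
    "m1": {"G6P": -1, "Ru5P": +1, "CO2": +1},
    "r3": {"R5P": -1, "Xu5P": -1, "GAP": +1, "S7P": +1},
    "l5": {"DHAP": -1, "GAP": +1},
    "l4": {"FBP": -1, "DHAP": +1, "GAP": +1},  # assumed stoichiometry
}


def defect(weights, reactions):
    """Weighted marking change caused by one occurrence of each reaction."""
    return {
        name: sum(weights[place] * coeff for place, coeff in column.items())
        for name, column in reactions.items()
    }


print(defect(SIGMA_C, REACTIONS))

# Dropping CO2 from the bookkeeping (weight 0) exposes the missing product.
without_co2 = dict(SIGMA_C, CO2=0)
print(defect(without_co2, REACTIONS)["m1"])
```

With the listed weights all four defects vanish, while ignoring CO2 leaves one carbon atom unaccounted for at m1, in line with the argument that led to the augmented model.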
Finally, it should be mentioned that we also computed a full S-invariant concerning the sum of H-atoms, using a model that additionally includes all H+-ions (not shown).

Effects and T-invariants

T-vectors and T-invariants of a Petri net describe processes. This means that we have to take into account that reversible reactions may run in the backward direction. As mentioned in section “Steady state pathways, elementary modes”, in case of a reversible transition t, we add its complementary transition t’ to the net. We start with the sample net model P (Fig. 2), augmented by the places for CO2 and H2O. The reactions l1, l3, and m1 can be treated as irreversible, because we want to consider the GP and PPP, but not the gluconeogenesis. The linear path from BPS to Lac, replaced by the substitution transition l8, contains the irreversible reaction from PEP to Pyr. Thus, l8 is also treated as irreversible. Hence, we introduce the new complementary transitions l2’, l4’, l5’, l7’, and r1’ to r5’ to the augmented net P and thus obtain the model P rev in Fig. 3.⁴ The introduction of the reverse transitions in P rev may entail additional critical conflicts which have to be resolved when dealing with T-vectors and simulation. We observe:
– l4’, l5’, l7’, and r1’ to r5’ merely create uncritical loops and can be deleted,
– l2’ must not appear in the steady state GP.

⁴ Note 1. Transition l6 in the model P is identical to l5’ in P rev.
Note 2. Transitions m2 and m3 are treated as irreversible as they merely restore the consumed NADP+-molecules.
Note 3. The S-invariants of P rev are identical to those computed in subsection “Defects and S-invariants” for P.
Fig. 3. The Petri net P rev with reversible reactions.
Hence, l2’ must be prevented from occurring for tokens x = C. In the course of PPP, one G- and two H-molecules on G6P are transformed into one H-token on GAP 1 (via r5) and two H-tokens on F6P 2 (via r 4 and r 5). For the latter H-tokens there are three possibilities to be processed further: 1. both move to FBP via l3 and continue on the ‘normal’ way to Lac, 2. both move to G6P via l2’ and thus regenerate the initial two H-tokens there, 3. one of them is moved by l3 to FBP, and the second one by l2’ to G6P. This case is a combination of glycolysis and gluconeogenesis, which cannot occur in steady state.
To distinguish the ‘normal’ PPP path (1) from the new reaction path (2) we have to introduce new token-colors, G’ and H’ say, for (2). With exception of l2’, the processing of G’ and H’ is identical to that of G and H. Therefore in the PPP-branch, the token color instances (identifiers) are replaced by the variables x (for G or G’) and y (for H or H’), and – a technicality – appropriate guards are attributed to the reactions r1 to r4. Finally, because the molecules moved onto F6P by the glycolysis resp. the ‘normal’ PPP are C resp. H, the reaction l2’ may be enabled only for H’-molecules. Hence, the arc pointing to l2’ gets the label H’ and the reaction l3 gets the guard [x <> H’]. This leads to P ∗ in Fig. 4 which is appropriate both for computing T-invariants and for simulation because all unreasonable processes and cyclic loops have been excluded. Looking for T-invariants which describe the feasible processes in the net we stay, for a moment, with P rev. Clearly, for each reversible reaction t and the reverse reaction t’, the vector [(t, 1), (t’, 1)] is a T-invariant. But these T-invariants lack a sensible biochemical interpretation. For this reason all reverse reactions except l2’ have been omitted at the end of section “Models of the glycolysis and pentose phosphate pathway”. On the other hand, we observe that l2’ gives rise to a process that is different from both the GP and the PPP, namely the gluconeogenesis. The tokens of that extra process got the identifiers G’ and H’, leading to the net model P ∗. As with the S-vectors, also the T-vectors shall be established stepwise, i.e., not automatically but systematically. Apart from being able to see, from the effects computed at each step, the weight factor(s) of the transition(s) to be added successively to the T-vector, there is still another advantage of this approach. If we proceed along the causal chain of transitions, i.e., at each step selecting a subsequent transition which is enabled, we can compile knowledge about the amount of molecules which are needed in the course of the run from Gluc to Lac and which have to be restored later during the run or after its end. This information is not provided by the effect of the complete T-vector, which only shows the overall effect of the vector. When looking for the possible processes in the GP/PPP system P ∗ we soon find out that there are (at least) three sorts of processes (modes) that can be run independently from each other. Therefore, we can attribute to each of them a characteristic parameter by which the weighted vector elements are multiplied additionally, namely
– gly for the glycolysis pathway,
– hex for the pentose (or hexose mono-) phosphate pathway, and
– rev for the pathway including the reverse reaction l2’.
During the stepwise construction of the T-vector(s) we gather, on the one hand, information about those molecules that are needed at the beginning or in the course of a run to reach its end at Lac. These molecules and their amounts are: The final result of the construction is the complete parameterized T-vector.⁵
τ = [(l1, [(1, gly‘(x C)), (1, hex‘(x G)), (2, hex‘(x H)), (1, rev‘(x G’))]), (l2, [ (1, gly‘( )) ]),

⁵ Adopting the conventions of the package SY in a simplified version, T-vectors are written as lists of pairs (transition name, list of weighted substitutions). A weighted substitution consists of an integer weight, followed by an integer parameter, a multiplication sign “‘” and a substitution in parentheses ( ). A substitution, represented by “ ”, indicates which variable(s) of the arc labels have to be substituted by which color(s). If the arc labels are constant colors the don’t-care symbol ( ) is used. The effect of a T-vector τ is presented as a list of constructs s: lico(CS), where lico(CS) denotes a linear combination of tokens that has to be added to (or subtracted from) place s to make τ a T-invariant.
68
K. Voss et al. / PN Analysis of Metabolic Systems
Fig. 4. The net P ∗ with three flux modes.
(l3, [ (1, gly‘(x C)), (2, hex‘(x H)) ]), (l4, [ (1, (gly + 2 · hex)‘( )) ]), (l5, [ (1, (gly + 2 · hex)‘( )) ]), (l7, [ (1, (2 · gly + 5 · hex +rev)‘( )) ]), (l8, [ (1, (2 · gly + 5 · hex + rev)‘( )) ]), (m1, [ (1, hex‘(x G)), (1, rev‘(x G )), (2, hex‘(x H)), (2, rev‘(x H )) ]),
K. Voss et al. / PN Analysis of Metabolic Systems Consumed substances ADP: (2 · gly + 5 · hex + rev)‘D F6P: 2 · rev ‘H GSSG: (6 · hex + 6 · rev)‘D NAD+ : (2 · gly + 5 · hex + rev)‘D (2 · gly + 5 · hex + rev)‘D Pi :
ATP: Gluc: H2 O: NADP+ :
Produced substances ATP: (4 · gly + 10 · hex + 2 · rev)‘D F6P: 2 · rev‘H H2 O: (2 · gly + 5 · hex + rev)‘D NAD+ : (2 · gly + 5 · hex + rev)‘D
(G) (P) (R)
Parameters gly = 1 hex = 1 rev = 1
69
(2 · gly + 5 · hex + rev)‘D gly ‘C + hex ‘G + 2 · hex ‘H + rev‘G (3 · hex + 3 ∗ rev)‘D (6 · hex + 6 · rev)‘D
CO2 : GSSG: Lac: NADP+ :
(3 · hex + 3 · rev)‘D (6 · hex + 6 · rev)‘D 2 · gly‘C + 5 · hex‘H + rev ‘H (6 · hex + 6 · rev)‘D
Overall reaction 2 ADP + Gluc + 2 Pi = 2 ATP + 2 Lac + 2 H2 O 5 ADP + 3 Gluc + 5 Pi = 5 ATP + 5 Lac + 3 CO2 + 2 H2 O ADP + Gluc + Pi + 2 H2 O = ATP + Lac + 3 CO2
(m2, [ (6, (hex + rev)‘( )) ]), (m3, [ (6, (hex + rev)‘( )) ]), (r 1, [ (1, hex‘(x G)), (1, rev‘(x G’)) ]), (r 2, [ (2, hex‘(y H)), (2, rev‘(y H’)) ]), (G,H))), (1, rev‘((x,y) (G ,H ))) ]), (r 3, [ (1, hex‘((x,y) (r 4, [ (1, hex‘((x,y) (G,H))), (1, rev‘((x,y) (G ,H ))) ]), (r 5, [ (1, hex‘(y H)), (1, rev‘(y H )) ]), (l2 , [ (2, rev‘( )) ]) ]. The T-vector τ represents three vectors, one for each parameter, which correspond to the expected three elementary modes. They are identical to those computed by S. Schuster, using METATOOL. Applying the function EFFECT from the package SY to τ yields its effect, which is equal to the difference between Produced substances and Consumed substances, ADP: − 2 · gly‘D − 5 · hex‘D - rev‘D, ATP: 2 · gly‘D + 5 · hex‘D + rev‘D, CO2 : 3 · hex‘D + 3 · rev‘D, Gluc: − gly‘C - hex‘G − 2 · hex‘H − rev‘G , H2 O: 2 · gly‘D + 2 · hex‘D − 2 · rev‘D, Lac: 2 · gly‘C + 5 · hex‘H + rev‘H , Pi : − 2 · gly‘D − 5 · hex‘D − rev‘D. Neglecting the token colors, this leads to the parameterized equation for the effect of τ (2 · gly + 5 · hex + rev)‘ADP + (gly + 3 · hex + rev)‘Gluc + (2 · gly + 5 · hex + rev)‘P i + 2 · hex‘H2 O = (2 · gly + 5 · hex + rev)‘ATP + (2 · gly + 5 · hex + rev)‘Lac + (3 · hex + 3 · rev)‘CO 2 + (2 · gly + 2 · hex)‘H2 O yielding the three overall reaction equations for the elementary modes. T-invariants describe processes in a Petri net which restore the marking with which they started and thus can be executed cyclically. Clearly, τ is not a T-invariant. Because in P ∗ all paths containing Gluc and Lac are not cyclic, no full T-vector at all can be a T-invariant. Therefore, we modify the model P ∗ by glueing it with a subnet P se (depicted in Fig. 5) that closes the cycle from Lac to Gluc. This subnet contains a place StartEnd, initially marked by a dummy token D, and the transitions s1 for starting and s2 for ending a cyclic run. The transitions s1 and s2 are intended to compensate the non-null effects. To
Fig. 5. The subnet P^se completing P* to form a cycle.
this end, we connect appropriate substances of P* (as fusion places) with s1 and/or s2. At this stage, one of the advantages of the stepwise construction of the T-vector τ becomes clear. The tables Consumed substances resp. Produced substances, derived above, state exactly which substances, and in which amounts, have to be provided by s1 resp. removed by s2 to arrive at a (parameterized) T-invariant. Combining the subnet P^se with P* by means of the fusion places yields the cyclic net model that we aimed at. Adding the elements (s1, [ (1, ( )) ]) and (s2, [ (1, ( )) ]) to τ gives a T-vector that has no effect and hence is a parameterized T-invariant. The minimal T-invariants derived by setting one of the parameters gly, hex, or rev to 1 (and the remaining two to 0) are, in a short-hand notation,

τG = [ (l1, C), (l2, D), (l3, C), (l4, D), (l5, D), (l7, 2 · D), (l8, 2 · D), (s1, D), (s2, D) ],
τP = [ (l1, G + 2 · H), (l3, 2 · H), (l4, 2 · H), (l5, 2 · H), (l7, 5 · H), (l8, 5 · H), (m1, G + 2 · H), (m2, 6 · D), (m3, 6 · D), (r1, G), (r2, 2 · H), (r3, (G,H)), (r4, (G,H)), (r5, H), (s1, D), (s2, D) ],
τR = [ (l1, G’), (l2’, 2 · H’), (l7, H’), (l8, H’), (m1, G’ + 2 · H’), (m2, 6 · D), (m3, 6 · D), (r1, G’), (r2, 2 · H’), (r3, (G’,H’)), (r4, (G’,H’)), (r5, H’), (s1, D), (s2, D) ].
The three T-invariants τG , τP , τR are linearly independent and hence form a basis.
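The linear independence of the three T-invariants can be checked mechanically. The following is only a minimal sketch, not the procedure of the package SY: it neglects the token colors, reads the occurrence count of every transition off the parameterized T-vector as a function of gly, hex and rev, and computes the rank of the three resulting vectors by exact Gaussian elimination. The names `occurrence_counts` and `rank` are hypothetical helpers introduced for this illustration.

```python
from fractions import Fraction

# Transition order used for the count vectors (l2' is the reverse of l2).
TRANSITIONS = ["l1", "l2", "l2'", "l3", "l4", "l5", "l7", "l8",
               "m1", "m2", "m3", "r1", "r2", "r3", "r4", "r5", "s1", "s2"]

def occurrence_counts(gly, hex_, rev):
    """Occurrences per transition for given mode parameters, colors neglected."""
    pentose = hex_ + rev  # both the PPP mode and the reverse mode pass the PPP branch
    counts = {
        "l1": gly + 3 * hex_ + rev,
        "l2": gly,
        "l2'": 2 * rev,
        "l3": gly + 2 * hex_,
        "l4": gly + 2 * hex_,
        "l5": gly + 2 * hex_,
        "l7": 2 * gly + 5 * hex_ + rev,
        "l8": 2 * gly + 5 * hex_ + rev,
        "m1": 3 * pentose,
        "m2": 6 * pentose,
        "m3": 6 * pentose,
        "r1": pentose,
        "r2": 2 * pentose,
        "r3": pentose,
        "r4": pentose,
        "r5": pentose,
        "s1": 1,  # start transition of the closing subnet
        "s2": 1,  # end transition of the closing subnet
    }
    return [counts[t] for t in TRANSITIONS]

def rank(rows):
    """Rank of an integer matrix by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

tau_G = occurrence_counts(1, 0, 0)
tau_P = occurrence_counts(0, 1, 0)
tau_R = occurrence_counts(0, 0, 1)
independent_rank = rank([tau_G, tau_P, tau_R])
print(independent_rank)  # prints 3
```

A rank of 3 for three vectors means that none of them is a linear combination of the others, as long as l2 and l2’ are kept as separate coordinates.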
Biochemical evaluation of the T-invariants
The software package METATOOL [Pfeiffer et al., 1999] computes the elementary (flux) modes (corresponding to the ‘non-cyclic portions’ of the minimal T-invariants) of a pathway. For each mode, it computes (1) the T-vector, determining which reactions have to occur how often to proceed from the sources to the sinks, and (2) the overall reaction equation. With the colored Petri net approach and the package SY, we get additional information not only about the T-invariants but also about the dynamics of the system. The symbolic treatment of the T-vectors yields as one crucial result the marking (amount of molecules) that is needed at the beginning, and provided by the starting transition s1, to run the system without deadlock from its source to the sink. This initial marking is ‘appropriate’ because it is the minimum amount of molecules necessary for a simulation. Moreover, the stepwise construction of the symbolic parameterized T-invariants yields knowledge not only about the frequency of transition occurrences (during a run along the invariant) but also about the partial order in which these transitions have to occur. An interesting question arises concerning the independence of the three T-invariants. Theoretically, they are linearly independent because the transitions l2 and l2’ are treated as not being related to each other. If, however, l2 and −l2’ are identified, the T-vectors become linearly dependent. This corresponds to the observation that the overall reactions (G), (P), (R) are related to each other by the equation (P) = 2 · (G) + (R). The problem, however, lies in the fact that a steady state process including both a reaction (l2) and its reverse (l2’) is biochemically not feasible. On the other hand, T-vectors with negative elements cannot be T-invariants according to the definition given in section “Steady state pathways, elementary modes”.
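The relation (P) = 2 · (G) + (R) can be verified directly from the parameterized effect of τ. The sketch below is an illustration only: it neglects token colors, encodes the net effect as a function of gly, hex and rev (negative entries are consumed, positive ones produced), and the helper names `effect` and `as_equation` are assumptions of this example, not functions of the package SY.

```python
def effect(gly, hex_, rev):
    """Net effect of the parameterized T-vector, token colors neglected."""
    energy = 2 * gly + 5 * hex_ + rev
    return {
        "ADP": -energy,
        "Gluc": -(gly + 3 * hex_ + rev),
        "Pi": -energy,
        "ATP": energy,
        "Lac": energy,
        "CO2": 3 * hex_ + 3 * rev,
        "H2O": 2 * gly + 2 * hex_ - 2 * rev,
    }

def as_equation(eff):
    """Render a net effect as 'consumed = produced', omitting zero entries."""
    def side(sign):
        terms = []
        for name, coeff in eff.items():
            amount = sign * coeff
            if amount > 0:
                terms.append(name if amount == 1 else f"{amount} {name}")
        return " + ".join(terms)
    return f"{side(-1)} = {side(+1)}"

G = effect(1, 0, 0)  # glycolysis mode
P = effect(0, 1, 0)  # pentose phosphate mode
R = effect(0, 0, 1)  # mode using the reverse reaction l2'

for mode in (G, P, R):
    print(as_equation(mode))

# (P) = 2 * (G) + (R), substance by substance
combined = {name: 2 * G[name] + R[name] for name in G}
print(combined == P)  # prints True
```

The 2 H2O consumed in (R) cancel against part of the 4 H2O produced by 2 · (G), which leaves the 2 H2O on the product side of (P).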
The construction of the compound net P* can also be looked at from a different perspective, throwing more light on the nature of the token colors and the conflicts. Let us discuss the three independent modes identified in the previous subsection “Effects and T-invariants” as separate net models. They are depicted in a simplified version in Fig. 6, omitting all ubiquitous molecules and the ‘uncritical’ reactions m2 and m3. The first mode (G), glycolysis, contains no conflict. So, only one token color, C, is needed. The second mode (P) has two internal conflicts at Ru5P and GAP which are decided by use of the two colors G and H. The third mode (R) contains the same two conflicts as (P), now solved by G’ and H’. These ‘mode specific’ conflicts model situations as they happen in reality, with a great number of molecules of every substance involved. From the definition of steady state it follows that no molecule inserted by the source may get stuck on its way to the sink. If it did, the concentration of an intermediate substance would increase, contradicting the definition. Looking at the right-hand branch of (P) in Fig. 6, the tokens entering that branch at Ru5P can leave it only as F6P- or GAP-molecules by means of r4 and r5. The reaction r4 needs one G and one H, and r5 one additional H. The one G or two H tokens, resp., can only be provided by r1 occurring once or by r2 occurring twice, respectively. In organisms, the molecules of one substance cannot be distinguished and cannot be forced to choose one out of several alternative paths. Yet, the transitory increase of a substance concentration leads to a slowing down of reactions producing it and an acceleration of reactions consuming it. The opposite happens in case of a concentration decrease. So, in the long run, a relative occurrence ratio of 1:2 will be established between r1 and r2.
In contrast to the mode specific conflicts, the remaining ones are consequences of gluing the mode nets (G), (P), and (R) into one single model P*. As these three processes are independent from each
Fig. 6. The (simplified) three modes of P*.
other, performing linearly independent T-invariants, the parameters gly, hex and rev can, in principle, be chosen arbitrarily. This implies that the relative frequency among this kind of conflicting reactions, for example l2 and m1, depends merely on the choice of the parameters and not on a biochemical law that would require a constant frequency ratio. In an organism, these reaction ratios are controlled mainly by the current needs of the cell, governing the relative activities of the respective enzymes. The flow of G6P or Gluc depends on the need for NADPH, R5P, and ATP in the cell. Based on experimental observation, biochemists distinguish between four ‘modes’ (which we will call T-modes, in order not to mistake them for the elementary flux modes) of the combined GP/PPP [see Stryer, 2006]. We finish this section with a short discussion of these T-modes and their relationships to our results, again neglecting H+.
T-mode 1 is adopted when more R5P than NADPH is required, for example in rapidly dividing cells needing R5P for the synthesis of nucleotide precursors of DNA. Most of G6P is converted into F6P and GAP by the GP (l2, l3, l4). Transaldolase (r4’) and transketolase (r3’) then convert 2 F6P- and 1 GAP-molecules into 3 R5P-molecules. The reaction reads
5 G6P + ATP → 6 R5P + ADP.
First, we recognize that we have to return to the model P^rev, which contains all reverse reactions needed. Secondly, we observe that the process (reaction path) does not transform Gluc into Lac and hence does not represent a full T-vector. As a consequence, the chosen token colors are no longer appropriate. Bearing this in mind, we construct the T-vector
[ (l2, [ (5, ( )) ]), (l3, [ (1, ( )) ]), (l4, [ (1, ( )) ]), (l5, [ (1, ( )) ]), (r1, [ (4, ( )) ]), (r5’, [ (2, ( )) ]), (r4’, [ (2, ( )) ]), (r3’, [ (2, ( )) ]), (r2, [ (4, ( )) ]) ]
and compute its effect, yielding
ADP: D, ATP: – D, F6P: 5‘C – 2‘H – 3‘x, G6P: – 5‘C, GAP: 2‘D – 2‘H, R5P: 6‘G, Ru5P: – 4‘G + 4‘H.
Identifying all token colors with D makes the effects for F6P, GAP, and Ru5P vanish and leads to the desired overall effect
ADP: D, ATP: – D, G6P: – 5‘D, R5P: 6‘D,
which exactly reflects the reaction formula above.

T-mode 2 is adopted when the needs for NADPH and R5P are balanced. Then the oxidative branch of the PPP is executed, converting G6P into NADPH and R5P via m1 and r1. The reaction formula is
G6P + 2 NADP+ + H2O → R5P + 2 NADPH + CO2.
For the T-vector [ (m1, [ (1, (x ← G)) ]), (r1, [ (1, ( )) ]) ] we verify the corresponding effect
CO2: D, G6P: – G, H2O: – D, NADPH: 2‘D, NADP+: – 2‘D, R5P: G.

T-modes 3 and 4 are adopted when much more NADPH than R5P is required. T-mode 3 reads
G6P + 12 NADP+ + 7 H2O → 6 CO2 + 12 NADPH + Pi.
It includes, apart from l4’ and l2’, a reaction l3*: FBP → F6P, catalyzed by fructose-1,6-biphosphatase, which is part of the gluconeogenesis and hence outside the scope of the GP/PPP system covered by the models of this paper. Note that both l3* and l3: F6P → FBP are irreversible. T-mode 4, according to Stryer, 2006, is characterized by the reaction formula
3 G6P + 6 NADP+ + 5 NAD+ + 5 Pi + 8 ADP → 5 Pyr + 3 CO2 + 6 NADPH + 5 NADH + 8 ATP + 2 H2O.
If this process is expanded to start with Gluc (instead of G6P) and to end with Lac (instead of Pyr), the result corresponds precisely to the T-invariant τP derived in the previous subsection “Effects and T-invariants”.
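The expansion of T-mode 4 can be retraced with simple stoichiometric bookkeeping. The sketch below is a hedged illustration under stated assumptions: H+ is neglected as in the text; the hexokinase step (Gluc + ATP → G6P + ADP) and the lactate dehydrogenase step (Pyr + NADH → Lac + NAD+) are written with their textbook stoichiometry; and the regeneration of NADP+ from NADPH stands in, in summary form, for the joint effect of the transitions m2 and m3 of the net, which occur six times each in τP. The helper `add_scaled` is hypothetical.

```python
from collections import Counter

def add_scaled(total, reaction, times):
    """Add `times` occurrences of a reaction (net stoichiometry) to a running total."""
    for substance, coeff in reaction.items():
        total[substance] += times * coeff

# T-mode 4 as quoted in the text (negative = consumed, positive = produced).
t_mode_4 = {"G6P": -3, "NADP+": -6, "NAD+": -5, "Pi": -5, "ADP": -8,
            "Pyr": 5, "CO2": 3, "NADPH": 6, "NADH": 5, "ATP": 8, "H2O": 2}

# Assumed textbook steps used to extend the process from Gluc to Lac.
hexokinase = {"Gluc": -1, "ATP": -1, "G6P": 1, "ADP": 1}
lactate_dehydrogenase = {"Pyr": -1, "NADH": -1, "Lac": 1, "NAD+": 1}
# Summary of m2 followed by m3 in the net model: NADPH is re-oxidized.
nadph_recycling = {"NADPH": -1, "NADP+": 1}

total = Counter()
add_scaled(total, t_mode_4, 1)
add_scaled(total, hexokinase, 3)
add_scaled(total, lactate_dehydrogenase, 5)
add_scaled(total, nadph_recycling, 6)

expanded = {s: c for s, c in total.items() if c != 0}

# Overall reaction (P): 5 ADP + 3 Gluc + 5 Pi = 5 ATP + 5 Lac + 3 CO2 + 2 H2O
overall_P = {"ADP": -5, "Gluc": -3, "Pi": -5,
             "ATP": 5, "Lac": 5, "CO2": 3, "H2O": 2}
print(expanded == overall_P)  # prints True
```

All intermediates (G6P, Pyr, NAD+/NADH, NADP+/NADPH) cancel, so only the substances of the overall reaction (P) remain.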
CONCLUSIONS
This paper applies higher-level Petri nets to the design, qualitative analysis, and execution of metabolic steady state system models. Compared to low-level Petri nets and to algebraic methods and tools from biochemistry, this approach yields important new results about the invariants and the processes of (sufficiently complex) metabolic pathways. The crucial point of using high-level nets is the ability to discriminate metabolites, if necessary, according to their topological environment, i.e., the reaction chains in which they are involved. On this basis, models can be developed which can be simulated smoothly and can be subjected to a rigorous symbolic analysis. This has been demonstrated for the rather complex example of the combined glycolysis and pentose phosphate pathways. Our main results are the following. Firstly, some full S-invariants of the sample net were found that represent interesting, non-trivial preservation laws for the total amounts of certain atoms or molecules in the system. Additionally, their incremental construction may reveal inconsistencies or deficiencies of the examined model. Secondly, the elementary modes (and the corresponding T-invariants) and their overall reaction equations as computed by METATOOL have been verified. These three T-invariants have been represented as one parameterized vector. Moreover, not only the number of reaction occurrences related to a T-invariant, but also their partial order has been determined. Thirdly, the sample net model can be simulated cyclically, restoring the initial system state at the end of each cycle, avoiding deadlocks, and respecting the inherent concurrency. Fourthly, a biochemical interpretation of high-level Petri net models of steady state pathways and their invariants may enhance the understanding of metabolic processes.
A most interesting topic for further research is the question of whether, or to what extent, the search for and the construction of S- and T-invariants can be automated. Moreover, the significance of (full) S-invariants and defects deserves increased attention. Finally, the application of symbolic analysis to less well understood metabolic systems is expected to lead to valuable new results.

ACKNOWLEDGEMENTS
We gratefully acknowledge the financial support of the BMBF (Federal Ministry of Education and Research of Germany), BCB project number 0312705D. We would also like to thank Hartmann Genrich and Stefan Schuster for fruitful discussions.

Abbreviations
Metabolites/Compounds
  ADP     Adenosine diphosphate
  ATP     Adenosine triphosphate
  BPS     1,3-Biphosphoglycerate
  DHAP    Dihydroxyacetone phosphate
  DPG     2-Phosphoglycerate
  E4P     Erythrose-phosphate
  FBP     Fructose biphosphate
  F6P     Fructose-phosphate
  GAP     Glyceraldehyde-phosphate
  Gluc    Glucose
  GSH     Glutathione
  GSSG    Glutathionedisulfide
  G6P     Glucose-phosphate
  Lac     Lactate
  NADH    Nicotinamide adenine dinucleotide, reduced form
  NAD+    NADp, Nicotinamide adenine dinucleotide, oxidized form
  NADPH   Nicotinamide adenine dinucleotide phosphate, reduced form
  NADP+   NADPp, Nicotinamide adenine dinucleotide phosphate, oxidized form
  PEP     Phosphoenolpyruvate
  Pi      Orthophosphate, ionic form
  Pyr     Pyruvate
  Ru5P    Ribulose-phosphate
  R5P     Ribose-phosphate
  S7P     Sedoheptulose-phosphate
  TPG     3-Phosphoglycerate
  Xu5P    Xylulose-phosphate

Correspondence between Petri net transitions and enzymatic reactions
  l1      Hexokinase
  l2      Phosphoglucose isomerase
  l3      Phosphofructokinase
  l4      Aldolase
  l5      Triosephosphate isomerase (forw.)
  l6      Triosephosphate isomerase (backw.)
  l7      GAP dehydrogenase
  l8      Reaction path consisting of: phosphoglycerate kinase, phosphoglycerate mutase, enolase, pyruvate kinase, and lactate dehydrogenase
  m1      G6P oxidation reactions
  m2      Glutathione reductase
  m3      Glutathione oxidation reaction
  r1      Ribulose-phosphate isomerase
  r2      Ribulose-phosphate epimerase
  r3      Transketolase
  r4      Transaldolase
  r5      Transketolase
REFERENCES
• Design/CPN. http://www.daimi.au.dk/designCPN/.
• Genrich, H. (2002). Dynamical Quantities in Net Systems. Formal Aspects of Computing 14, 55-89.
• Genrich, H., Kueffner, R. and Voss, K. (2001). Executable Petri Net Models for the Analysis of Metabolic Pathways. Int. J. STTT 3, 394-404.
• Goss, P. J. E. and Peccoud, J. (1998). Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proc. Natl. Acad. Sci. USA 95, 6750-6755.
• Goss, P. J. E. and Peccoud, J. (1999). Analysis of the stabilizing effect of Rom on the genetic network controlling ColE1 plasmid replication. Pac. Symp. Biocomp. 4, 65-76.
• Heinrich, R. and Schuster, S. (1996). The Regulation of Cellular Systems. Chapman and Hall, New York.
• Heinrich, R. and Schuster, S. (1998). The modeling of metabolic systems. Structure, control and optimality. BioSystems 47, 61-77.
• Hofestaedt, R. (1994). A Petri Net Application of Metabolic Processes. Journal of System Analysis, Modeling and Simulation 16, 113-122.
• Hofestaedt, R. and Thelen, S. (1998). Quantitative Modeling of Biochemical Networks. In Silico Biol. 1, 0006.
• Koch, I. and Heiner, M. (2008). Petri nets in biological network analysis. In: Junker, B., Schreiber, F. (eds.), Analysis of Biological Networks. Wiley and Sons Book Series on Bioinformatics, Chapter 7, pp. 139-179.
• Koch, I., Junker, B. H. and Heiner, M. (2005). Application of Petri net theory for modeling and validation of the sucrose breakdown pathway in the potato tuber. Bioinformatics 21(7), 1219-1226.
• Koch, I., Schuster, S. and Heiner, M. (2000). Using time-dependent Petri nets for the analysis of metabolic networks. In: DFG-Workshop: Informatikmethoden zur Analyse und Interpretation grosser genomischer Datenmengen, Hofestaedt, R., Lautenbach, K. and Lange, M. (eds.), Magdeburg, pp. 15-21.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
• Pfeiffer, T., Sánchez-Valdenebro, I., Nuño, J. C., Montero, F. and Schuster, S. (1999). METATOOL: For studying metabolic networks. Bioinformatics 15, 251-257.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri Net Representation in Metabolic Pathways. In: Proc. First Intern. Conf. on Intelligent Systems for Molecular Biology, Hunter, L. et al. (eds.), AAAI Press, Menlo Park, pp. 328-336.
• Reddy, V. N., Liebman, M. N. and Mavrovouniotis, M. L. (1996). Qualitative analysis of biochemical reaction systems. Comput. Biol. Med. 26, 9-24.
• Schuster, S. and Hilgetag, C. (1994). On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165-182.
• Schuster, S., Hilgetag, C., Woods, J. H. and Fell, D. A. (1996). Elementary Modes of Functioning in Biochemical Networks. In: Computation in Cellular and Molecular Biological Systems, Cuthbertson, R., Holcombe, M. and Paton, R. (eds.), World Scientific, Singapore, pp. 151-165.
• Schuster, S., Fell, D. A. and Dandekar, T. (2000a). A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnol. 18, 326-332.
• Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. and Dandekar, T. (2000b). Structural Analysis of Metabolic Networks: Elementary Flux Modes, Analogy to Petri Nets, and Application to Mycoplasma pneumoniae. In: Bauer, E.-B., Rost, U., Stoye, J., Vingron, M. (eds.), Proc. Germ. Conf. Bioinf., Heidelberg, Logos Verlag Berlin, pp. 115-120.
• Stryer, L., Berg, J. M. and Tymoczko, J. L. (2006). Biochemistry. W. H. Freeman and Co., New York.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2003, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-77
Biopathways Representation and Simulation on Hybrid Functional Petri Net
Hiroshi Matsuno (a,∗), Yukiko Tanaka (a), Hitoshi Aoshima (a), Atsushi Doi (a), Mika Matsui (b) and Satoru Miyano (c)
(a) Faculty of Science, Yamaguchi University, Yamaguchi, Japan
(b) Oshima National College of Maritime Technology, Oshima, Japan
(c) Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
ABSTRACT: The following two matters should be resolved in order for biosimulation tools to be accepted by users in biology/medicine: (1) remove issues which are irrelevant to biological importance, and (2) allow users to represent biopathways intuitively and understand/manage easily the details of representation and simulation mechanism. From these criteria, we firstly define a novel notion of Petri net called Hybrid Functional Petri Net (HFPN). Then, we introduce a software tool, Genomic Object Net, for representing and simulating biopathways, which we have developed by employing the architecture of HFPN. In order to show the usefulness of Genomic Object Net for representing and simulating biopathways, we show two HFPN representations of gene regulation mechanisms of Drosophila melanogaster (fruit fly) circadian rhythm and apoptosis induced by Fas ligand. The simulation results of these biopathways are also correlated with biological observations. The software is available to academic users from http://www.GenomicObject.Net/.
KEYWORDS: Petri net, modeling, simulation, circadian rhythms, apoptosis, Genomic Object Net
INTRODUCTION
Considerable attention has been paid to biopathway representation and simulation in the literature. The most traditional approach is to employ ordinary differential equations (ODEs) such as Michaelis-Menten equations and to represent biochemical reactions as a system of ODEs. This approach provides mathematically well-founded and fine interpretations of biopathways, especially for enzyme reactions. Gepasi [1] is a software package based on this approach for modeling biochemical systems; it aims at assisting users in translating reaction processes to matrices and ODEs. E-Cell [2] is a system for representation and simulation with a GUI; with this tool, a model of a hypothetical cell with only 127 genes, sufficient for transcription, translation, energy production and phospholipid synthesis, has been constructed. As is stressed in [3,4], in order for software tools to be accepted by users in biology/medicine for biopathway modeling, we consider that at least the following two matters should be resolved: (1) Remove issues which are biologically irrelevant; otherwise, users might be unnecessarily burdened with special

∗ Corresponding author. E-mail: [email protected].
H. Matsuno et al. / Biopathways Representation and Simulation on Hybrid Functional Petri Net
Fig. 1. Elements of hybrid (functional) Petri net.
notions in mathematics, physics and computer science which are irrelevant to biology/medicine. (2) Allow users to represent biopathways intuitively and to understand and manage easily the details of the representation and simulation mechanism; otherwise, users could not be confident that the understanding and knowledge in their minds coincide with the object represented with the software tools. Based on these criteria, in this paper, we first define a novel notion of Petri net called hybrid functional Petri net (HFPN) by extending the notions of hybrid Petri net [5] and functional Petri net [6] so that it is suited for modeling biopathways. Then, we introduce a software tool, Genomic Object Net (GON), for representing and simulating biopathways, which we have developed by employing the architecture of HFPN. GON has an editor and a simulator of HFPN with a graphical user interface (GUI), which are intended to resolve matters (1) and (2). In order to demonstrate the effectiveness of GON for representing and simulating biopathways, we will present two HFPN models: the circadian rhythm of Drosophila as an example of a gene regulatory mechanism and apoptosis induced by Fas ligand as an example of signal transduction.

HYBRID FUNCTIONAL PETRI NET
Ordinary differential equations (ODEs) are widely accepted for expressing biological phenomena such as biochemical reactions. But with this approach, it is rather difficult to grasp the whole system intuitively, as in a picture, if the system constitutes a large network of cascades. Although the discrete Petri net model allows a very intuitive graphical representation, the mechanism of ODEs cannot be directly realized because the discrete Petri net model deals only with integers as the contents of places. For sophisticated dynamic systems in which control mechanisms of genes and chemical reactions with enzymes are concurrently performed, it is more reasonable to use real numbers for representing the amounts of some objects, e.g.
the concentrations of a protein, mRNA, complex of proteins, metabolites, etc. The hybrid Petri net model (HPN) [5] has been introduced as an extension of the discrete Petri net model so that it can handle real numbers in a continuous way; it allows us to express explicitly the relationship between continuous values and discrete values while soundly keeping the good characteristics of the discrete Petri net. Drath [7, 8] has also enhanced this notion to define the hybrid dynamic net model (HDN) for modeling more complex systems. In the HPN/HDN model, two kinds of places and transitions are used: discrete/continuous places and discrete/continuous transitions. A discrete place and a discrete transition are the same notions as used in the discrete Petri net model. A continuous place holds a nonnegative real number as its content. A continuous transition fires continuously in the HPN/HDN model and its firing speed is given as a function of the values in the places of the model. In graphical notation, discrete transitions, discrete places, continuous transitions and continuous places are drawn as shown in Fig. 1.
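The behavior of a single continuous transition can be illustrated numerically. The following is a minimal sketch, not the simulation algorithm of GON or of any HPN tool: it assumes one continuous transition between two continuous places, a firing speed proportional to the content of the source place (the rate constant is an arbitrary choice for this example), equal consumption and production speeds, and explicit Euler integration. The function name `fire_continuously` is hypothetical.

```python
import math

def fire_continuously(source, target, rate_constant, dt=0.001, t_end=1.0):
    """Explicit Euler integration of one continuous transition source -> target."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        speed = rate_constant * source   # firing speed is a function of the place content
        moved = min(speed * dt, source)  # a continuous place never becomes negative
        source -= moved                  # amount consumed through the input arc
        target += moved                  # same amount produced through the output arc
    return source, target

source_end, target_end = fire_continuously(source=1.0, target=0.0, rate_constant=2.0)
exact_source = math.exp(-2.0 * 1.0)      # closed-form solution of d(source)/dt = -2 * source
print(source_end, target_end, exact_source)
```

With a small step size the simulated content of the source place stays close to the exponential decay of the corresponding ODE, and the total content of both places is conserved because consumption and production proceed at the same speed.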
Fig. 2. Model for the reaction decomposing dimers into monomers. (a) Reaction decomposing dimers into monomers. (b) Hybrid Petri net. The concentrations of the dimer and the monomer are stored in the continuous places Pd and Pm, respectively. The same firing speed is assigned to the continuous transitions T1 and T2 as the reaction speed. The integers “1” represent the weights of the arcs. (c) Hybrid functional Petri net model for the reaction. Note that the reaction speed is assigned to the transition T. Pd and Pm are the same as in (b). Different weights “1” and “2” are assigned to the two arcs.
From the definition of HPN/HDN, the firing speed of a continuous transition must be the same as the consuming speed through each arc from its source place, and the contents of all source places are consumed with the same speed. This speed is also the same as the production speed through each arc from the transition. This is an unfavorable feature of HPN/HDN for biopathway simulation. For example, consider a reaction in which a dimer is cleaved into two monomers (Fig. 2(a)). This reaction could be represented in the HDN model as shown in Fig. 2(b) by using a test arc and a transition for amplification (note that the amounts consumed and produced in places by continuous transition firing are the same by definition, while the amount of monomers is twice as large as that of dimers). But this is neither intuitive nor natural. This feature of HPN/HDN is clearly a severe drawback in modeling biopathways. On the other hand, some favorable features have also been introduced in Petri net theory. In addition to the normal arc explained so far, the inhibitory arc and the test arc have been defined for convenience (Fig. 1). An inhibitory arc with weight r enables the transition to fire only if the content of the place at the source of the arc is less than or equal to r. For example, an inhibitory arc can be used to represent the function of “repress” in gene regulation. A test arc does not consume any content of the place at the source of the arc by firing. For example, test arcs can be used to represent the transcription process, since nothing is consumed by this process except for degradation.

Definition 1 A hybrid functional Petri net (HFPN) is defined by extending the notion of transition of HPN/HDN [5,7,8] in the following way: HFPN has five kinds of arcs: discrete input arc, continuous input arc, test input arc, discrete output arc, and continuous output arc.
A discrete input arc (continuous input arc) is directed to a discrete transition (continuous transition) from a discrete/continuous place (continuous place) from which it consumes the content of the source place by firing. A test input arc is directed from a place of any kind to a transition of any kind. It does not consume the content of the source place. These three arcs are called input arcs. A discrete output arc is directed from a discrete transition to a place of any kind. A continuous output arc is directed from a continuous transition to a continuous place. These two arcs are called output arcs.
1. Continuous transition: A continuous transition $T$ of HFPN consists of continuous/test input arcs $a_1, \ldots, a_p$ from places $P_1, \ldots, P_p$ to $T$ and continuous output arcs $b_1, \ldots, b_q$ from $T$ to continuous places $Q_1, \ldots, Q_q$. Let $m_1(t), \ldots, m_p(t)$ and $n_1(t), \ldots, n_q(t)$ be the contents of $P_1, \ldots, P_p$ and $Q_1, \ldots, Q_q$ at time $t$, respectively. The continuous transition $T$ specifies the following:
a. The firing condition is given by a predicate $c(m_1(t), \ldots, m_p(t))$. As long as this condition is true, $T$ fires continuously.
b. For each input arc a_i, T specifies a function f_i(m_1(t), ..., m_p(t)) ≥ 0 which defines the speed of consumption from P_i while T is firing. If a_i is a test input arc, then we assume f_i ≡ 0 and no amount is removed from P_i. Namely, d[a_i](t)/dt = f_i(m_1(t), ..., m_p(t)), where [a_i](t) denotes the amount removed from P_i at time t through the continuous input arc a_i during the period of firing.
c. For each output arc b_j, T specifies a function g_j(m_1(t), ..., m_p(t)) ≥ 0 which defines the speed at which content is added to Q_j through the continuous output arc b_j while T is firing. Namely, d[b_j](t)/dt = g_j(m_1(t), ..., m_p(t)), where [b_j](t) denotes the amount added to Q_j at time t through the continuous output arc b_j during the period of firing.

2. Discrete transition: A discrete transition T of HFPN consists of discrete/test input arcs a_1, ..., a_p from places P_1, ..., P_p to T and discrete output arcs b_1, ..., b_q from T to places Q_1, ..., Q_q. Let m_1(t), ..., m_p(t) and n_1(t), ..., n_q(t) be the contents of P_1, ..., P_p and Q_1, ..., Q_q at time t, respectively. The discrete transition T specifies the following:
a. The firing condition is given by a predicate c(m_1(t), ..., m_p(t)). If this is true, T gets ready to fire.
b. The delay function is given by a nonnegative integer valued function d(m_1(t), ..., m_p(t)). If the firing condition becomes satisfied at time t, T fires after the delay d(m_1(t), ..., m_p(t)). However, if the firing condition changes during this delay time, the transition T loses the chance of firing and the firing condition is reset.
c. For each input arc a_i, T specifies a nonnegative integer valued function f_i(m_1(t), ..., m_p(t)) ≥ 0 which defines the number of tokens (integer) removed from P_i through arc a_i by firing. If a_i is a test input arc, then we assume f_i ≡ 0 and no token is removed.
d. For each output arc b_j, T specifies a nonnegative integer valued function g_j(m_1(t), ..., m_p(t)) ≥ 0 which defines the number of tokens (integer) added to Q_j through arc b_j by firing.

Examples of a continuous transition and a discrete transition are shown in Fig. 3. From the above definition, the dimer-to-monomers reaction can be represented intuitively in the HFPN model as in Fig. 2(c). Not only this simple example but also more complex interactions can be described easily and intuitively with HFPN. The software GON is developed and implemented based on this HFPN architecture.

CIRCADIAN RHYTHMS IN DROSOPHILA

The control mechanism of the autoregulatory feedback loops of Drosophila circadian rhythms has been intensively studied [9–13,16], and detailed ODE models with fitted coefficients have also been reported [14,15]. These ODE-based models can be easily described as HFPNs with GON. In recognition of these detailed models, we first show an HFPN realization of the model due to Ueda et al. [15]. Moreover, we also show that an HFPN can be designed with GON easily and intuitively by interpreting the biological facts and observations given in [9–13,16]. GON is intended to be a naive platform where we can create hypotheses and evaluate them by simulation. This feature is especially important when rough modeling is sufficient or when not enough information is available for fine modeling. Figure 4 shows the scheme of the regulatory mechanism of five genes contributing to the Drosophila circadian rhythms: period (per), timeless (tim), Drosophila Clock (dClk), cycle (cyc) and double-time
Fig. 3. Continuous and discrete transitions of hybrid functional Petri net. (a) An example of a continuous transition. Four input arcs are attached to continuous transition T_C: two continuous input arcs from continuous places P_1 and P_4, and two test input arcs from continuous place P_2 and discrete place P_3. a_i is the weight of the arc from place P_i for i = 1, 2, 3, 4. Two continuous arcs are headed from the transition T_C to continuous places Q_1 and Q_2, respectively. Variables b_1 and b_2 are assigned to these arcs as weights. (b) An example of a discrete transition. Four input arcs are attached to discrete transition T_D: two discrete input arcs from discrete place P_1 and continuous place P_3, and two test input arcs from discrete place P_2 and continuous place P_4. a_i is the weight of the arc from place P_i for i = 1, 2, 3, 4. Two output arcs are headed from the transition T_D to discrete place Q_1 and continuous place Q_2. Variables b_1 and b_2 are assigned to these arcs as weights.
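The continuous transition of Definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GON's implementation: the class name, the explicit Euler update, the step size and the rate law m/10 are all hypothetical. It shows how separate consumption and production functions let the dimer-to-monomers reaction be written directly, with monomers produced at twice the speed at which dimers are consumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Marking = Dict[str, float]


@dataclass
class ContinuousTransition:
    """One HFPN-style continuous transition (illustrative names only)."""
    condition: Callable[[Marking], bool]             # firing predicate c(m)
    consume: Dict[str, Callable[[Marking], float]]   # f_i per input place
    produce: Dict[str, Callable[[Marking], float]]   # g_j per output place

    def step(self, m: Marking, dt: float) -> Marking:
        """Advance the marking by one explicit Euler step of size dt."""
        if not self.condition(m):
            return dict(m)
        new = dict(m)
        # All speeds are evaluated on the marking at the start of the step.
        for place, f in self.consume.items():
            new[place] -= f(m) * dt
        for place, g in self.produce.items():
            new[place] += g(m) * dt
        return new


# Dimer -> 2 monomers: consumption speed v, production speed 2*v.
# The rate law v = dimer/10 is an assumption made for this example.
cleave = ContinuousTransition(
    condition=lambda m: m["dimer"] > 0.0,
    consume={"dimer": lambda m: m["dimer"] / 10.0},
    produce={"monomer": lambda m: 2.0 * m["dimer"] / 10.0},
)

marking = {"dimer": 100.0, "monomer": 0.0}
for _ in range(50):
    marking = cleave.step(marking, dt=0.1)
```

A test input arc would correspond to a place that is read inside `condition` or a speed function but has no entry in `consume`, so nothing is removed from it.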
Fig. 4. The gene regulation in the Drosophila circadian oscillator is schematized.
(dbt). It is known that the Drosophila circadian feedback system is composed of two interlocked negative feedback loops [10]. Roughly speaking, PER and TIM proteins collaborate in the regulation of their own expression in Drosophila, assembling in PER-TIM complexes that permit nuclear translocation, inactivation of per and tim transcription in a cycling negative feedback loop, and activation of dClk transcription, which participates in the dCLK-CYC negative feedback loop. dCLK and CYC form heterodimers that activate per and tim transcription and inhibit dClk transcription. Among these five genes, three genes, per, tim, and dClk, are rhythmically expressed: per and tim mRNA levels begin to rise in the subjective day and peak early in the subjective evening, and the dClk mRNA level peaks from late at night to early in the morning. Although per and tim mRNAs reach peak levels in the evening, PER and TIM levels do not peak until late evening. It is considered that this delay results from the initial destabilization of PER by DBT-dependent phosphorylation followed by the stabilization of PER
Fig. 5. An HFPN realization of the circadian rhythm model due to Ueda et al. [15]. A series of ten ODEs, e.g.

$$\frac{d\,Per_m}{dt} = C_1 + S_1\,\frac{\left(\frac{CC_n}{A_1}\right)^{a} + B_1}{1 + \left(\frac{PT_n}{R_1}\right)^{r} + \left(\frac{CC_n}{A_1}\right)^{a} + B_1} - D_1\,\frac{Per_m}{L_1 + Per_m} - D_0\,Per_m$$

are realized in this network, where Per_m (CC_n, PT_n) represents the concentration of per mRNA (dCLK-CYC complex in the nucleus, PER-TIM complex in the nucleus) and C_1 = 0 nM/h, S_1 = 1.4 nM/h, A_1 = 0.45 nM/h, B_1 = 0, L_1 = 0.3 nM/h, D_0 = 0.012 nM/h, D_1 = 0.94 nM/h, R_1 = 1.02 nM/h.
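As a hedged illustration, the per mRNA equation quoted in the Fig. 5 caption can be written as a Python function using the parameter values listed there. The Hill exponents a and r are not given in this text, so they are left as required arguments; the function name and argument names are assumptions made for this sketch.

```python
def dper_m_dt(per_m, cc_n, pt_n, a, r,
              C1=0.0, S1=1.4, A1=0.45, B1=0.0,
              L1=0.3, D0=0.012, D1=0.94, R1=1.02):
    """Right-hand side of the per mRNA equation from the Fig. 5 caption.

    per_m: per mRNA concentration
    cc_n:  nuclear dCLK-CYC complex concentration (activator)
    pt_n:  nuclear PER-TIM complex concentration (repressor)
    a, r:  Hill exponents; not stated in the text, so the caller supplies them
    """
    activation = (cc_n / A1) ** a + B1
    repression = (pt_n / R1) ** r
    synthesis = C1 + S1 * activation / (1.0 + repression + activation)
    degradation = D1 * per_m / (L1 + per_m) + D0 * per_m
    return synthesis - degradation


# Example calls with exponents chosen purely for illustration (a = r = 1).
no_repressor = dper_m_dt(per_m=0.0, cc_n=0.45, pt_n=0.0, a=1, r=1)
with_repressor = dper_m_dt(per_m=0.0, cc_n=0.45, pt_n=1.02, a=1, r=1)
```

In an HFPN, the synthesis term would be the speed function of a continuous transition producing per mRNA with test arcs from the two complexes, and the two degradation terms would be speed functions of transitions without outgoing arcs.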
Fig. 6. A naive HFPN representation of Drosophila circadian mechanism in which five genes per, tim, dClk, cyc, and dbt participate. An HFPN file including all transition parameters can be downloaded from the URL http://www.GenomicObject.Net/.
by dimerization with TIM [12,13]. The details of the mechanism are surveyed in [9,11,16]. Ueda et al. [15] modeled the system of two interlocked negative feedback loops [10] with ODEs and carried out extensive simulations and mathematical analysis. We have translated it into an HFPN as in Fig. 5, and further computational experiments based on this model are possible on GON with this HFPN file. By using GON, we also designed an HFPN from scratch by interpreting the facts and observations in [9–13,16]. Figure 6 is a naive representation of the gene regulatory mechanism of the Drosophila circadian oscillator, where continuous places are introduced and the functions for continuous transitions are defined and tuned so that the simulation results coincide with the facts and observations.
Fig. 7. (a) Behaviors of the concentrations of four mRNAs obtained from simulation by GON. (b) Reading off the time scale, a time difference of about four and a half hours is observed between the peaks of the concentrations of per mRNA and PER.
The complex forming rate of dCLK/CYC at transition T1 is realized by the function m2*m4/20, where m2 and m4 are the amounts in places dCLK and CYC, respectively. The complex forming rates of PER/TIM and PER/DBT at the transitions T2 and T3 are realized similarly. Transitions T4, T5, and T6 represent the degradation rates of the complexes of the corresponding proteins. Figure 7a is the simulation result of the HFPN in Fig. 6. It indicates that this HFPN model, representing two negative feedback loops, the PER-TIM feedback and the dCLK-CYC feedback, successfully produces periodic oscillations of per mRNA (m6), tim mRNA (m8), and dClk mRNA (m1), while the concentration of cyc mRNA (m3) remains constant. It is known that the protein TIM stabilizes phosphorylated PER by dimerizing with it. This phenomenon is reflected in the firing speed of transition T5: the firing speed of transition T5 (m13/15) is set to be slower than that of transition T7 (m7/10). Moreover, it is suggested in [13] that the normal function of protein DBT is to reduce the stability and thus the level of accumulation of monomeric PER proteins. This function is realized in Fig. 6 by transition T3. Figure 7b clearly shows a time difference of about four and a half hours between the peaks of the concentrations of per mRNA and PER, which is believed to arise from the two facts mentioned above. This indicates that the simulation result is in good agreement with the experimental observation reviewed in [11]. Price et al. [13] discussed properties of dbtL and dbtS, which are mutants of the gene dbt, and showed that transcription of the gene per is affected by these mutants. That is, the period of per mRNA in the dbtL mutant (dbtS mutant) is longer (shorter) than that in the wild type. The behavior of per mRNA in these two mutants and the wild type is shown in Fig. 8. It is obtained by changing the formula at transition T3. These simulation results suggest that the circadian rhythm is controlled by the complex forming rate of PER and DBT proteins, which is affected by the mutants dbtL and dbtS.
Fig. 8. Concentration behaviors of per mRNA: (a) dbtL mutant, (b) wild type, (c) dbtS mutant. The formula for the firing speed of transition T3, such as m7*m12/85, is shown with each graph. The firing speed in the dbtS (dbtL) case is faster (slower) than the firing speed in the wild-type case. The firing speed of the transition T3 represents the complex forming rate of the two proteins PER and DBT.
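The mutant experiment amounts to changing one constant in the speed function of transition T3. The following minimal sketch makes that explicit. Only the form m7*m12/85 comes from the caption; the caption does not say which genotype uses the divisor 85, and the other two divisors below are hypothetical values chosen solely to show the direction of the effect.

```python
def complex_forming_rate(m_per, m_dbt, divisor):
    """Mass-action style speed m_per * m_dbt / divisor for transition T3."""
    return m_per * m_dbt / divisor


# Same contents of the PER and DBT places, three different divisors.
m_per, m_dbt = 10.0, 10.0
reference = complex_forming_rate(m_per, m_dbt, 85.0)   # form quoted in Fig. 8
faster = complex_forming_rate(m_per, m_dbt, 60.0)      # hypothetical divisor
slower = complex_forming_rate(m_per, m_dbt, 120.0)     # hypothetical divisor
```

A smaller divisor gives a faster PER/DBT complex forming rate, which is the direction described for dbtS, and a larger divisor gives a slower rate, as described for dbtL.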
APOPTOSIS INDUCED BY FAS LIGAND

The purpose of this section is to demonstrate that signal transduction pathways can be modeled with GON. We considered a known biopathway for apoptosis induced by Fas ligand and carried out a computational experiment to evaluate the effect of an autocatalytic process. Apoptosis, or programmed cell death, is known to participate in various biological processes such as development, maintenance of tissue homeostasis and elimination of cancer cells [17,18]. Malfunctions of apoptosis have been implicated in many human diseases such as neurodegenerative diseases, AIDS and ischemic stroke. Reportedly, apoptosis is caused by various inducers such as chemical compounds, proteins or removal of NGF. The biochemical pathways of apoptosis are complex and depend on both the cells and the inducers. Fas-induced apoptosis has been studied in detail and its mechanism has been proposed as shown in Fig. 9 [19]. Fas ligands, which usually exist as trimers, bind and activate their receptors by inducing
Fig. 9. Proposed steps of apoptosis induced by Fas ligand.
receptor trimerization. Activated receptors recruit adaptor molecules such as Fas-associating protein with death domain (FADD), which recruit procaspase 8 to the receptor complex, where it undergoes autocatalytic activation. Activated caspase 8 activates caspase 3 through two pathways. In the more complex one, caspase 8 cleaves Bcl-2 interacting protein (Bid), and the COOH-terminal part of Bid translocates to mitochondria, where it triggers cytochrome c release. The released cytochrome c binds to apoptotic protease activating factor-1 (Apaf-1) together with dATP and procaspase 9 and activates caspase 9. Caspase 9 cleaves procaspase 3 and activates caspase 3. In the other pathway, caspase 8 cleaves procaspase 3 directly and activates it. Caspase 3 cleaves DNA fragmentation factor (DFF) 45 in a heterodimeric factor of DFF40 and DFF45. Cleaved DFF45 dissociates from DFF40, inducing oligomerization of DFF40, which has DNase activity. The active DFF40 oligomer causes internucleosomal DNA fragmentation, which is an apoptotic hallmark indicative of chromatin condensation. We generated an HFPN model for this mechanism with GON. The pathways consist of several
Table 1. Functions assigned to continuous transitions in the simulation of apoptosis induced by Fas ligand, where mA and mB represent contents of the corresponding continuous places

Rate               Unimolecular reaction
Self-effacement    mA/200
Oligomer           mA/20
Monomer            mA/10
Enzyme binding     mA/5
Enzyme reaction    mA*10

Bimolecular reaction: mA*mB/10000, mA*mB/5000, mA*mB/2500
steps, in which two different pathways from caspase 8 are assumed, and involve many molecules, including Fas receptors, the caspase family (aspartic acid-dependent cysteine proteases produced from their zymogens), the Bcl-2 family (pro- and anti-apoptotic proteins), cytochrome c and DNA fragmentation factor. Apoptosis starts from Fas ligand binding to Fas receptors and ends in the fragmentation of genomic DNA, which is used as a hallmark of apoptosis. Thus the amount of DNA fragmentation can be assumed to be proportional to cell death. We have designed an HFPN by using the facts about the Fas-induced apoptosis pathways shown in Fig. 9 and biochemical knowledge about the reactions. Figure 10 shows the whole HFPN model that we have described with GON. All places and transitions are continuous, and the parameters are roughly tuned by hand. For Bid (m11), procaspase-9 (m21), procaspase-3 (m25), DFF (m30) and DNA (m37), the initial concentration of each compound is assumed to be 100. On the other hand, for FADD (m4), procaspase-8 (m5), Apaf-1 (m17) and dATP/ADP (m18), where two compounds react together even without the stimulation of apoptosis, the initial concentrations and the reaction rate are assumed to be 39.039 and m1*m2/5000, respectively, to maintain the steady state. Each compound is assumed to be produced at the rate of 0.5 (represented by a transition without any incoming arc) and to degrade at the rate of its concentration divided by 200 (represented by a transition without any outgoing arc), which keeps its concentration at 100 in the steady state. This degradation rate also applies to the other compounds in the network. The rates of the other processes are determined roughly by following Table 1. Synthesis and catabolism processes are added to the model for all proteins. Autocatalytic processes are also added to the model for all caspases, since they exist as proenzymes.
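The two initial concentrations quoted above can be checked with a short calculation. The sketch below assumes that each compound is produced at 0.5 and degrades at m/200, and that, for the pairs that react without stimulation, both partners sit at the same level m and are consumed at m*m/5000. The function names are hypothetical; the balance equations are an interpretation of the text, although they do reproduce both 100 and 39.039.

```python
import math

PRODUCTION = 0.5             # constant synthesis rate from the text
DEGRADATION_DIVISOR = 200.0  # degradation speed is m / 200
BINDING_DIVISOR = 5000.0     # binding speed is m1 * m2 / 5000


def steady_state_unbound():
    """Solve 0 = PRODUCTION - m / 200 for m."""
    return PRODUCTION * DEGRADATION_DIVISOR


def steady_state_bound():
    """Solve 0 = PRODUCTION - m / 200 - m * m / 5000 for m > 0.

    Assumes both reacting partners share the same concentration m.
    """
    a = 1.0 / BINDING_DIVISOR
    b = 1.0 / DEGRADATION_DIVISOR
    c = -PRODUCTION
    # Positive root of a*m^2 + b*m + c = 0.
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


m_free = steady_state_unbound()   # compounds such as Bid or procaspase-3
m_pair = steady_state_bound()     # compounds such as FADD and procaspase-8
```

Multiplying the second balance by 5000 gives m^2 + 25m - 2500 = 0, whose positive root is (-25 + sqrt(10625))/2, which is about 39.039.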
The direct pathway from caspase 8 to caspase 3 is assumed to be active when the caspase 8 concentration is over 30. A protease is often synthesized as a proenzyme (zymogen) and converted to its active form by other enzymes or by itself, so an autocatalytic process is added to every caspase reaction. Using the apoptosis scheme modeled as an HFPN, we simulated the amount of DNA fragmentation while varying the Fas ligand concentration; Figure 11 shows the simulated relationship. Under very weak stimulation (a very low amount of Fas ligand), DNA fragmentation does not occur, since the signal stops at an intermediate point because of the assumed degradation processes. As the stimulation increases, the reaction proceeds to the later intermediates and DNA fragmentation (cell death) finally occurs, increasing with the Fas ligand concentration. There are two pathways from activated caspase 8 to caspase 3: one through several steps including cytochrome c release from mitochondria, used when the concentration of activated caspase 8 is low, and the direct one to caspase 3, used when the concentration of activated caspase 8 is high [20]. We assume arbitrarily that the direct pathway starts when the concentration of activated caspase 8 is larger than 30. Reportedly, the removal of Bid by gene knockout increases the resistance of liver cells to apoptosis induced by Fas ligand, while it does not affect the apoptosis of thymus and embryonic cells. If the second pathway is included in the scheme, DNA fragmentation increases slightly, especially when the Fas ligand concentration is high (Fig. 11). However, the detailed mechanism of the selection between these two pathways from caspase 8 is still unclear and needs to be studied experimentally in the future.
Fig. 10. An HFPN representing the Fas-induced apoptosis obtained from Fig. 9. Autocatalytic processes (Fig. 12) are surrounded by bold dotted lines. An HFPN file including all transition parameters can be downloaded from the URL http://www.GenomicObject.Net/.
Table 2. DNA fragmentation at four autocatalytic rates of caspases

Stop time    DNA fragmentation
             rate0    rate1    rate2    rate3
10           0        0        0        1169
15           251      442      746      1862
20           417      581      885      2048

The autocatalytic rates are: rate0 = 0, rate1 = mA*mB/80000, rate2 = mA*mB/40000, and rate3 = mA*mB/25000. They are assigned to the transition TA in Fig. 12. The stop time is the time after which Fas ligand stimulation is stopped. The initial Fas ligand concentration is set to n = 600. Variables mA and mB represent the contents of the continuous places going into TA.
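For readers who want to work with these numbers, the four autocatalytic rate functions and the Table 2 values can be encoded as follows. This is a sketch only: the dictionary layout and function names are assumptions, and the contents mA = mB = 100 in the example call are arbitrary illustration values, not simulation outputs.

```python
# Divisors of the bimolecular autocatalytic speeds mA * mB / divisor.
AUTOCATALYTIC_DIVISORS = {"rate1": 80000.0, "rate2": 40000.0, "rate3": 25000.0}
RATE_NAMES = ("rate0", "rate1", "rate2", "rate3")


def autocatalytic_rate(name, m_a, m_b):
    """Speed assigned to transition TA for the named setting."""
    if name == "rate0":
        return 0.0  # autocatalysis switched off
    return m_a * m_b / AUTOCATALYTIC_DIVISORS[name]


# DNA fragmentation from Table 2, keyed by stop time and rate setting.
TABLE_2 = {
    10: {"rate0": 0, "rate1": 0, "rate2": 0, "rate3": 1169},
    15: {"rate0": 251, "rate1": 442, "rate2": 746, "rate3": 1862},
    20: {"rate0": 417, "rate1": 581, "rate2": 885, "rate3": 2048},
}


def nondecreasing_in_rate(row):
    """True if fragmentation never drops as the autocatalytic rate rises."""
    values = [row[name] for name in RATE_NAMES]
    return all(x <= y for x, y in zip(values, values[1:]))


speeds = {name: autocatalytic_rate(name, 100.0, 100.0) for name in RATE_NAMES}
monotone = all(nondecreasing_in_rate(row) for row in TABLE_2.values())
```

The check in `monotone` restates the observation drawn from the table: for every stop time, a higher autocatalytic rate never gives less DNA fragmentation.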
Fig. 11. Simulated relationship between the DNA fragmentation amount and the Fas ligand concentration. At higher concentrations of Fas ligand, the direct pathway from caspase 8 to caspase 3 contributes to the fragmentation. To examine the effect of the autocatalytic process of caspases, DNA fragmentation is simulated both in the presence and in the absence of this process.
Since an autocatalytic process has been proposed for caspase reactions [21], it is included in our model (Fig. 12), and it increases the DNA fragmentation as shown in Fig. 11. However, if a high rate of the autocatalytic process is assumed in the caspase reaction, the DNA fragmentation becomes independent of the Fas ligand concentration, which does not coincide with the experimental results. Therefore, we infer that autocatalytic processes must be slow if they are present. To examine the effect of autocatalytic processes of caspases on the apoptosis induced by Fas ligand, DNA fragmentation is simulated for the case in which the stimulation by Fas ligand is stopped after a short period. Table 2 shows that apoptosis proceeds further as the autocatalytic rate of caspases increases, even for a short period of stimulation. Figure 13 shows simulated time courses of the HFPN in Fig. 10 with GON. Some intermediates during apoptosis are measured at three levels of Fas ligand concentration. These time courses might be useful for planning new experiments, such as the addition of inhibitors to some step. However, it is necessary to estimate realistic rates for each reaction by comparison with experimental data. It is also necessary to
Fig. 12. An HFPN representation of autocatalytic process in Fig. 10.
Fig. 13. Simulated time courses of some intermediates during apoptosis for the Fas ligand concentration n = 210, 450, 600.
add other pathways through the Bcl-2 family or p53 to describe the real apoptosis occurring in various cells and by various inducers.

CONCLUSIONS

The effectiveness of HFPN-based biopathway modeling is demonstrated by two examples, the circadian rhythm in Drosophila and the apoptosis induced by Fas ligand. Simulations of these models are performed with the software tool Genomic Object Net, and some biological observations are obtained. We have developed Genomic Object Net Visualizer (GON Visualizer) based on XML technology [22]. GON Visualizer allows us to visualize simulation results of GON, which are exported as CSV files. With this tool, users in biology and medicine can visualize simulation results of the biological phenomenon of interest by creating XML documents in which CSV files produced by GON are included as basic
data for simulations. Visualizations of the biopathways introduced in this paper will be reported in the future. Most existing biopathway simulation tools only compute time courses of the concentrations of biological objects such as proteins and mRNAs. However, in general, the distributions of these biological objects are not uniform because of compartmentalization. Thus, for more precise simulation, more complex information such as the localization of biological objects and molecular-level cell-cell interactions should be included in simulation models. In order to address this problem, we introduced the concept of hybrid functional Petri net with extension (HFPNe) and developed “Genomic Object Net ver.1.0” based on the notion of HFPNe (http://www.GenomicObject.Net/). One of the features of HFPNe is that its places can hold several types of data such as integer, real, Boolean, string, and vector. With this feature, the HFPNe allows us to include more complex biological information in computational biopathway models. We will demonstrate modeling methods for describing computational biopathway models with Genomic Object Net ver.1.0 in the near future.

ACKNOWLEDGEMENTS

This work was partially supported by the Grant-in-Aid for Scientific Research on Priority Areas “Genome Information Science” from the Ministry of Education, Culture, Sports, Science and Technology in Japan.

REFERENCES

[1] Mendes, P. (1993). GEPASI: a software for modeling the dynamics, steady states and control of biochemical and other systems. Comput. Appl. Biosci. 9, 563-571.
[2] Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C. and Hutchison, C. A. 3rd. (1999). E-CELL: Software environment for whole-cell simulation. Bioinformatics 15, 72-84.
[3] Stelling, J., Kremling, A. and Gilles, E. D. (2000). Towards a virtual biological laboratory. Foundations of Systems Biology, Kitano, H. (ed.). MIT Press, pp. 189-212.
[4] Stokes, C. L. (2000). Biological systems modeling: powerful discipline for biomedical e-R&D. AIChE J. 46, 430-433.
[5] Alla, H. and David, R. (1998). Continuous and hybrid Petri nets. J. Circ. Syst. Comp. 8, 159-188.
[6] Valk, R. (1978). Self-modifying nets, a natural extension of Petri nets. Lecture Notes in Computer Science 62 (ICALP ’78), 464-476.
[7] Drath, R. (1998). Hybrid object nets: An object oriented concept for modeling complex hybrid systems. Proc. Hybrid Dynamical Systems, 3rd International Conference on Automation of Mixed Processes, ADPM’98, 437-442.
[8] Drath, R., Engmann, U. and Schwuchow, S. (1999). Hybrid aspects of modeling manufacturing systems using modified Petri nets. In: 5th Workshop on Intelligent Manufacturing Systems. Granado, Brasil.
[9] Dunlap, J. C. (1999). Molecular bases for circadian clocks. Cell 96, 271-290.
[10] Glossop, N. R., Lyons, L. C. and Hardin, P. E. (1999). Interlocked feedback loops within the Drosophila circadian oscillator. Science 286, 766-768.
[11] Hardin, P. E. (2000). From biological clock to biological rhythms. Genome Biol. 1, REVIEWS1023.
[12] Kloss, B., Price, J. L., Saez, L., Blau, J., Rothenfluh, A., Wesley, C. S. and Young, M. W. (1998). The Drosophila clock gene double-time encodes a protein closely related to human casein kinase Iepsilon. Cell 94, 97-107.
[13] Price, J. L., Blau, J., Rothenfluh, A., Abodeely, M., Kloss, B. and Young, M. W. (1998). double-time is a novel Drosophila clock gene that regulates PERIOD protein accumulation. Cell 94, 83-95.
[14] Leloup, J. C. and Goldbeter, A. (1998). A model for circadian rhythms in Drosophila incorporating the formation of a complex between the PER and TIM proteins. J. Biol. Rhythms 13, 70-87.
[15] Ueda, H. R., Hagiwara, M. and Kitano, H. (2001). Robust oscillations within the interlocked feedback model of Drosophila circadian rhythm. J. Theor. Biol. 210, 401-406.
[16] Young, M. W. (2000). Circadian rhythms. Marking time for a kingdom. Science 288, 451-453.
[17] Jacobson, M. D., Weil, M. and Raff, M. C. (1997). Programmed cell death in animal development. Cell 88, 347-354.
[18] Thompson, C. B. (1995). Apoptosis in the pathogenesis and treatment of disease. Science 267, 1456-1462.
[19] Nijhawan, D., Honarpour, N. and Wang, X. (2000). Apoptosis in neural development and disease. Annu. Rev. Neurosci. 23, 73-87.
[20] Kuwana, T., Smith, J. J., Muzio, M., Dixit, V., Newmeyer, D. D. and Kornbluth, S. (1998). Apoptosis induction by caspase-8 is amplified through the mitochondrial release of cytochrome c. J. Biol. Chem. 273, 16589-16594.
[21] Hugunin, M., Quintal, L. J., Mankovich, J. A. and Ghayur, T. (1996). Protease activity of in vitro transcribed and translated Caenorhabditis elegans cell death gene (ced-3) product. J. Biol. Chem. 271, 3517-3522.
[22] Matsuno, H., Doi, A., Hirata, Y. and Miyano, S. (2001). XML documentation of biopathways and their simulations in Genomic Object Net. Genome Inform. 12, 54-62.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2004, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-92
Constructing Biological Pathway Models with Hybrid Functional Petri Nets

Atsushi Doi (a), Sachie Fujita (a), Hiroshi Matsuno (a,∗), Masao Nagasaki (b) and Satoru Miyano (b)
(a) Faculty of Science, Yamaguchi University, Japan. E-mail: [email protected]
(b) Human Genome Center, Institute of Medical Science, University of Tokyo, Japan
ABSTRACT: In many research projects on modeling and analyzing biological pathways, the Petri net has been recognized as a promising method for representing biological pathways. Since the pioneering works by Reddy et al., 1993, and Hofestädt, 1994, which model metabolic pathways by traditional Petri nets, several enhanced Petri nets such as the colored Petri net, stochastic Petri net, and hybrid Petri net have been used for modeling biological phenomena. Recently, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than these existing Petri nets. Although that paper demonstrates the effectiveness of HFPN with two examples, the gene regulation mechanism for circadian rhythms and an apoptosis signaling pathway, it gives no detailed explanation of how the HFPNs for these examples were constructed. The purpose of this paper is to describe, step by step, a method to construct biological pathway models with the HFPN. The method is demonstrated on the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism.

KEYWORDS: Hybrid functional Petri net, lac operon, biological pathway, simulation, Genomic Object Net
∗ Corresponding author.

INTRODUCTION

A Petri net [Reisig, 1985] is a method to describe and model concurrent systems. It has been mainly used so far to model artificial systems such as manufacturing systems [Proth, 1997] and communication protocols [Wheeler, 1999]. The first attempt to use Petri nets for modeling biological pathways was made by Reddy et al., 1993, who gave a method to represent metabolic pathways. Hofestädt expanded this method to model metabolic networks [Hofestädt, 1994]. Subsequently, several enhanced Petri nets have been used to model biological phenomena. Genrich et al., 2001, modeled metabolic pathways with a colored Petri net by assigning enzymatic reaction speeds to the transitions, and simulated a chain of these reactions quantitatively. Voss et al., 2003, used the colored Petri net in a different way, accomplishing a qualitative analysis of the steady state in metabolic pathways. The stochastic Petri net has been applied to model a variety of biological pathways: the ColE1 plasmid replication [Goss and Peccoud, 1998], the response of the σ32 transcription factor to a heat shock [Srivastava et al., 2001], and the interaction kinetics of a viral invasion [Srivastava et al., 2002]. On the other hand, we have shown that the gene regulatory network of λ phage can be more naturally modeled as a hybrid
A. Doi et al. / Constructing Biological Pathway Models with Hybrid Functional Petri Nets
system of “discrete” and “continuous” dynamics [Matsuno et al., 2000] by employing a hybrid Petri net (HPN) architecture [Alla and David, 1998; Drath, 1998]. It has also been observed [Ghosh and Tomlin, 2001] that biological pathways can be handled as hybrid systems. For example, protein concentration dynamics, which behave continuously, are coupled with discrete switches. Another example is protein production that is switched on or off depending on the expression of other genes, i.e. the presence or absence of other proteins in sufficient concentrations. Recently, by extending the notion of HPN, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than the existing Petri nets. Although that paper demonstrates the effectiveness of an HFPN with two examples, the gene regulation mechanism for circadian rhythms and an apoptosis signaling pathway, it only gives the constructed HFPN models for these examples. The purpose of this paper is to demonstrate how biological pathways can be created with an HFPN by describing, step by step, the process of constructing a model of the lac operon gene regulatory mechanism and glycolytic pathway. The constructed HFPN model is verified by simulations of five mutants of the lac operon on the Genomic Object Net (GON) software package (http://www.GenomicObject.Net/) [Matsuno et al., 2003b]. This software was developed based on the notion of the HFPN, together with a GUI specially designed for biological pathway modeling.

MODELING BIOLOGICAL PATHWAY WITH HYBRID FUNCTIONAL PETRI NET

In this section, we first give a brief definition of the hybrid functional Petri net and then summarize the method for modeling biological pathways with it.

Hybrid functional Petri net: An extended hybrid Petri net for modeling biological reactions

A Petri net is a network consisting of places, transitions, arcs, and tokens.
A place can hold tokens as its content. A transition has arcs coming from places and arcs going out from the transition to some places. A transition with these arcs defines a firing rule in terms of the contents of the places to which the arcs are attached. A hybrid Petri net (HPN) [Alla and David, 1998] has two kinds of places, the discrete place and the continuous place, and two kinds of transitions, the discrete transition and the continuous transition. A discrete place and a discrete transition are the same notions as used in the traditional discrete Petri net [Reddy et al., 1993]. A continuous place can hold a nonnegative real number as its content. A continuous transition fires continuously at the speed given by a parameter assigned to the continuous transition. The graphical notations of a discrete transition, a discrete place, a continuous transition, and a continuous place are shown in Fig. 1, together with three types of arcs. A specific value is assigned to each arc as a weight. When a normal arc with weight w is attached to a discrete/continuous transition, w tokens are transferred through the arc, whether it comes from a place or goes out to a place. An inhibitory arc with weight w enables the transition to fire only if the content of the place at the source of the arc is less than or equal to w. For example, an inhibitory arc can be used to represent repressive activity in gene regulation. A test arc does not consume any content of the place at the source of the arc by firing. For example, a test arc can be used to represent enzyme activity, since the enzyme itself is not consumed. A hybrid dynamic net (HDN) [Drath, 1998] has a similar structure to the HPN, using the same kinds of places and transitions as the HPN. The main difference between HPN and HDN is the firing rule of the continuous transition. As the description of HPN above shows, for a continuous transition of HPN, different amounts of tokens can flow through the two types of arcs, coming
A. Doi et al. / Constructing Biological Pathway Models with Hybrid Functional Petri Nets
Fig. 1. Basic elements of HPN, HDN, and HFPN.
from or going out of the continuous transition. In contrast, the definition of the HDN does not allow different amounts to be transferred through these two types of arcs. However, the HDN has a firing feature for continuous transitions that the HPN lacks: “the speed of continuous transition of HDN can be given as a function of values in the places”. Thus the HPN and the HDN each have their own feature for the firing mechanism of continuous transitions, and both features are essential for modeling common biological reactions. (See the example of Fig. 6, in which four monomers compose one tetramer.) This motivated us to propose the hybrid functional Petri net (HFPN) [Matsuno et al., 2003b], which includes both of these features. Moreover, the HFPN has a third feature for arcs: a function of the values of the places can be assigned to any arc. This feature originated from the idea in [Hofestädt and Thelen, 1998], which was introduced in order to calculate dynamic biological catalytic processes in Petri net based biological pathway modeling. These three features are realized in the HFPN by introducing a new type of transition called a functional continuous transition, which allows us to assign any functions to arcs and transitions for controlling the speed or condition of consumption, production, and firing. An example of the use of the functional continuous transition is given at the beginning of the section “HFPN modeling of the lac operon gene regulatory mechanism and glycolytic pathway”.
Usage of discrete and continuous elements
Biological pathways essentially consist of discrete parts, such as genetic switch control, and continuous parts, such as metabolic reactions. These discrete and continuous parts can be represented by the discrete elements (discrete place and discrete transition) and continuous elements (continuous place and continuous transition) of the HFPN.
For example, a control system that turns gene expression on or off at an operator site can be represented by discrete elements. That is, if the discrete place has a token, the protein necessary for activating the operator site has bound to the operator, which means that gene expression is turned on. In addition, using the delay concept of the discrete transition, we can easily describe transcription that happens after a certain time. On the other hand, biological phenomena such as transcription, translation, and enzymatic and metabolic reactions have been treated as events whose conditions change continuously. By modeling transcription and translation with continuous elements in the following way, expression levels of mRNA and proteins can be simulated. Continuous places are used for storing concentrations of mRNA and protein. Reaction speeds of transcription and translation are assigned as parameters of continuous
Fig. 2. The lac operon: The enzyme β-galactosidase, produced by the lacZ gene, hydrolyzes lactose to glucose and galactose. lacY encodes the permease that brings lactose into the cell, and lacA encodes an acetylase that is believed to detoxify thiogalactosides, which, along with lactose, are transported into cells by lacY. The “operator” lies within the “promoter”, and the “CAP site” lies just upstream of the promoter.
Fig. 3. The HFPN representation of the control mechanism of the lac operon transcription switch. Only discrete elements are used for representing the switching mechanism.
transitions. In addition, common formulas for biological reactions such as the Michaelis-Menten equation can be modeled almost directly by assigning the concentrations of substrate and product to continuous places and the formula for their reaction to the continuous transition between these two places.
HFPN MODELING OF THE LAC OPERON GENE REGULATORY MECHANISM AND GLYCOLYTIC PATHWAY
This section demonstrates how we can model the lac operon gene regulatory mechanism and glycolytic pathway in E. coli with an HFPN. The biological facts used for constructing this model are described in the biological literature [Alberts et al., 1994; Lewin, 1997; Watson et al., 1987]. The HFPN modeling starts with a “transcription control switch” (Fig. 3), which is then enhanced gradually by adding “positive regulation” (Fig. 4), “negative regulation” (Fig. 5), and “hydrolyzing lactose to glucose and galactose” (Fig. 7). This step-by-step explanation, based on well-known biological facts, shows how HFPNs are created from biological knowledge. All the parameters in the transitions of the HFPN model are summarized in Tables 1 and 2.
In this example, we give a rough guideline for the usage of continuous and discrete transitions. With the guideline below, readers can follow the modeling of the biological pathway with the HFPN described in the subsections from “Transcription control switch” to “Hydrolyzing lactose to glucose and galactose”. For a detailed explanation of the firing rules, refer to [Matsuno et al., 2003b].
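The three arc types introduced earlier (normal, test, and inhibitory) can be illustrated with a minimal Python sketch of a single discrete firing. The names `Arc`, `arc_allows_firing`, and `fire_discrete` are hypothetical and are not part of GON. The sketch assumes that a normal or test arc with weight w needs at least w in its source place; the text does not fix this boundary case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    place: str     # name of the place the arc is attached to
    weight: float  # weight w assigned to the arc
    kind: str      # "normal", "test", or "inhibitory"


def arc_allows_firing(arc, marking):
    """Return True if this input arc permits its transition to fire."""
    content = marking[arc.place]
    if arc.kind == "inhibitory":
        # Inhibitory arc: fire only if the source content is <= w.
        return content <= arc.weight
    # Assumption: normal and test arcs need at least w in the source place.
    return content >= arc.weight


def fire_discrete(input_arcs, output_arcs, marking):
    """Fire once if enabled and return a new marking (input is not mutated)."""
    if not all(arc_allows_firing(a, marking) for a in input_arcs):
        return dict(marking)
    updated = dict(marking)
    for a in input_arcs:
        if a.kind == "normal":  # test and inhibitory arcs consume nothing
            updated[a.place] -= a.weight
    for a in output_arcs:
        updated[a.place] += a.weight
    return updated


# Hypothetical example: an enzyme (test arc) converts substrate to product
# unless a repressor is present (inhibitory arc with weight 0).
inputs = [Arc("substrate", 1, "normal"), Arc("enzyme", 1, "test"),
          Arc("repressor", 0, "inhibitory")]
outputs = [Arc("product", 1, "normal")]
start = {"substrate": 2, "enzyme": 1, "repressor": 0, "product": 0}
after = fire_discrete(inputs, outputs, start)
```

In this example the enzyme count is unchanged after firing, which is the property that makes test arcs a natural fit for enzyme activity.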
Fig. 4. Positive regulation mechanism: This figure contains the HFPN model of Fig. 3.
Continuous transition
The firing speed of a continuous transition is described by a simple arithmetic formula such as mX/a, mX × mY, or (mX × mY)/(mX + a × mY), where mX and mY are variables representing the contents of the input places of the transition, and a is a constant parameter to be tuned manually. At this speed, the contents of the input places are consumed and, simultaneously, each output place receives the corresponding amount through its arc. For example, a description of the protein LacZ states that its degradation speed is much slower than its production rate. Based on this biological knowledge, we used the formula m20/10000 as the degradation speed (transition T8) and m19 as the production speed of LacZ. Basically, all transition speeds are given in the same way. That is, in our modeling method, the speed of a biological reaction is defined relatively, without a unit.
To each continuous place representing the concentration of some substance, a continuous transition is attached to model the degradation of the substance. The degradation rate is set to mX/10 for high-speed degradation (e.g. T1) and mX/10000 for low-speed degradation (e.g. T8). Following the definition, no weight is assigned to the arc going out from a continuous transition. The default weight of the arc going into a continuous transition is set to zero. If necessary, an appropriate weight is assigned to the arc according to the underlying biological knowledge. (See the explanation in the subsection “Hydrolyzing lactose to glucose and galactose” for the transitions T67, T70, and T73.)
Discrete transition
The default delay time of a discrete transition is set to one, and the default weight of an arc from/to a discrete transition is one. If possible, the delay time and the weight are appropriately chosen according
Table 1
Transitions in Figs 7 and 9
Name  Type  Delay/Speed               From                            To                Comment
                                      Variable     Weight    Type     Variable  Weight
T1    C     m4/10                     m4           0         N        –         –       degradation rate of CAP
T2    C     m14/10                    m14          0         N        –         –       degradation rate of mRNA repressor
T3    C     m15/10                    m15          0         N        –         –       degradation rate of repressor
T4    C     m17/10                    m17          0         N        –         –       degradation rate of repressor binding to DNA
T5    C     m18/10                    m18          0         N        –         –       degradation rate of repressor not binding to DNA
T6    C     m7/10                     m7           0         N        –         –       degradation rate of repressor binding to operator region
T7    C     m19/10                    m19          0         N        –         –       degradation rate of lacZ mRNA
T8    C     m20/10000                 m20          0         N        –         –       degradation rate of LacZ
T9    C     m23/10                    m23          0         N        –         –       degradation rate of lacY mRNA
T10   C     m24/10                    m24          0         N        –         –       degradation rate of LacY
T11   C     m27/10                    m27          0         N        –         –       degradation rate of lacA mRNA
T12   C     m28/10000                 m28          0         N        –         –       degradation rate of LacA
T13   C     m29/10000                 m29          0         N        –         –       degradation rate of lactose outside of cell
T14   C     m9/10000                  m9           0         N        –         –       degradation rate of lactose
T15   C     m8/2                      m8           0         N        –         –       degradation rate of allolactose
T16   C     m30/10000                 m30          0         N        –         –       degradation rate of galactose
T17   C     m17/10000                 m17          0         N        –         –       degradation rate of glucose
T18   C     m10/10000                 m10          0         N        –         –       degradation rate of complex
T19   C     m5/10000                  m5           0         N        –         –       degradation rate of cAMP
T20   C     m11/10000                 m11          0         N        –         –       degradation rate of AMP
T21   C     m12/10                    m12          0         N        –         –       degradation rate of ADP
T42   D     1                         m2           1         N        –         –       CAP releasing rate
T43   D     1                         m3           1         N        –         –       repressor releasing rate
T45   C     1                         –            –         N        m4        –       CAP production rate
T46   D     1                         –            –         –        m13       1       activation of repressor gene
T57   D     1.082                     m13          1         N        m14       1       transcription rate of repressor
T58   C     m14                       m14          0         N        m15       –       translation rate of repressor mRNA
T59   C     –                         m15          –         C        m16       –       conformation rate of repressor
T60   C     96×m16/100                m16          0         N        m7        –       repressor binding rate to operator
T61   C     399×m16/10000             m16          0         N        m17       –       repressor binding rate to the DNA other than repressor site
T62   C     m16/1000                  m16          0         N        m18       –       rate of repressor which does not bind any DNA
T63   D     1                         m4, m5       1, 100    T, T     m2        1       binding rate of CAP to the CAP site
T64   D     1                         m7, m8       1, 4      T, I     m3        1       binding rate of repressor to the operon
T65   D     1                         m2, m3       1, 1      T, I     m1        1       logical operation of the places “CAP site” and “operator”
T66   D     3.075                     m1           1         T        m19, m21  1, 1    transcription rate of lacZ
T67   C     m19                       m19          1         T        m20       –       translation rate of lacZ
T68   D     0.051                     m21          1         N        m22       1       moving rate of RNA polymerase
Table 1, continued
Name  Type  Delay/Speed               From                            To                Comment
                                      Variable     Weight    Type     Variable  Weight
T69   D     1.254                     m22          1         N        m23, m25  1, 1    transcription rate of lacY
T70   C     m23/2                     m23          1         T        m24       –       translation rate of lacY
T71   D     0.065                     m25          1         N        m26       1       moving rate of RNA polymerase
T72   D     0.682                     m26          1         N        m27       1       transcription rate of lacA
T73   C     m27/5                     m27          1         T        m28       –       translation rate of lacA
T74   C     (m24×m29)/(m29+m24×10)    m29, m24     0, 2.5    N, T     m9        –       transforming rate of lactose into a cell
T75   C     (m20×m9)/(m9+m20×10)      m9, m20      0, 5      N, T     m30, m6   –, –    decomposing rate of lactose to galactose and glucose
T76   C     m9/5                      m9           1         T        m8        –       producing rate of allolactose from lactose inside of a cell
T77   D     0.5                       m3, m8, m16  1, 1, 1   N, N, N  m10       1       conformation rate of repressor and allolactose
T79   C     m5/10                     m5           0         N        m11       –       reaction rate: cAMP to AMP
T80   C     m11/10                    m11, m6      0, 5      N, I     m5        –       reaction rate: AMP to cAMP
T81   C     m11/10                    m11          0         N        m12       –       reaction rate: AMP to ADP
T82   C     m12/10                    m12          0         N        m11       –       reaction rate: ADP to AMP
T94   C     m29/10                    m29          0         T        m8        –       producing rate of allolactose from lactose outside of a cell
All transitions in this figure are listed in the “Name” column. The symbol D or C in the “Type” column represents the type of transition, discrete or continuous, respectively. The “Delay/Speed” column gives the firing speed of a continuous transition or the delay time of a discrete transition, according to the type of transition. The column “From”, which represents the incoming arc(s) to a transition, is divided into three sub-columns: “Variable” (variable names of the places attached to the incoming arcs), “Weight” (weights of the incoming arcs), and “Type” (N, T, and I represent normal, test, and inhibitory arcs, respectively). The column “To”, which represents the outgoing arcs from a transition, is divided into two sub-columns: “Variable” (variable names of the places attached to the outgoing arcs) and “Weight” (weights of the outgoing arcs).
to the underlying biological knowledge. (See the explanation in the subsection “Hydrolyzing lactose to glucose and galactose” for the transitions T66–T73.)
Functional continuous transition
A functional continuous transition is used for T59, which describes a reaction converting four monomers to one tetramer (Fig. 6(a)). Recall that the HPN cannot have functions as parameters of its transitions. We cannot model this type of reaction with the HPN, since the reaction speed of this type is always assigned to a transition as a function of the values of the places, as shown above. In contrast, it is possible to model this reaction with the HDN, as shown in Fig. 6(b). However, the constructed HDN model is neither natural nor intuitive. Recall that a functional continuous transition of the HFPN allows us to assign any functions to arcs and transitions for controlling the speed or condition of consumption, production, or firing. From Fig. 6(c), we can see that this reaction can be modeled naturally and intuitively with a functional continuous transition. Two kinds of parameters, “threshold” and “speed”, are assigned to an arc of the HFPN. The continuous transition at the head of the arc cannot fire unless the content of the place at the tail of the arc exceeds the “threshold” of the arc. The “speed”, on the other hand, is a value or a function defining the amount flowing through the arc.
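The threshold and speed parameters of an arc can be illustrated with a minimal Python sketch of the monomer-to-tetramer reaction. The function name `tetramer_step`, the explicit Euler time step, and the default threshold of zero are assumptions made for illustration; how GON integrates continuous transitions is not described here. The output speed m/5 follows the function that the text later assigns to T59.

```python
def tetramer_step(monomer, tetramer, dt, threshold=0.0):
    """Advance the monomer -> tetramer reaction by one Euler step of size dt."""
    # The transition cannot fire unless the place content exceeds the threshold.
    if monomer <= threshold:
        return monomer, tetramer
    output_speed = monomer / 5.0      # speed on the arc to the tetramer place
    input_speed = 4.0 * output_speed  # four monomers are consumed per tetramer
    consumed = min(input_speed * dt, monomer)  # never drive the place negative
    return monomer - consumed, tetramer + consumed / 4.0


# Hypothetical starting amounts, chosen only for illustration.
monomer, tetramer = 8.0, 0.0
for _ in range(10):
    monomer, tetramer = tetramer_step(monomer, tetramer, dt=0.1)
```

Because the input arc runs four times faster than the output arc, the quantity monomer + 4 × tetramer stays constant throughout the simulation.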
Table 2
Places having non-zero initial values in Fig. 7
Name                       Variable  Initial value  Comment
CAP                        m4        5              concentration of CAP
cAMP                       m5        100            concentration of cAMP
glucose                    m6        50             concentration of glucose
AMP                        m11       200            concentration of AMP
ADP                        m12       200            concentration of ADP
LacZ                       m20       5              concentration of LacZ
LacY                       m24       2.5            concentration of LacY
LacA                       m28       1              concentration of LacA
lactose outside of a cell  m29       50             lactose outside of a cell
Fig. 5. Negative regulation mechanism: This figure contains the HFPN model of Fig. 4.
Initial values of places
The places which have initial values greater than zero are listed in Table 2.
Transcription control switch
We sketch how places, transitions, and arcs are determined in the process of modeling. Figure 2 shows the structure of the lac operon. In the absence of lactose, a repressor is bound to the operator. The repressor prevents RNA polymerase from starting RNA synthesis at the promoter. On the other hand, in the presence of lactose and the absence
Table 3
Mapping between the numbers in Fig. 8 and the metabolites in reactions
Index  Enzyme/Reaction
1      hexokinase
2      phosphoglucose isomerase
3      phosphofructokinase
4      aldolase
5      triosephosphate isomerase
6      glyceraldehyde-3-phosphate dehydrogenase
7      phosphoglycerate kinase
8      phosphoglycerate mutase
9      enolase
10     pyruvate kinase
11     lactate dehydrogenase
Fig. 6. HDN and HFPN representations of the reaction composing monomers to a tetramer. The speed of v1 is four times as fast as the speed of v2. v01 is the firing speed of the transition T01. [I1] ([I2]) represents the content of the place I1 (I2). The HPN cannot model this type of reaction, since the reaction speed must be realized by assigning a function of the values of the places to the continuous transition.
of glucose, the catabolite activator protein (CAP) is bound to the CAP site. Since CAP helps RNA polymerase to bind to the promoter, the transcription of the lac operon can begin. This regulation mechanism can be expressed by the HFPN of Fig. 3, consisting of only discrete elements. The place “promoter” (m1) represents the status of the transcription of the lac operon. If this place contains token(s), the lac operon is being transcribed, whereas if this place contains no tokens, transcription of the operon does not begin. The rates of releasing CAP and repressor from the DNA are assigned to the transitions T42 and T43 as the delay times, respectively. The production rates of CAP and repressor are assigned to the transitions T63 and T64, respectively, as the delay times. Each time the transition T65 fires, the place “promoter” receives one token. This transition fires when both of the following conditions hold:
(1) The place “CAP site” (m2) contains tokens (this is the case in which the protein CAP is bound to the CAP site).
(2) The place “operator” (m3) has no token (this is the case in which the repressor is not bound to the operator).
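The two firing conditions of T65 amount to a logical AND of “CAP bound” and “repressor not bound”. A minimal Python sketch follows; the function names and the dictionary-based marking are hypothetical, and the sketch assumes that firing only tests the two input places without removing their tokens.

```python
def t65_can_fire(marking):
    """T65 is enabled when CAP is bound and the repressor is not bound."""
    return marking["CAP site"] > 0 and marking["operator"] == 0


def fire_t65(marking):
    """Return the marking after one firing attempt of T65."""
    updated = dict(marking)
    if t65_can_fire(marking):
        # The place "promoter" receives one token. Assumption of this sketch:
        # the tokens in "CAP site" and "operator" are tested, not consumed.
        updated["promoter"] += 1
    return updated


# Transcription starts only for the combination (CAP bound, operator free).
on = fire_t65({"CAP site": 1, "operator": 0, "promoter": 0})
off = fire_t65({"CAP site": 1, "operator": 1, "promoter": 0})
```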
Positive regulation
The DNA binding of CAP is regulated by glucose to ensure that the transcription of the lac operon begins only in the absence of glucose. In fact, glucose starvation induces an increase in the intracellular levels of cyclic AMP (cAMP). Transcription of the lac operon is activated with the help of CAP, to which cAMP binds. When glucose is plentiful, cAMP levels drop; cAMP therefore dissociates from CAP, which reverts to an inactive form that can no longer bind DNA. This regulatory mechanism by CAP and cAMP is called positive regulation of the lac operon.
Figure 4 shows an HFPN model representing positive regulation of the lac operon. Continuous places are used for representing the concentrations of the substances CAP, cAMP, AMP, ADP, and glucose. Tokens in the places “CAP” (m4) and “cAMP” (m5) should not be consumed by the firing of the transition T63, since neither CAP nor cAMP is lost when forming a complex. Accordingly, two test arcs are used from the places “CAP” and “cAMP” to the transition T63. The weight of the arc from the place “cAMP” to the transition T63 is set to 100, while the weight of the arc from the place “CAP” is 1; these values were determined by manual tuning with reference to the simulation results. Once the concentrations of both CAP and cAMP exceed the thresholds given to these two arcs as weights, the transition T63 can fire, transferring a token from the transition T63 to the place “CAP site”.
In general, reactions among cAMP, AMP, and ADP are reversible. The transition T80 (the transition T82) between the places “cAMP” (m5) and “AMP” (m11) (the places “AMP” and “ADP” (m12)) represents the reversible reaction together with the transition T79 (the transition T81). To the places “cAMP”, “AMP”, and “ADP”, 1, 200, and 200 are assigned as initial values, respectively. Recall that when glucose is plentiful, cAMP levels drop.
This phenomenon is represented by the inhibitory arc from the place “glucose” (m6) to the transition T80. When the concentration in the place “glucose” exceeds the threshold given at this inhibitory arc, the transition T80 stops firing. In this model, since we suppose that CAP is produced continuously, its production mechanism is modeled with the place “CAP” and the transitions T45 and T1. The initial amount of the place “CAP” is set to 5, since CAP is produced by a production mechanism independent of the mechanism being described here. Finally, the transitions T1, T17, T19, T20, and T21 represent the natural degradation of the corresponding substances. Since these transitions represent only degradation and no production, no arcs go out from these transitions.
Negative regulation
In the presence of lactose, a small sugar molecule called allolactose is formed in a cell. Allolactose binds to the repressor protein, and when it reaches a sufficiently high concentration, transcription is turned on by decreasing the affinity of the repressor protein for the operator site. The repressor protein is the product of the lacI gene, which is located upstream of the lac operon. The repressor protein can bind to the operator site after forming a tetramer. By adding this negative regulation mechanism to Fig. 4, Fig. 5 is obtained. Since it is known from the literature [Lewin, 1997] that the repressor should be produced sufficiently prior to the production of other substances, parameters relating to this negative regulation are set to faster values than other parameters.
In our model, a discrete place is used for representing the promoter site of a gene. When the discrete place “repressor promoter” (m13) receives a token, the transcription of the lacI gene begins. We can determine the transcription frequency by the delay time of the transition T46. Transcription and
translation mechanisms are modeled by the places “mRNA repressor” (m14) and “repressor” (m15) and the transitions T57, T2, T58, and T3. The reaction forming a tetramer from monomers (Fig. 6(a)) can be represented by the HDN as shown in Fig. 6(b). (The HPN cannot model this type of reaction, since the reaction speed must be realized by assigning a function of the values of the places to the continuous transition.) Comparing this representation with that of Fig. 6(c), we see that an HFPN allows us to represent such a reaction naturally and intuitively. Tetramer formation is represented in the HFPN by the places “repressor” and “4 repressor” (m16) and the three transitions T3, T59, and T60. The function 4 × m15/5 (m15/5) is assigned to the input (output) arc to (from) the transition T59 as a flow speed. Note that the speed of the input arc is four times faster than the speed of the output arc.
Of the repressors forming tetramers, we determined that about 96% bind to the operator site, about 3.99% bind to other DNA sites, and about 0.01% do not bind to DNA. These percentages are determined from the description in the literature [Lewin, 1997]. The places “operator bind” (m7), “DNA bind” (m17), and “not bind” (m18) represent the amounts of these repressors. According to these binding rates, the firing speeds of the transitions T60, T61, and T62 were given by 96 × m16/100, 399 × m16/10000, and m16/10000, respectively.
We separate the concentration of lactose into two places, “lactose outside of a cell” (m29) and “lactose” (m9), for the convenience of describing the function of the lacY gene in the next subsection. The concentration of allolactose is represented by the place “allolactose” (m8), whose accumulation rates are given at the transitions T94 and T76. It is known [Hofestädt, 1994] that allolactose is produced from the lactose outside of a cell as well as from the lactose inside a cell. Since it is natural to consider that the production speed of allolactose is faster than the passing rate of allolactose through the cell membrane, the speed of transition T76 (m9/5) is set to be faster than the speed of transition T94 (m29/10).
The negative regulation in Fig. 5 was realized in the following way: The place “operator” receives tokens if
– the concentration of the place “operator bind” exceeds the threshold 1 given at the test arc to the transition T64 as a weight, and
– the concentration of the place “allolactose” does not exceed the threshold 4 given at the inhibitory arc to the transition T64.
Four molecules of allolactose need to bind to one molecule of the repressor tetramer. Accordingly, the function of the input arc from the place “allolactose” to the transition T78 should be four times faster than that of the input arc from the place “4 repressor”. In summary, we assigned the formulas
– 4 × m8 × m16 to the arc from the place “allolactose” to T78,
– m8 × m16 to the arc from the place “4 repressor” to T78, and
– m8 × m16 to the arc from T78 to the place “complex”.
The transition T78 gives the complex forming rate of allolactose and the tetramer of repressor proteins. The place “complex” (m10) represents the concentration of the complex. Allolactose can also release the tetramer of repressor proteins from the operator site by forming a complex with it. The transition T77 and the arcs from/to this transition realize this mechanism. A discrete transition is used for T77, since only a discrete transition is available for an arc from a discrete place (a continuous amount cannot be removed from a discrete place). In order to realize a smooth removal of allolactose, a small delay time (0.5) is assigned to the transition T77. The transitions T4, T5, T6, T13, T14, T15, and T18 represent the natural degradation of the corresponding substances.
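The arithmetic behind the binding speeds and the arc functions of T78 can be checked with a short Python sketch. The names `BINDING_FRACTIONS`, `repressor_binding_speeds`, and `t78_arc_flows` are hypothetical; the numbers are the percentages and formulas quoted in the text of this subsection.

```python
from fractions import Fraction

# Fractions of tetrameric repressor quoted in the text (96%, 3.99%, 0.01%).
BINDING_FRACTIONS = {
    "T60": Fraction(96, 100),     # binds to the operator site
    "T61": Fraction(399, 10000),  # binds to other DNA sites
    "T62": Fraction(1, 10000),    # does not bind to DNA
}


def repressor_binding_speeds(m16):
    """Firing speeds of T60, T61, and T62 for a given content of '4 repressor'."""
    return {name: float(frac) * m16 for name, frac in BINDING_FRACTIONS.items()}


def t78_arc_flows(m8, m16):
    """Flow speeds on the three arcs of T78 (allolactose + tetramer -> complex)."""
    base = m8 * m16
    return {
        "allolactose_in": 4 * base,  # four allolactose molecules per tetramer
        "tetramer_in": base,
        "complex_out": base,
    }


total_fraction = sum(BINDING_FRACTIONS.values())
flows = t78_arc_flows(m8=2.0, m16=3.0)
```

Using exact fractions makes it easy to confirm that the three binding fractions account for all of the tetramer, so no repressor is created or lost by the split.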
Hydrolyzing lactose to glucose and galactose
The lac operon transcription and translation mechanisms are described in Fig. 7 together with the effects of the two products of the genes lacZ and lacY on hydrolyzing lactose to glucose and galactose. The effect of the gene lacA is not included in this figure, since the lacA gene does not play a role in either the lac operon gene regulatory mechanism or the glycolytic pathway. The places “mRNA lacZ” (m19), “mRNA lacY” (m23), and “mRNA lacA” (m27) represent the concentrations of mRNAs transcribed from the genes lacZ, lacY, and lacA, respectively, and the transcription rates are given at the discrete transitions T66, T69, and T72 as the delay times. The places “LacZ” (m20), “LacY” (m24), and “LacA” (m28) represent the concentrations of proteins translated from the lacZ, lacY, and lacA mRNAs, and the translation rates are given at the continuous transitions T67, T70, and T73. As shown in Table 2, we set the initial values of the proteins LacZ, LacY, and LacA to 5, 2.5, and 1, respectively. These values are chosen according to the production ratios of the LacZ, LacY, and LacA proteins [Lewin, 1997]. Accordingly, the formulas m19, m23/2, and m27/5 are assigned to the transitions T67, T70, and T73 as the speeds, reflecting the fact that the proteins of lacZ, lacY, and lacA are produced in the ratio 1 : 1/2 : 1/5. Degradation rates of mRNAs (proteins) are assigned to the transitions T7, T9, and T11 (T8, T10, and T12).
In this model, only discrete elements are used to represent the lac operon DNA: the discrete transitions T66, T68, T69, T71, and T72, and the discrete places X1 (m21), X2 (m22), X3 (m25), and X4 (m26). The discrete place X1 (X3) represents the Boolean status of the transcription of the lacZ gene (lacY gene). That is, each time transcription of lacZ (lacY) is finished, the place X1 (X3) receives a token.
At the discrete transition T68 (T71), the delay time 0.051 (0.065) is assigned, which is the time required for the RNA polymerase to move from the end of the lacZ gene to the beginning of the lacY gene (from the end of the lacY gene to the beginning of the lacA gene). The delay times 3.075 and 1.254 are assigned to the transitions T66 and T69, respectively, according to the fact that the length of the lacZ gene (lacY gene) is 3075 bp (1254 bp). The delay time 0.682 at the transition T72 represents the length of the lacA gene (682 bp). The lengths of the genes were obtained from the website http://genolist.pasteur.fr/Colibri/genome.cgi. Note that we can determine the transcription status of the gene lacY (the gene lacA) by observing whether the discrete place X2 (X4) contains tokens.
Recall that the product of the gene lacZ is an enzyme which hydrolyzes lactose to glucose and galactose. This reaction is modeled by using the places “lactose”, “galactose” (m30), and “glucose”, and the transitions T75, T16, and T17. In our model, 20 is assigned to each of the places “lactose” and “lactose outside of a cell” as an initial value. A test arc is used from the place “LacZ” to the transition T75, since the enzyme is not consumed. We consider that the production rates of glucose and galactose depend on both the concentration of lactose and the concentration of the product of the lacZ gene. The formula (m20 × m9)/(m9 + m20 × 10), representing the speed of the transition T75, reflects this idea.
We mentioned that the gene lacY encodes the permease that brings lactose into the cell. In Fig. 7, this function is realized with the places “LacY”, “lactose outside of a cell”, and “lactose”, and the transitions T74, T13, and T14. Since the product of the lacY gene is an enzyme, a test arc is used from the place “LacY” to the transition T74, similarly to the above, and the speed of this transition is given by the formula (m24 × m29)/(m29 + m24 × 10). The weight 2.5 (5) of the arc from the place LacY (m24) (LacZ (m20)) to the transition T74 (T75) corresponds to the basal concentration of LacY (LacZ) presented in Table 2.
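Both enzyme-dependent speeds in this subsection share the form (E × S)/(S + 10 × E), where E is the enzyme place and S the substrate place. The Python sketch below evaluates this form and the gene-length delays. The function names are hypothetical, the sample concentrations are illustrative values only, and the division by 1000 is inferred from the pairs quoted in the text (for example 3075 bp and 3.075) rather than stated by the authors.

```python
def saturating_speed(enzyme, substrate, a=10.0):
    """Speed of the form (enzyme * substrate) / (substrate + a * enzyme)."""
    denominator = substrate + a * enzyme
    if denominator == 0:
        return 0.0  # no substrate and no enzyme: nothing flows
    return enzyme * substrate / denominator


def delay_from_gene_length(length_bp):
    """Delay time inferred from the text: gene length in bp divided by 1000."""
    return length_bp / 1000.0


# Illustrative values: LacY = 2.5 with 50 units of lactose outside the cell,
# and LacZ = 5 with 20 units of lactose inside the cell.
transport_speed = saturating_speed(enzyme=2.5, substrate=50.0)
hydrolysis_speed = saturating_speed(enzyme=5.0, substrate=20.0)
lacz_delay = delay_from_gene_length(3075)
```

The speed drops to zero when either the enzyme or the substrate is absent, which matches the intent that the reaction needs both.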
Fig. 7. HFPN modeling of lac operon gene regulatory mechanism. This figure contains the HFPN model in Fig. 5.
Glycolytic pathway
In the glycolytic pathway, glucose is converted into two molecules of pyruvate. The enzymatic reactions shown in Fig. 8 and Table 3 are involved in this pathway. More details about this pathway are explained in the following modeling process. Figure 9 shows the HFPN model of the glycolytic pathway.
Fig. 8. A part of the glycolytic pathway consisting of glucose (Gluc), glucose 6-phosphate (G6P), fructose 6-phosphate (F6P), fructose 1,6-diphosphate (FBP), dihydroxyacetone phosphate (DHAP), glyceraldehyde 3-phosphate (GAP), 1,3-diphosphoglycerate (1,3-BPG), 3-phosphoglycerate (3PG), 2-phosphoglycerate (2PG), phosphoenolpyruvate (PEP), and pyruvate (Pry).
Main pathway from glucose to pyruvic acid
First we create continuous places corresponding to glucose (m6), the intermediates (m31–m39), and pyruvate (m40). Default continuous places are introduced at this step, though these places are meant to represent the concentrations of the corresponding substrates. Then, following the pathway in Fig. 8, we put continuous transitions (T83–T92) together with normal arcs between each pair of consecutive places in the pathway. These transitions and arcs represent the reactions, but default transitions and arcs are initially introduced without any parameter tuning. To account for the natural degradation of substrates, we add the continuous transitions T17 for glucose and T23–T31 for the intermediates, with normal arcs. Taking into account the fact that natural degradation is very slow in glycolysis, the firing speed of these transitions is given by the formula mX/10000, where X = 17, 23, 24, ..., 31.
Production of ATP from ADP and NADH from phosphoric acid
Next we consider ADP, ATP, Pi (phosphoric acid), and NADH. In the pathway shown in Fig. 8, two ADP molecules and two Pi’s are invested to produce four ATP molecules and two NADH molecules. Continuous places “ADP” (m12), “ATP” (m51), “Pi” (m52, initial value = 200), and “NADH” (m53)
Fig. 9. HFPN model of glycolytic pathway. Michaelis-Menten’s equation is applied to the reactions in the main pathway from glucose to pyruvic acid. Test arcs are used to represent enzyme reactions.
are created to represent ADP, ATP, Pi, and NADH. We attach the continuous transitions T21 and T22 representing the natural degradation of ADP and ATP. Their firing speeds are set to be very slow (T21: m21/10000, T22: m22/10000), as for the intermediates above. For the process of ATP→ADP
in the reaction (1), the normal arc from the place “ATP” to the transition T 83 and the normal arc from the transition T 83 to the place “ADP” are introduced. In the same way, normal arcs connected to T 85 are introduced for the process of ATP→ ADP in the reaction (3). Similarly, to represent reactions (7) and (10), transitions T 89 and T 92 are used, respectively. Reaction (6) (Pi→NADH) is represented by the transition T 88. Since we consider ATP and ADP to degrade slowly, the formulas m22/10000 and m21/10000 are assigned to the transitions T 22 and T 21 as degradation speed, respectively. Including enzyme reactions Places having variables m41, m42, . . ., m50 represent enzyme concentrations. Initial values of these variables are set to 5. The production rates of these enzymes are assigned to the transitions T 47, T 48, . . ., T 56 whose speeds are set to 1. The transitions T 32, T 33, . . . , T 41 have the firing speeds mX/10(X =41, 42, . . . , 50), which represent the speed of natural degradation of these enzymes. Test arcs are used for enzyme reactions since an enzyme itself is not consumed in the reaction. To each of these test arcs, the value 3 is chosen as a weight of the arc. Reaction speeds in the main pathway To represent the reaction speeds in the main pathway, we adopt the Michaelis-Menten equation: Vmax [S] , Km + [S]
where [S] is the substrate concentration, $V_{max}$ is the maximum reaction speed, and $K_m$ is the Michaelis constant. In our model, we let $V_{max} = 1$ and $K_m = 1$. The two independent places “Pv” and “Pk” in Fig. 9 represent the variables $V_{max}$ and $K_m$. Their values can easily be manipulated by changing the contents of the places Pv and Pk, respectively. The Michaelis-Menten equation is used for each of the firing speeds of the transitions T 83, T 84, ..., T 92. For example, for the transition T 84, the formula $\frac{V_{max}\, m_{31}}{K_m + m_{31}}$ is used.

[...] ADP molecules and two Pi's are invested to produce four ATP molecules and two NADH molecules.
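As a minimal illustration of this firing-speed formula, the following Python sketch evaluates the Michaelis-Menten expression with the model's values Vmax = 1 and Km = 1. The function name `mm_speed` and the sample concentrations are assumptions made for illustration only; GON assigns such formulas to continuous transitions directly, and its interface is not reproduced here.

```python
def mm_speed(s, v_max=1.0, k_m=1.0):
    """Michaelis-Menten speed v_max * [S] / (k_m + [S]) for substrate concentration s."""
    return v_max * s / (k_m + s)


# Firing speed of a transition such as T 84, whose substrate place holds m31.
# The concentrations below are illustrative values, not simulation output.
for m31 in (0.0, 1.0, 9.0):
    print(m31, mm_speed(m31))
```

With Km = 1 the speed reaches half of Vmax at a substrate concentration of 1 and approaches Vmax as the concentration grows, which is the saturation behaviour the equation is meant to capture.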
SIMULATIONS BY GENOMIC OBJECT NET

In the previous section, we demonstrated how a biological pathway is modeled with an HFPN, using the lac operon gene regulatory mechanism and the glycolytic pathway as an example. This section verifies the HFPN model by analyzing simulation results obtained from GON. GON is software based on the notion of the HFPN and has a GUI specially designed for biological pathway modeling. GON was developed in Java so that it can run on any platform. Modeled biological pathways are saved in an XML format of its own. One interesting function of GON is that any place or transition of an HFPN can be replaced with an image having biological meaning. Figure 10 is a screenshot of GON after applying this function several times. In this screenshot, for example, the discrete places “CAP site”, “operator”, and “promoter” in Fig. 3 have been replaced with the corresponding images.
Fig. 10. A screenshot of GON.
Simulation Results of five mutants of the lac operon

The five mutants taken into consideration in this simulation are listed below together with the realization methods of the mutants in the HFPN model of Fig. 8.

• lacZ⁻: a mutant which cannot produce β-galactosidase; delete the transition T 67.
• lacY⁻: a mutant which cannot produce β-galactoside permease; delete the transition T 70.
• lacI⁻: a mutant in which 4 repressor monomers cannot constitute one active repressor tetramer; delete the transition T 59.
• lacIˢ: a mutant to which allolactose cannot bind; delete the transitions T 77 and T 78 together with the inhibitory arc from the place “allolactose” to the transition T 64.
• lacI⁻ᵈ: a mutant which cannot bind to the DNA; delete the transitions T 60 and T 61.

Behavior of the wild type
In both Figs 11 and 12, the concentration behaviors of lactose (outside of a cell), lactose, glucose, LacZ (β-galactosidase), and LacY (β-galactoside permease) are shown. From the beginning, glucose is degraded, since it is consumed in the glycolytic pathway. At time point 55, the glucose has been consumed and the transcription of the lac operon begins, producing the LacZ protein (time = 60), and successively, LacY
Fig. 11. Simulation results of the lac repressor mutant. Concentrations of proteins LacZ and LacY keep growing, since the mutants lacI⁻ and lacI⁻ᵈ lose the ability to bind at the operator site. In the mutant lacIˢ, the transcription of the lac operon does not begin, since the repressor cannot be removed from the operator site.
protein begins to be produced. By comparing the concentration behavior of lactose and lactose (outside of a cell), we can see that the LacY protein works well. At time point 65, the concentration of LacZ exceeds 10 and the decomposition of lactose into glucose and galactose starts, increasing the concentration of glucose again. Just after the glucose is once again completely consumed, the transcription of the lac operon stops, keeping the concentrations of the LacZ and LacY proteins at some level (under the assumption that the degradation speed of these proteins is very slow) [Alberts et al., 1994; Lewin, 1997].

Behavior of the lac repressor mutant
Figure 11 shows the simulation results of the mutants lacI⁻, lacIˢ, and lacI⁻ᵈ obtained from GON. In the lacI⁻ and lacI⁻ᵈ mutants, the LacZ and LacY proteins are produced, while these proteins are not produced in the lacIˢ mutant. Furthermore, in the lacI⁻ and lacI⁻ᵈ mutants, the concentrations of the LacZ and LacY proteins keep growing (except during the period of glucose re-production), even after the decomposition of lactose has ended. Note that these simulation results are consistent with the biological experimental results [Alberts et al., 1994; Lewin, 1997].
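The mutant realizations listed earlier in this section amount to deleting transitions (and, in one case, an inhibitory arc) from the HFPN of Fig. 8. The Python sketch below shows one way to keep track of such deletions. The dictionary layout, the function name `apply_mutant`, the ASCII mutant labels and the toy net fragment are assumptions made for illustration; they do not reflect GON's file format or programming interface.

```python
# Elements removed from the wild-type net for each mutant (from the mutant list).
MUTANT_DELETIONS = {
    "lacZ-":  {"transitions": {"T67"}, "arcs": set()},
    "lacY-":  {"transitions": {"T70"}, "arcs": set()},
    "lacI-":  {"transitions": {"T59"}, "arcs": set()},
    "lacIs":  {"transitions": {"T77", "T78"}, "arcs": {("allolactose", "T64")}},
    "lacI-d": {"transitions": {"T60", "T61"}, "arcs": set()},
}


def apply_mutant(transitions, arcs, mutant):
    """Return copies of the transition and arc sets with the mutant's elements removed."""
    deleted = MUTANT_DELETIONS[mutant]
    kept_transitions = set(transitions) - deleted["transitions"]
    kept_arcs = {
        (src, dst)
        for (src, dst) in arcs
        if (src, dst) not in deleted["arcs"]
        # An arc attached to a deleted transition disappears with it.
        and src not in deleted["transitions"]
        and dst not in deleted["transitions"]
    }
    return kept_transitions, kept_arcs


# Hypothetical fragment of a net, used only to exercise the function.
transitions = {"T64", "T77", "T78"}
arcs = {("allolactose", "T64"), ("allolactose", "T77"), ("operator", "T64")}
print(apply_mutant(transitions, arcs, "lacIs"))
```

Keeping the deletions in a table separate from the wild-type net means the same model can be reused for every mutant run.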
Fig. 12. Simulation results of the lacZ⁻ and lacY⁻ mutants. We can see that glucose is not produced in the lacZ⁻ mutant and that lactose cannot enter a cell in the lacY⁻ mutant.
Behavior of the lac operon mutant
Figure 12 shows the simulation results of the mutants lacZ⁻ and lacY⁻ obtained from GON. From this figure, we can observe that, in the lacZ⁻ mutant, glucose is never produced again once it is completely consumed. On the other hand, in the lacY⁻ mutant, the concentration of lactose (inside the cell) never grows, since lactose cannot pass the cell membrane in this mutant. Note that these observations from the simulation results are consistent with the experimental results [Alberts et al., 1994; Lewin, 1997].

CONCLUSION

In this paper, we demonstrated how to construct HFPNs for modeling biological pathways, using the lac operon gene regulatory mechanism and the glycolytic pathway as an example. This example was simulated by GON and the results for five mutants of the lac operon gene were shown, which correspond
well to the facts described in the literature.

The purpose of this paper was to describe a method for constructing biological pathway models with the HFPN. Therefore we introduced the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism. Although this paper deals with a known biological phenomenon, with GON we also succeeded in discovering one previously unknown biological phenomenon in multicellular systems [Matsuno et al., 2003a]. We analyzed the mechanism of Notch-dependent boundary formation in the Drosophila large intestine by experimental manipulation of Delta expression and by computational modeling and simulation with GON. Boundary formation representing the situation in the normal large intestine was shown in the simulation. By manipulating Delta expression in the large intestine, a few types of disorder in boundary cell differentiation were observed, and similar abnormal patterns were generated by the simulation. The simulation results suggest that parameter values representing the strength of cell-autonomous suppression of Notch signaling by Delta are essential for generating two different modes of patterning, lateral inhibition and boundary formation, which could explain how a common gene regulatory network results in two different patterning modes in vivo.

We should emphasize that, essentially, any type of differential equation can be modeled with an HFPN. This means that GON has the potential to simulate biological pathway models built for other biosimulation tools such as E-Cell and Gepasi. More concretely, users with programming skills can develop their own functions as Java class files, which can be called from the transitions or the arcs of an HFPN. This means that any function of E-Cell or Gepasi can be included in GON as a Java class file. With GON, we have succeeded in modeling many kinds of biological pathways [http://www.GenomicObject.Net/].
However, at the same time, we recognized that the current notion of the HFPN is still insufficient to model more sophisticated biological pathways that include more complex information such as localization and cell interaction. This is one of the reasons that motivated us to create a “hybrid functional Petri net with extension (HFPNe)” by enhancing the concept of the HFPN [Nagasaki et al., 2003]. The HFPNe allows more “types” for places (integer, real, boolean, string, vector), with which complex information can be handled. In other words, the definition of these types allows us to treat other existing Petri nets as subsets of the HFPNe. For example, a traditional Petri net (with only discrete elements) is treated as an HFPNe using only the “integer” type. Furthermore, the HFPNe can define a hybrid system of continuous and discrete events together with a hierarchization of objects for an intuitive creation of complex objects. Genomic Object Net is developed based on the notion of the HFPNe. The biological pathway models constructed with the HFPNe will be reported in the near future.

ACKNOWLEDGEMENTS

This work was partially supported by the Grant-in-Aid for Scientific Research on Priority Areas “Genome Information Science” from the Ministry of Education, Culture, Sports, Science and Technology in Japan.

REFERENCES

• Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J. (1994). The Molecular Biology of the Cell, Third Edition. Garland Publishing, Inc, New York.
• Alla, H. and David, R. (1998). Continuous and hybrid petri nets. Journal of Circuits, Systems, and Computers 8, 159-188.
• Drath, R. (1998). Hybrid object nets: an object oriented concept for modeling complex hybrid systems. Proc. Hybrid Dynamical Systems, 3rd International Conference on Automation of Mixed Processes, ADPM’98, pp. 437-442.
• Genrich, H., Küffner, R. and Voss, K. (2001). Executable Petri net models for the analysis of metabolic pathways. International Journal on Software Tools for Technology Transfer 3, 394-404.
• Ghosh, R. and Tomlin, C. J. (2001). Lateral inhibition through delta-notch signaling: A piecewise affine hybrid model. In: Hybrid Systems: Computation and Control, 4th International Workshop, Di Benedetto, M. D. and Sangiovanni-Vincentelli, A. L. (eds.), Lecture Notes in Computer Science 2034, Springer, pp. 232-246.
• Goss, P. J. E. and Peccoud, J. (1998). Quantitative modeling of stochastic systems in molecular biology by using Stochastic Petri nets. Proc. Natl. Acad. Sci. USA 95, 6750-6755.
• Hofestädt, R. (1994). A Petri net application to model metabolic processes. SAMS 16, 113-122.
• Hofestädt, R. and Thelen, S. (1998). Quantitative modeling of biochemical networks. In Silico Biol. 1, 0006.
• Lewin, B. (1997). Genes VI. Oxford University Press and Cell Press, Oxford, UK.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
• Matsuno, H., Murakami, R., Yamane, R., Yamasaki, N., Fujita, S., Yoshimori, H. and Miyano, S. (2003a). Boundary formation by notch signaling in Drosophila multicellular systems: experimental observations and a gene network modeling by Genomic Object Net. Pac. Symp. Biocomput. 8, 152-163.
• Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003b). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biol. 3, 0032.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: a platform for modeling and simulating biopathways. Applied Bioinformatics 2, 181-184.
• Proth, J.-M. (1997). Petri nets for modelling and evaluating deterministic and stochastic manufacturing systems. Proc. 6th International Workshop on Petri Nets and Performance Models (PNPM ’97), pp. 2-15.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Reisig, W. (1985). Petri Nets. Springer-Verlag, Berlin.
• Srivastava, R., Peterson, M. S. and Bentley, W. E. (2001). Stochastic kinetic analysis of the Escherichia coli stress circuit using σ32-targeted antisense. Biotechnol. Bioeng. 75, 120-129.
• Srivastava, R., You, L., Summers, J. and Yin, J. (2002). Stochastic versus deterministic modeling of intracellular viral kinetics. J. Theor. Biol. 218, 309-321.
• Voss, K., Heiner, M. and Koch, I. (2003). Steady state analysis of metabolic pathways using Petri nets. In Silico Biol. 3, 0031.
• Watson, J., Hopkins, N., Roberts, J., Steitz, J. and Weiner, A. M. (1987). Molecular Biology of the Gene, Fourth edition. The Benjamin/Cummings Publishing Company Inc., Menlo Park, CA.
• Wheeler, G. (1999). The modelling and analysis of IEEE802.6’s configuration. In: Application of Petri Nets to Communication Networks, Billington, J., Diaz, M. and Rozenberg, G. (eds.), Lecture Notes in Computer Science 1605, Springer, pp. 69-92.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2005, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-113
STEPP – Search Tool for Exploration of Petri net Paths: A New Tool for Petri Net-Based Path Analysis in Biochemical Networks

Ina Koch a,∗, Markus Schüler a and Monika Heiner b

a Technical University of Applied Sciences Berlin, FBV, WG Bioinformatics, Seestr. 64, 13347 Berlin, Germany
Tel.: +49 (0)30 4504 3972; Fax: +49 (0)30 4504 3959; E-mail: {ina.koch, markus.schueler}@tfh-berlin.de
b Brandenburg University of Technology at Cottbus, Department of Computer Science, 03013 Cottbus, Germany
Tel.: +49 (0)355 69 3884; Fax: +49 (0)355 69 3830
ABSTRACT: To understand biochemical processes caused by, e.g., mutations or deletions in the genome, knowledge of the possible alternative paths between two arbitrary chemical compounds is of increasing interest for biotechnology, pharmacology, medicine, and drug design. With the steadily increasing amount of data from high-throughput experiments, new biochemical networks can be constructed and existing ones can be extended, which results in many large metabolic, signal transduction, and gene regulatory networks. The search for alternative paths within these large and complex networks can yield a huge number of solutions, which cannot be handled manually. Moreover, not all of the alternative paths are generally of interest. Therefore, we have developed and implemented a method that allows us to define constraints which reduce the set of all structurally possible paths to the set of truly interesting paths. The paper describes the search algorithm and the constraint definition language. We give examples of path searches using this dedicated language for a Petri net model of the sucrose-to-starch breakdown in the potato tuber.

Availability: http://sanaga.tfh-berlin.de/~stepp/

KEYWORDS: Petri nets, systems biology, biochemical networks, metabolic networks, signal transduction networks, path search with constraints, graph theory, sucrose-to-starch breakdown, potato tuber
∗ Corresponding author.

INTRODUCTION

Petri net theory provides a well-defined description formalism and various analysis techniques for systems with concurrent processes. For more than ten years, Petri nets have also been applied to model and analyse biochemical systems qualitatively [Reddy et al., 1993; Reddy et al., 1996; Voss et al., 2003; Heiner et al., 2004] as well as quantitatively [Hofestädt, 1994; Hofestädt and Thelen, 1998; Matsuno et al., 2003]. The results show that Petri net theory forms a useful theoretical basis for qualitative model validation and analysis [Heiner and Koch, 2004]. The search for alternative paths and their analysis are crucial to the understanding of biochemical networks. Changes in the system behaviour, caused by mutations in the genomic sequence or by absence
of special enzymes, are more likely to be comprehensible with knowledge and a deep understanding of alternative paths. The next step would be to answer the question of which paths could be active under which conditions.

Methods for path searches in graphs are well-established techniques in computer science. Several algorithms exist to search for paths in arbitrary undirected and directed graphs, such as breadth-first search (BFS) and depth-first search (DFS), and more specialised variants such as the Bellman-Ford algorithm, the Dijkstra algorithm, and the Floyd-Warshall algorithm; for an overview see Cormen et al., 2001. The direct application of these methods to biochemical networks can result in an enormous number of possible paths. Typically, the complete set of paths is not needed, and managing it manually is hardly possible. To reduce the solution set to the generally much smaller subset of interest, we suggest the definition of special constraints, which have to be fulfilled by all paths in the solution set.

Existing Petri net tools, e.g., INA [Starke and Roch, 1999], search for paths in the reachability graph, i.e. for paths from one state of the system to another, to answer the question whether a particular system state is reachable from another given system state. In KEGG [Kanehisa and Goto, 2000] and other network databases, one can search for special pathways, but not for all paths between two arbitrary chemical compounds that satisfy additional constraints. For these reasons, we have implemented a stand-alone basic tool for path search between two arbitrary vertices in the net, using constraints in order to get those paths we are interested in. In this paper we explain the search algorithm and the language definition used to formulate constraints.
In the results section we show how to use the language for constraint definition and give several examples of specific path searches for the sucrose-to-starch breakdown in the potato tuber.

METHODS

A classical place/transition Petri net (P/T net) forms the basis for our considerations. The places represent the metabolites or chemical compounds, while transitions stand for enzymes catalysing the underlying chemical reactions. For further information on Petri nets and their application to biochemical systems see Murata, 1982, or Heiner and Koch, 2004, respectively. For the path search we specify two arbitrary vertices, which can be transitions or places, as start vertex and as end vertex, for which all possible paths in between have to be computed. A path of length k is defined as a sequence of k vertices in the graph which are connected by edges. The vertices are enumerated along the path, beginning with the start vertex as number 1. The position of a vertex in a path corresponds to the vertex number. No vertex occurs twice in a path, which is necessary to avoid loops. Forward and backward reactions are considered to be the same vertex to circumvent loops between them. The method consists of two main parts: first, the exhaustive path search by traversing the graph representing our Petri net; second, the reduction of the paths resulting from the exhaustive search by evaluation of constraints.

Exhaustive Path Search
The first step is the creation of a special graph structure, which serves as the basis for a fast path search. Petri nets are converted using BFS as a fast algorithm to visit all vertices, which takes O(V + E) time, where V is the number of vertices and E the number of edges.
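The conversion step can be pictured with a short Python sketch. The paper does not specify the internal graph structure, so the adjacency-list representation, the function names `build_graph` and `bfs_order`, and the toy net below are all assumptions made for illustration. The sketch only demonstrates the O(V + E) breadth-first visit mentioned above.

```python
from collections import deque


def build_graph(arcs):
    """Adjacency lists for a P/T net given as (source, target) arcs."""
    graph = {}
    for src, dst in arcs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # make sure sink vertices are present too
    return graph


def bfs_order(graph, start):
    """Visit every vertex reachable from start exactly once, breadth first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for successor in graph[vertex]:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return order


# Hypothetical toy net: places A, B, C and transitions t1, t2.
arcs = [("A", "t1"), ("t1", "B"), ("A", "t2"), ("t2", "C"), ("B", "t2")]
graph = build_graph(arcs)
print(bfs_order(graph, "A"))
```

Each vertex enters the queue at most once and each arc is inspected once, which gives the linear running time.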
Fig. 1. Path search from vertex A to vertex I in a graph. When reaching vertex I, the corresponding path will be read out.
The second step is the path search. For a correct and fast implementation we perform an exhaustive search by traversing our graph structure. Starting with the start vertex, we recursively visit all vertices until the end vertex is reached, see Fig. 1. In this way a vertex can be visited more than once. This guarantees that all paths are found, but leads to an exponential running time in the worst case. Although the running time for our case study is still acceptable (much less than a second), an improved implementation is under development. Because of the density and the large number of reversible reactions in biochemical networks, the exhaustive search introduced above generally results in a huge number of possible loop-free paths; see the results section for examples. Using dedicated constraints, the search becomes more purposeful and returns only the paths of interest.

The Constraint Concept
The definition of a constraint language is the main contribution of the approach presented in this paper. The use of constraints makes it possible to express biological requirements in a metabolic path search. Constraints can be used to model situations such as loss-of-function mutations of enzymes or the absence of substrates, and to perform specific path searches fulfilling these requirements. Furthermore, focusing on special paths of interest, e.g., paths through specific vertices or paths with a given length, should be supported. A basic objective of our constraint definition language is high user acceptance through an intuitive but powerful user interface. To reach these aims, we reduced the language elements to a small number of basic constraints (such as “should include a specific vertex”, “should be smaller than ten vertices”, etc.). Starting from these simple atomic constraints, more powerful composite constraints of any complexity can be constructed using logical operators such as AND, OR, and NOT.
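The exhaustive search described earlier in this section can be sketched as a recursive traversal that never repeats a vertex within one path. This is a minimal Python illustration under stated assumptions: the function name `all_simple_paths` and the small example graph are hypothetical, and the graph is not the one shown in Fig. 1.

```python
def all_simple_paths(graph, start, end):
    """Enumerate every loop-free path from start to end by recursive traversal."""
    paths = []

    def visit(vertex, path):
        if vertex == end:
            paths.append(list(path))  # read out the completed path
            return
        for successor in graph.get(vertex, ()):
            if successor not in path:  # no vertex may occur twice in a path
                path.append(successor)
                visit(successor, path)
                path.pop()  # backtrack so the vertex can be reused on other paths

    visit(start, [start])
    return paths


# Hypothetical directed graph given as adjacency lists.
graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}
for path in all_simple_paths(graph, "A", "D"):
    print(path)
```

A vertex such as C is reached along several branches here, which is exactly why the worst-case running time of this kind of search grows exponentially with the size of the graph.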
In the following, we explain our constraint definition language. Constraints are defined as Strings ("") using brackets and logical operators. Brackets: { } stand for the specification of constraints, e.g., i { ATP }, and ( ), [ ] for logically combined constraints, e.g., [ (i { ATP }) | (i { ADP }) ]. These two types of brackets are introduced to improve the readability.
Logical operators: & stands for the AND operator (conjunction), | stands for the OR operator (disjunction), and ~ stands for the NOT operator (negation).

Constraint Definition
Possible constraints are classified as vertex-dependent (include, exclude, position), path-dependent (length, score), and result-dependent constraints (shortest, longest, lowest, highest). The names of vertices are not case sensitive.

Vertex-dependent constraints:
Include (i) ensures the occurrence of a given vertex in the path. In the biological sense, this means that the path has to include a particular enzyme or metabolite.
Exclude (e) ensures the absence of a given vertex in the path. Particular enzymes or metabolites are excluded explicitly, e.g., for knockout simulations.
Position (p) ensures the occurrence of a given vertex at a specified position in the path. This constraint is much stronger than the include constraint.

Syntax:
i{ x } finds paths containing vertex x, e.g., i{ ATP }.
i{ x, y } is equivalent to ( i{ x } & i{ y } ). It finds paths containing vertices x and y, e.g., i{ eSuc, Suc }.
Analogously, the same syntax is valid for exclude (e).
p{ x, n } finds paths containing vertex x at position n, e.g., p{ ATP, 4 }.
p{ x, end } finds paths containing vertex x at the last position, e.g., p{ eSuc, end }.
p{ x, end-n } finds paths containing vertex x at the last position minus n, e.g., p{ Suc, end-2 }.

Path-dependent constraints:
These constraints refer to the whole path.
Length (k) compares the length of the computed path with the previously defined length k using the relations greater than, smaller than, and equal to.
Score (s) compares the score of the path, defined as the sum of the edge weights along the path, with the previously defined score s using the relations greater than, smaller than, and equal to.

Syntax:
l{ >, n } as well as lmin( n ) give paths with n or more vertices, e.g., lmin{4}.
l{ <, n } as well as lmax( n ) give paths with n or fewer vertices, e.g., lmax{15}.
l{ =, n } as well as l{ n } give paths with exactly n vertices, e.g., l{10}.
Analogously, the same syntax is valid for score (s).

These vertex- and path-dependent constraints can be combined by the given logical operators (AND, OR, and NOT). Because constraints can be combined by these operators and grouped with brackets, any other standard logical operator, e.g., NAND, NOR, or XOR, can be expressed.

Result-dependent constraints:
These constraints are evaluated not on a single path, but on the set of all computed paths. This type of constraint cannot be combined by logical operators. Each defined constraint string can contain only
one result-dependent constraint. Using these constraints, optimal paths can be calculated. If there are several optimal paths, all of them are provided. Shortest or longest gives the paths of shortest or longest length, respectively. Lowest or highest gives the paths with the lowest or highest score, respectively.

Syntax:
shortest finds shortest paths, e.g., i{ ATP } | e{ UTP }, shortest.
longest finds longest paths, e.g., i{ ATP } & e{ UTP }, longest.
lowest finds paths with lowest score, e.g., i{ ATP } | e{ UTP }, lowest.
highest finds paths with highest score, e.g., i{ ATP } & e{ UTP }, highest.
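To make the semantics of these constraints concrete, the following Python sketch models them as predicates on a path (a list of vertex names) and applies one result-dependent filter afterwards. It does not parse the constraint strings of the tool; the function names and the combinators `AND`, `OR` and `NOT` are assumptions chosen for illustration. The three sample paths are copied from the path lists of the case study in the results section.

```python
def include(*names):
    """i{ x, y }: every named vertex occurs in the path (names are not case sensitive)."""
    wanted = {n.lower() for n in names}
    return lambda path: wanted <= {v.lower() for v in path}

def exclude(*names):
    """e{ x, y }: none of the named vertices occurs in the path."""
    banned = {n.lower() for n in names}
    return lambda path: banned.isdisjoint(v.lower() for v in path)

def position(name, n):
    """p{ x, n }: vertex x sits at position n, counted from 1."""
    return lambda path: 1 <= n <= len(path) and path[n - 1].lower() == name.lower()

def position_from_end(name, n=0):
    """p{ x, end-n }: vertex x sits n positions before the last vertex."""
    return lambda path: 0 <= n < len(path) and path[-1 - n].lower() == name.lower()

def lmax(n):
    """l{ <, n }: the path has at most n vertices."""
    return lambda path: len(path) <= n

def lmin(n):
    """l{ >, n }: the path has at least n vertices."""
    return lambda path: len(path) >= n

def AND(*constraints):
    return lambda path: all(c(path) for c in constraints)

def OR(*constraints):
    return lambda path: any(c(path) for c in constraints)

def NOT(constraint):
    return lambda path: not constraint(path)

def shortest(paths):
    """Result-dependent: keep every path of minimal length."""
    if not paths:
        return []
    best = min(len(p) for p in paths)
    return [p for p in paths if len(p) == best]

PATHS = [
    ["Suc", "Inv", "Frc", "FK", "ADP", "Glyc(b)", "ATP", "StaSy", "starch"],
    ["Suc", "Inv", "Frc", "FK", "F6P", "PGI rev", "G6P", "StaSy", "starch"],
    ["Suc", "Inv", "Frc", "Glc", "HK", "G6P", "StaSy", "starch"],
]

constraint = AND(exclude("ATP"), lmax(9))  # corresponds to: e{ ATP } & l{ <, 9 }
selected = [p for p in PATHS if constraint(p)]
print(len(selected))
print(shortest(selected))
```

Other operators follow from the three combinators; for example, XOR of two constraints a and b can be written as `OR(AND(a, NOT(b)), AND(NOT(a), b))`. Applying `shortest` after the per-path filter mirrors the rule that a result-dependent constraint is evaluated on the whole set of computed paths.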
User-defined constraints:
The user can define their own constraints and save them in a separate file. They are defined like standard constraints, but must be named. In the current implementation the following pre-defined constraints exist.

Syntax:
nopi, e{ Pi } finds paths without phosphates as vertices.
nopp, e{ PP } finds paths without diphosphates as vertices.
noppi, nopi & nopp finds paths without phosphates and diphosphates.
dijkstra, lowest finds the path with the lowest score.

Implementation
The software is implemented in Java version 1.4.2. Thus, the program runs under Windows and Linux/Unix.
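Named constraints can be thought of as abbreviations that are expanded into their definitions before evaluation. The Python sketch below shows one plausible way to do this by textual substitution. How the tool actually resolves names is not described in the paper, so the function `expand`, the table `NAMED` and the added parentheses are assumptions; the combined constraint is written `noppi` here.

```python
import re

# Name -> definition, following the pre-defined constraints listed in the text.
NAMED = {
    "nopi": "e{ Pi }",
    "nopp": "e{ PP }",
    "noppi": "nopi & nopp",
}


def expand(constraint, named=NAMED):
    """Replace named constraints by their definitions until none remain.

    Assumes the definitions do not refer to each other in a cycle.
    """
    # Longest names first, whole words only, so "nopp" never matches inside "noppi".
    names = sorted(named, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    previous = None
    while previous != constraint:
        previous = constraint
        # Parentheses keep the expanded definition together as one operand.
        constraint = pattern.sub(lambda m: "(" + named[m.group(1)] + ")", constraint)
    return constraint


print(expand("e{ SuSy } & noppi"))
```

Expanding repeatedly lets one named constraint build on others, as the combined phosphate constraint does.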
RESULTS AND DISCUSSION

To demonstrate the functionality of the program we consider the Petri net model of the sucrose-to-starch breakdown in the potato tuber. The Petri net model is not very large but, because of the many reversible reactions and the graph density, complicated enough to show the usefulness of constraint definitions in the search for alternative paths. For a more detailed description and explanation of the modelling and analysis approach see Koch et al., 2004. The net is depicted in Fig. 2. Abbreviations for enzymes, metabolites, and reaction types are compiled in Table 1. In the following we give examples of path searches. For this purpose, we have to give the start and end vertex, followed by the constraint string.

Search without Constraints
eSuc -> starch "": The search for all possible paths from eSuc to starch without constraints yields 171 paths.
Fig. 2. The Petri net model of the sucrose-to-starch breakdown in the potato tuber.
Search with Vertex-dependent Constraints
eSuc -> starch “i{ UTP }”: The search for all paths from eSuc to starch including UTP yields 32 possible paths.
eSuc -> starch “e{ ATP } & i{ UTP }”: The search for all paths from eSuc to starch excluding ATP and including UTP yields one possible path.
Table 1
Abbreviations used for enzymes, metabolites, summarised reactions, and environment reactions

Abbr.        Enzyme name
Adk          adenylate kinase
FK           fructokinase
HK           hexokinase
Inv          invertase
NDPkin       NDP kinase
PGI          phosphoglucoisomerase
PGM          phosphoglucomutase
PPase        pyrophosphatase
SPP          sucrose phosphate phosphatase
SPS          sucrose phosphate synthase
SucTrans     sucrose transporter
SuSy         sucrose synthase
UGPase       UDP-glucose pyrophosphorylase

Abbr.        Summarised reactions
ATPcons(b)   ATP consumption
Glyc(b)      glycolysis
StaSy(b)     starch synthesis

Abbr.        Environmental reactions
geSuc        generate eSuc
rStarch      remove starch

Abbr.        Metabolite name
ADP          adenosine diphosphate
AMP          adenosine monophosphate
ATP          adenosine triphosphate
eSuc         external sucrose
F6P          fructose 6-phosphate
Frc          fructose
G1P          glucose 1-phosphate
G6P          glucose 6-phosphate
Glc          glucose
Pi           phosphate ion
PP           pyrophosphate
S6P          sucrose 6-phosphate
starch       starch
Suc          sucrose
UDP          uridine diphosphate
UDPglc       uridine diphosphate-glucose
UTP          uridine triphosphate
eSuc -> starch “i{ Glc } & e{ HK }”: There is no path from eSuc to starch including Glc and excluding HK. This is obvious, because Glc can only be converted into G6P through HK.
eSuc -> starch “l{ <, 11 }” searches for all paths with at most eleven vertices and finds sixteen possible paths, see Table 2.
The search for the shortest paths, eSuc -> starch “shortest”, yields one result: SucTrans, Inv, HK, StaSy(b).

Biologically Motivated Path Search
Mutation of sucrose synthase
The question is to get all alternative paths when sucrose synthase is not available, e.g., because of a loss-of-function mutation. Start and end vertices are sucrose and starch. As mentioned earlier, 171 paths exist. Now, we have to exclude sucrose synthase by the constraint “e{ SuSy }”. The result consists of 51 paths, which are far too many to be evaluated manually. One possibility for path reduction is the exclusion of all PP/Pi bridges, formulated as the constraint “e{ SuSy } & noppi”, which results in 39 paths. To restrict the path length, because many very long paths occur, we use the constraint “e{ SuSy } & noppi & lmax{ 11 }”, where the length 11 is chosen heuristically. Using length nine, we get only one path; using length thirteen, we get fourteen paths; while for length eleven we get the following nine paths:

1: Suc, Inv, Frc, FK, ADP, Glyc(b), ATP, StaSy, starch,
2: Suc, Inv, Frc, FK, ADP, NDPkin rev, ATP, StaSy, starch,
3: Suc, Inv, Frc, FK, ADP, AdK rev, ATP, StaSy, starch,
4: Suc, Inv, Frc, FK, F6P, Glyc(b), ATP, StaSy, starch,
5: Suc, Inv, Frc, FK, F6P, PGI rev, G6P, StaSy, starch,
6: Suc, Inv, Frc, Glc, HK, ADP, Glyc(b), ATP, StaSy, starch,
7: Suc, Inv, Frc, Glc, HK, ADP, NDPkin rev, ATP, StaSy, starch,
8: Suc, Inv, Frc, Glc, HK, ADP, AdK rev, ATP, StaSy, starch,
9: Suc, Inv, Frc, Glc, HK, G6P, StaSy, starch.
Table 2
The results of searching for all paths from eSuc to starch with at most eleven vertices. In the results the transitions are given.

 1  SucTrans, SuSy, UGPase, NDPkin rev, StaSy(b)
 2  SucTrans, SuSy, UGPase, PGM rev, StaSy(b)
 3  SucTrans, SuSy, FK, Glyc(b), StaSy(b)
 4  SucTrans, SuSy, FK, NDPkin rev, StaSy(b)
 5  SucTrans, SuSy, FK, Adk rev, StaSy(b)
 6  SucTrans, SuSy, FK, Glyc(b), StaSy(b)
 7  SucTrans, SuSy, FK, PGI rev, StaSy(b)
 8  SucTrans, Inv, FK, Glyc(b), StaSy(b)
 9  SucTrans, Inv, FK, NDPkin rev, StaSy(b)
10  SucTrans, Inv, FK, Adk rev, StaSy(b)
11  SucTrans, Inv, FK, Glyc(b), StaSy(b)
12  SucTrans, Inv, FK, PGI rev, StaSy(b)
13  SucTrans, Inv, HK, Glyc(b), StaSy(b)
14  SucTrans, Inv, HK, NDPkin rev, StaSy(b)
15  SucTrans, Inv, HK, Adk rev, StaSy(b)
16  SucTrans, Inv, HK, StaSy(b)
Considering the paths, it is conspicuous that a large number of them go through ADP and ATP in the same order. The ADP, which is formed by fructokinase consuming fructose and ATP or by hexokinase consuming glucose and ATP, is converted again into ATP by glycolysis, NDPkin rev, and adenylate kinase. This ATP is used for starch synthesis. Another path goes from F6P through glycolysis to starch synthesis. However, these paths are not of interest for our question, because during the starch synthesis process G6P is converted into starch, whereas ATP only provides the necessary energy. Thus, we extend our constraints by excluding ATP: “e{ SuSy } & noppi & lmax{ 11 } & e{ ATP }” or “e{ SuSy, ATP } & noppi & lmax{ 11 }”. ADP need not be excluded explicitly, because ADP and ATP occur together on each path. Now, the path search yields the following two paths with a sensible biological interpretation:

1: eSuc, SucTrans, Suc, Inv, Frc, FK, F6P, PGI rev, G6P, StaSy, starch,
2: eSuc, SucTrans, Suc, Inv, Glc, HK, G6P, StaSy, starch.

These two paths are the known paths through invertase.

Sucrose-(re)-synthesis
Here, all paths to re-synthesise sucrose should be found. One search starts from UDP-glucose and the other from glucose 6-phosphate. The search UDPglc -> Suc “e{ ADP, ATP } & e{ UDP, UTP }” yields the three known correct paths:

1: UGPase, PGM rev, PGI, SPS, SPP,
2: SPS, SPP,
3: SuSy rev.

The search G6P -> Suc “e{ ADP, ATP } & e{ UDP, UTP }” yields the following paths:

1: PGM, UGPase rev, SPS, SPP,
2: PGM, UGPase rev, SuSy rev,
3: PGI, SPS, SPP.

There are no paths through invertase, which is correct, because the invertase-driven reaction is irreversible. The other paths go through PGM and PGI, which both continue through SPS and SPP, or
SuSy rev, respectively, which reflects the biological behaviour correctly. Thus, this path search can also be used to validate Petri net models of biochemical networks.

CONCLUSIONS
We have introduced a new tool to search for paths with constraints in biochemical networks. We have explained the algorithm, which consists of an exhaustive path search followed by the application of additional constraints defined in a special language. This language is described in the paper using various examples. We provided a case study of the sucrose-to-starch breakdown in the potato tuber to show the correctness and usefulness of the tool. Biological aspects related to special pathways of our case study are also discussed. The tool has been designed for path searches in biochemical systems, in particular for finding alternative paths. We have implemented a Web interface, which includes a graphical user interface and supports the definition of constraints. So far, there is an interface to Petri nets, which can easily be extended to other formats. To enhance performance, we will implement an improved algorithm that considers the constraints during the path search.

ACKNOWLEDGEMENTS
We would like to thank the anonymous referee for helpful advice. This work was partly supported by the Federal Ministry of Education and Research of Germany (BMBF), BCB project 0312705D.

REFERENCES
• Cormen, T. H., Leiserson, C. E., Rivest, R. L. and Stein, C. (2001). Introduction to algorithms. The MIT Press, Cambridge, Massachusetts, London, England.
• Heiner, M., Koch, I. and Will, J. (2004). Model validation of biological pathways using Petri nets – demonstrated for apoptosis. Biosystems 75, 15-28.
• Heiner, M. and Koch, I. (2004). Petri Net Based Model Validation in Systems Biology. Proc. IACATPN, Springer Verlag, Berlin, LNCS 3099, 216-237.
• Hofestädt, R. (1994). A Petri net Application of Metabolic Processes. J. System Analysis, Modelling, and Simulation 16, 113-122.
• Hofestädt, R. and Thelen, S. (1998). Quantitative Modelling of Biochemical Networks. In Silico Biol. 1, 0006.
• Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30.
• Koch, I., Junker, B. H. and Heiner, M. (2004). Application of Petri net theory for modelling and validation of the sucrose breakdown pathway in the potato tuber. Bioinformatics, Advance Access, Nov. 16.
• Matsuno, H., Fujita, S., Doi, A., Nagasaki, M. and Miyano, S. (2003). Towards Pathway Modelling and Simulation. In: Applications and Theory of Petri Nets 2003, Proc. IACATPN, van der Aalst, W. and Best, E. (eds.), Springer Verlag, Berlin, LNCS 2679, 3-22.
• Murata, T. (1989). Petri nets: properties, analysis, and applications. Proc. of the IEEE 77, 541-580.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri Net Representation in Metabolic Pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1996). Qualitative analysis of biochemical reaction systems. Comput. Biol. Med. 26, 9-24.
• Starke, P. H. and Roch, S. (1999). Manual: INA – The Integrated Net Analyzer, http://www.informatik.hu-berlin.de/~starke/ina.html, Humboldt University, Berlin.
• Voss, K., Heiner, M. and Koch, I. (2003). Steady State Analysis of metabolic pathways using Petri nets. In Silico Biol. 3, 0031.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2005, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-122
Short Communication
Ontology Based Standardization of Petri Net Modeling for Signaling Pathways
Takako Takai-Igarashi
Graduate School of Information Science and Technology, University of Tokyo, 7-3-1, Hongou, Bunkyou-ku, Tokyo 113-0033, Japan
ABSTRACT: Petri nets are widely used for modeling and analyzing large, complicated signaling networks. Their semantics therefore needs to be systematized so that the models are consistent and reusable. This paper reports on the standardization of Petri net units on the basis of an ontology that gives an intrinsic definition of the process of signaling in signaling pathways.
KEYWORDS: Petri net, signaling pathway, ontology, knowledge representation
Abbreviations: AP-1, activator protein-1; CRM1, Chromosomal Region Maintenance 1; CSNO, Cell Signaling Networks Ontology; IL-3, interleukin-3; imp-b, importin-beta; MH2 domain, MAD Homolog 2 domain; NPC, nuclear pore complex; Ran, GTP-binding nuclear protein Ran; Smad, Mothers against decapentaplegic homolog; Smurf, Smad ubiquitination regulatory factor; TGFb, TGF-beta, transforming growth factor-beta; TGFbR, transforming growth factor-beta receptor
INTRODUCTION
Petri nets are a popular formalism for modeling and verifying non-deterministic discrete event systems, focusing on causal relationships between discrete events [1]. A Petri net is a bipartite graph composed of two classes of nodes called places and transitions, normally associated with the conditions under which events occur and with the actual occurrence of the events, respectively. The topological connections of the nodes concisely indicate the static relations between conditions and events. The dynamic behavior of a system is indicated by the distribution of tokens, which changes progressively as the conditions at places are fulfilled and the events at the succeeding transitions fire. Application of Petri nets to biological pathways began with metabolic pathways. It was then extended to metabolic pathways including gene regulation, and further to gene regulation including signaling pathways. Petri nets have been studied as a framework for quantitative simulation of pathways that were previously available only in qualitative representations such as KEGG [2]. In Petri net models of biological pathways, places, transitions, and tokens account for ‘conditions of reactants for occurrence of biological reactions’, ‘actual occurrence of the biological reactions’, and ‘concentration of the reactants’, respectively.
Recently, Petri nets have been applied also to conceptual modeling of biological pathways. Knowledge representation has been reported in either Petri nets [3,4] or bipartite graphs [5] for qualitative behavior of signaling pathways. Knowledge representation in Petri nets has the advantage of intuitive and concise representation of complicated networks, as well as of readiness for quantitative systems analysis of the networks based on mathematical background. However, all of the existing Petri net representations include inconsistencies, which prevents us from reusing, sharing, and scaling up the accumulating Petri net models. The inconsistency is revealed in the representation of catalytic reactions in molecular complexes. In Fas-induced apoptosis pathway [3], catalytic reactions in molecular complexes are omitted from the Petri net representation. Every molecular complex is regarded as a whole, so that individual states of components of the complex are disregarded. In the IL-1-induced NF-κB pathway [4], a catalytic reaction in a molecular complex is represented in the Petri net. However, special places without any labels are used to represent the catalytic reaction, even though all the other places are explicitly labeled. The authors did not explain what the unlabeled places account for. The unlabeled places seem to account not for biological entities but imaginary ones. In a model for the TGF-β pathway [5], the authors invented a ‘virtual decomposition’ arc in order to represent a catalytic reaction in a molecular complex. The ‘virtual decomposition’ arc connects a place representing a molecular complex as a whole to a place representing a component of the complex. However, we cannot find any existing event accounting for the ‘virtual decomposition’ arc, nor can we find semantic consistency between this arc and the others. 
While metabolic pathways consist uniformly of catalytic reactions, signaling pathways consist of combinations of catalytic reactions, complex formations, and transportations. The chemical view draws a distinction between these three types of reactions. In the context of signaling pathways, however, every kind of reaction ought to play a common role in the process of signaling. While an entire Petri net is obviously modeled with the intention of representing a certain signaling process, the individual reactions are modeled as chemical reactions. This creates a semantic discrepancy between the entire net and the individual reactions. If the two semantics are made consistent, the representation becomes uniform. The individual reactions should therefore be represented on the basis not only of chemical reactions but also of the common essence unique to the process of signaling. This essence needs ontological explication. We have developed an ontology for signaling pathways, the Cell Signaling Networks Ontology (CSNO), which gives an intrinsic definition of the process of signaling [6,7]. CSNO defines the process of signaling as ‘the transmission of activity through molecular recognition’. Based on this definition, I propose units of Petri nets that standardize the representation of signaling pathways in Petri nets. The units free us from ambiguity in constructing Petri net models of signaling pathways. Owing to their intuitive and concise graphical representation of biological concepts, as well as their quantitative, analytical notation for the dynamic behavior of biological entities, Petri nets will become more popular with biologists for modeling biological pathways, and Petri net models will accumulate rapidly. Our final goal is to model entire cellular networks, which will be impossible without reusing and scaling up existing models.
The more complicated and extensive Petri nets become, the more consistent they need to be. Standardization of Petri net units will therefore lay the foundations for modeling the entire cellular system.

METHODS
Ontological Definitions of the Process of Signaling
A signal in general is defined as something that is coded into a certain variation of a certain physical
quantity so as to carry certain information. In sound transmission by frequency modulation (FM), for instance, an electromagnetic wave carries sound information that is coded into its frequency. Accordingly, one can specify the process of signaling by stating ‘what information the carrier carries’ and ‘what the information is coded into’. The process of signaling carries two kinds of information: ‘how to cause a signaling pathway’ and ‘how to orient a signaling pathway’ [7]. The former is coded into ‘activity’, which is reified as, for instance, ‘phosphorylation activity’ and ‘binding activity’. The latter is coded into ‘molecular recognition’, which is reified as, for instance, ‘substrate-enzyme recognition’ and ‘MH2-MH2 domain recognition’. Accordingly, CSNO defines ‘the process of signaling’ as ‘the transmission of activity through molecular recognition’. Activity is a physical quantity in reaction kinetics indicating the efficiency of a reactant in causing a target reaction [6]. Activity is regarded as an abstraction of a functional change of a target molecule from an inactivated state to an activated state. For example, the ontological description ‘transmission of activity to a target molecule’ is comparable to the natural description ‘activation of a target molecule’. This abstraction of the common essence needs to be specified so that the representation of every reaction can be founded on it. In contrast, molecular recognition is already a common concept in biology, because its conceptualization has been needed for the standardization of common sequence motifs required in molecular recognition.

Ontological Specification of Two Roles Carried by Reactants in Signaling Pathways
CSNO defines two roles that a reactant plays in signaling pathways [6,7]. One role is to transmit activity through molecular recognition.
The other role is of constituting a cellular entity in order to keep its structure and location in a cell. When reactants belong to a complex, the former role is assigned to the individual components, whereas the latter role is assigned to the complex as a whole. Accordingly, one ought to focus not on a complex as a structural object but on components carrying molecular recognition and activity in representation of signaling pathways. Only if molecular recognition is carried by a complex as a whole, the complex ought to be focused. Such a case is found in molecular recognition between a transporter and a transported complex in signaling pathways. RESULTS Definitions of Units of Petri Nets on the Basis of CSNO Ontological explication of the process of signaling enables us to specify Petri net units (Table 1), which standardize Petri net representation. Each unit coincides with a unit of the process of signaling. Places are associated with reactants holding activity and molecular recognition. Transitions are associated with reactions transmitting activity. Tokens are associated with the concentration of reactants holding activity and molecular recognition. The four units in Table 1 are disjoint and cover all the reactions in signaling pathways, as is indicated in is-a conceptual hierarchy for signaling reactions in Fig. 1(A). The primary classification is productive reaction or regulatory reaction, in which topological connections of nodes in the units are different. The classification accounts for the productivity of the reactions. While productive reactions can produce products reiteratively, regulatory reactions can produce products only once. The difference is indicated by an additional backward arc from a transition to a place associated with a reactant playing a catalyst.
Table 1. Petri net units for signaling pathways

Type of signaling reaction   Subtype of signaling reaction             Petri net unit
Productive reaction          (Any)                                     (diagram; see legend)
Regulatory reaction          Catalytic reaction in complex             (diagram; see legend)
Regulatory reaction          Complex formation for allosteric effect   (diagram; see legend)
Regulatory reaction          Complex formation for transportation      (diagram; see legend)

Petri nets are Condition Event nets, and tokens are positioned so that the transitions are ready to fire. See Denotation of molecules in signaling pathways for legends of labels of places. See Fig. 1(A) for types and subtypes of signaling reactions. Cofactors are omitted from the representation because cofactors are considered to be represented with no ambiguity in existing models [3–5]. Table 1 illustrates each unit with a reaction holding two input places and one output place. There can be more than two input places as well as more than one output place, according to the number of focused species carrying activity and molecular recognition.
Productive reaction: Reactant Y changes the state of reactant X to a certain state X(*) reiteratively, inducing activation or inactivation of reactant X (*: one of the external states indicated in Fig. 1B).
Complex formation for allosteric effect: Complex formation between X and Y changes the state of a focused component X to a bound state X(B), inducing activation or inactivation of the component X, with the consequence that a complex [X:Y] is formed.
Complex formation for transportation: Complex formation between X and Y produces a complex [X:Y], which is recognized as a whole in a succeeding transportation reaction.
Catalytic reaction in complex: Reactant X and reactant Y form a complex in advance (X[X:Y], Y[X:Y]), and reactant Y changes the state of reactant X to a certain state X(*) only once, inducing activation or inactivation of reactant X (*: one of the external states indicated in Fig. 1B).
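The difference between a productive reaction and a catalytic reaction in complex can be made concrete with a small Python sketch. It assumes ordinary place/transition semantics with integer token counts rather than the Condition Event nets of Table 1, so that repeated firing can be counted. The class `Unit`, the helper functions, and the output label `X(*)[X:Y]` are illustrative assumptions, not notation taken from the table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Unit:
    """One transition with its input and output places (hypothetical helper)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def enabled(unit: Unit, marking: Dict[str, int]) -> bool:
    # A transition may fire when every input place holds at least one token.
    return all(marking.get(place, 0) >= 1 for place in unit.inputs)


def fire(unit: Unit, marking: Dict[str, int]) -> Dict[str, int]:
    if not enabled(unit, marking):
        raise ValueError(f"{unit.name} is not enabled")
    new_marking = dict(marking)
    for place in unit.inputs:
        new_marking[place] -= 1
    for place in unit.outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def count_firings(unit: Unit, marking: Dict[str, int]) -> int:
    """Fire the unit until it is no longer enabled and count the firings."""
    firings = 0
    while enabled(unit, marking):
        marking = fire(unit, marking)
        firings += 1
    return firings


# Productive reaction: the backward arc returns the token to catalyst Y.
PRODUCTIVE = Unit("productive reaction", ("X", "Y"), ("X(*)", "Y"))

# Catalytic reaction in complex: no backward arc, so Y is used only once.
CATALYTIC_IN_COMPLEX = Unit(
    "catalytic reaction in complex",
    ("X[X:Y]", "Y[X:Y]"),
    ("X(*)[X:Y]",),
)

productive_firings = count_firings(PRODUCTIVE, {"X": 3, "Y": 1})
complex_firings = count_firings(CATALYTIC_IN_COMPLEX, {"X[X:Y]": 3, "Y[X:Y]": 1})
print(productive_firings, complex_firings)
```

With three tokens of X and a single token of the catalyst, the productive unit fires three times, whereas the unit without the backward arc fires once.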
Fig. 1. Classification of concepts about the process of signaling. A: Is-a conceptual hierarchy of signaling reactions, which are used in labels of transitions. A symbol ‘+’ indicates an is-a relation between an upper concept and a lower concept. B: A list of external states of reactants and symbols standing for the external states, which are used in labels of places.

Productive reaction includes catalytic reaction in solution, catalytic reaction in transient complex, transportation, and gene expression, while regulatory reaction includes complex formation and catalytic reaction in complex. A catalytic reaction that occurs in a complex does not belong to productive reaction but to regulatory reaction, because the catalytic reaction occurs only once owing to steric hindrance.
I call such a catalytic reaction ‘catalytic reaction in complex’: the catalyst is used only once and is not recycled. Accordingly, the Petri net unit for catalytic reaction in complex includes no backward arc from the transition to the place associated with the catalyst. The steric hindrance effect has not been taken into account properly in any representation of signaling pathways so far. Petri net modeling needs to incorporate the steric hindrance effect carefully so as to represent the physical quantities of reactants correctly. Complex formation is divided into two groups: complex formation for allosteric effect and complex formation for transportation. Although both units have the same topological connections of nodes, they focus on different reactants. In complex formation for allosteric effect, the focus is on the state of a component. In complex formation for transportation, in contrast, the focus is on the state of the complex as a whole, because the complex as a whole is recognized by a transporter in a succeeding reaction. Catalytic reaction is also divided into two groups: catalytic reaction in solution and catalytic reaction in transient complex. The two groups are distinguished by the molecular recognition involved in the reactions. Catalytic reaction in solution involves only molecular recognition at a catalytic site, whereas catalytic reaction in transient complex involves its own particular molecular recognition in addition to that at the catalytic site. However, the two reactions do not differ in their productivity, so both are associated with the same Petri net unit. Accordingly, all the units are consistent with units of ‘the transmission of activity through molecular recognition’. Because the units are disjoint and cover all the signaling reactions, one can construct Petri net models based on them without ambiguity.
Fig. 2. An example of Petri net representation of signaling pathways on the basis of the Petri net units. I take the TGF-β pathway [9] as an example. See Denotation of molecules in signaling pathways for legends of labels of places. See Fig. 1(A) and Definitions of units of Petri nets on the basis of CSNO for legends of labels of transitions. A double circle indicates a place of input or output. Tokens represent a distribution of activated molecules before the extracellular stimulus (TGFb) arrives (before firing). See Tokens in Petri nets for signaling pathways for the semantics of tokens.
Denotation of Molecules in Signaling Pathways
For consistent representation of signaling pathways in Petri nets, a new denotation of molecules is introduced, which is used in the labels attached to places. The labels include ‘external properties’ of reactants. CSNO defines ‘internal properties’ as the properties accounting for an intrinsic definition of the process of signaling; all other properties are defined as ‘external properties’. A label consists of the name of the focused reactant holding activity and molecular recognition, the external states of the reactant, and the components of a complex if the reactant belongs to one. External states are listed in Fig. 1(B). In a label, every external state is indicated by its symbol, written in parentheses. For example, ‘TGFbR2(BP)’ indicates ‘bound and phosphorylated TGFbR2’. The symbols appear in the order in which the states occur in the signaling pathway. A molecular complex, which is another external property of a focused reactant, is indicated by a list of the names of all its components, separated by colons, in square brackets, and in italics. The description of the complex follows the description of the external states. For example, ‘TGFbR1(BP)[TGFb:TGFbR1:TGFbR2]’ indicates ‘bound and phosphorylated TGFbR1 in a complex composed of TGFb, TGFbR1, and TGFbR2’ (Fig. 2).
Some inconsistency appears in existing Petri net representations of catalytic reactions in molecular
complexes. Based on the Petri net units, one can consistently represent any catalytic reaction in molecular complexes. An example is shown in Fig. 2, which includes the pathway modeled as a bipartite graph in [5]: the TGF-β pathway. In [5], the authors introduced a ‘virtual decomposition’ arc whose semantics differs from that of the other arcs, whereas in Fig. 2 all arcs carry the same semantics: the transmission of activity through molecular recognition. The Petri net in Fig. 2 does not describe all the states of all the components of the receptor complex. For example, it omits TGFbR2 in the complex composed of Smad7, Smurf1, TGFb, TGFbR1, and TGFbR2. The state of TGFbR2 can be omitted because in this complex it is regarded not as holding activity and molecular recognition but as serving to keep the structure and location of the other, focused components. Such simplification is justified by the ontological explication that activity and molecular recognition account for the common essence of the process of signaling. Thus the Petri net units prevent the Petri net graphs from becoming too complicated and unmanageable, as well as from being inconsistent.

Tokens in Petri Nets for Signaling Pathways
In the Petri net representations of biological pathways reported so far, the tokens in a place have accounted for the concentration of molecules in the state that the place indicates. Because the Petri net units coincide with units of transmitting activity, the tokens in the units account for the concentration of molecules carrying activity. If activated molecules are free in a cell, the tokens represent the concentration of the activated molecules themselves. If activated molecules belong to complexes, the tokens represent the concentration of the activated components in the complexes. The Petri nets in Table 1 and Fig. 2 are Condition Event nets, the most fundamental class of Petri nets, in which every place holds either one token or none. The tokens therefore represent the presence of molecules carrying activity.
In other words, the tokens represent activation or inactivation of individual molecules alternating progressively in a signaling pathway. Accordingly tokens in the initial state of a Petri net represent a distribution of activated molecules before receiving extracellular stimuli (Fig. 2). Tokens in the final state of a Petri net represent a distribution of activated molecules as a consequence of response to extracellular stimuli. It is the remarkable advantage of Petri nets that a progression of activations is concisely visualized by execution of a token game in the nets. The Petri net units can represent correct stoichiometry of every reaction at every time point, even when they are applied to higher classes of Petri nets. When applied, one should make firing rules satisfy the condition that places should carry the same numbers of tokens if the places are labeled external states involving the same members of complexes. For example in Fig. 2, the number of tokens in the place of TGFb(B)[TGFb:TGFbR2] and that of TGFbR2(B)[TGFb:TGFbR2] should be the same at any time point in the token game. DISCUSSION When Petri nets are used for a conceptual formalization of causal sequences of events, the formalization should be based on consistent semantics accounting for the common essence of the causal sequences. CSNO provides us with such consistent semantics for signaling pathways. I would like to propose standardization of Petri net units based on CSNO, in which every place, transition, and token is specified on the basis of ontological definitions of the process of signaling. On the other hand, semantics of existing Petri net models is left discrepant, so that some places and transitions might be unspecified and inconsistent.
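The label denotation and the token rule stated in the Results can be sketched in Python as follows. The sketch parses a place label into reactant name, external-state symbols, and complex members, and then reports complexes whose places carry different token counts. It assumes that every external-state symbol is a single character (only ‘B’ and ‘P’ are confirmed by the examples in the text) and it reads ‘the same members of complexes’ as an identical member list; `PlaceLabel`, `parse_label`, and `inconsistent_complexes` are hypothetical names introduced for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class PlaceLabel(NamedTuple):
    name: str                  # focused reactant, e.g. "TGFbR1"
    states: Tuple[str, ...]    # external-state symbols in order of occurrence
    members: Tuple[str, ...]   # components of the complex, empty if free


# name, optional "(states)", optional "[member:member:...]"
_LABEL = re.compile(
    r"^(?P<name>[^()\[\]]+)"
    r"(?:\((?P<states>[^()]*)\))?"
    r"(?:\[(?P<members>[^\[\]]*)\])?$"
)


def parse_label(label: str) -> PlaceLabel:
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid place label: {label!r}")
    states = tuple(match.group("states") or "")
    raw_members = match.group("members") or ""
    members = tuple(m.strip() for m in raw_members.split(":") if m.strip())
    return PlaceLabel(match.group("name").strip(), states, members)


def inconsistent_complexes(marking: Dict[str, int]) -> List[Tuple[str, ...]]:
    """Return the complexes whose places do not all hold the same token count."""
    counts: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for label, tokens in marking.items():
        members = parse_label(label).members
        if members:
            counts[members].add(tokens)
    return [members for members, seen in counts.items() if len(seen) > 1]


example = parse_label("TGFbR1(BP)[TGFb:TGFbR1:TGFbR2]")
consistent = inconsistent_complexes(
    {"TGFb(B)[TGFb:TGFbR2]": 1, "TGFbR2(B)[TGFb:TGFbR2]": 1, "TGFb": 0}
)
violated = inconsistent_complexes(
    {"TGFb(B)[TGFb:TGFbR2]": 2, "TGFbR2(B)[TGFb:TGFbR2]": 1}
)
print(example, consistent, violated)
```

A check of this kind could be run after every firing in a token game on a higher class of Petri net to confirm that the firing rules preserve the stoichiometry of each complex.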
Petri nets have the advantage that they lend themselves to simulation studies. Although the example described in this paper uses Condition Event nets, the units can also be applied soundly to higher classes of Petri nets such as Hybrid Petri nets [10]. The final goal is to reconstruct comprehensive knowledge of cellular functions in a computer system on the basis of Petri net representation and ontology. Toward this goal, I have to investigate further (1) practical transformation of existing data into the standardized formalization I propose, (2) development of an algorithm to generate drawings of Petri net graphs, (3) specification of consistent semantics for the interrelation between signaling and metabolic pathways in cells, and (4) integration of systematized cellular functions such as Gene Ontology [11] with biological pathways in Petri nets.

ACKNOWLEDGEMENTS
The author is grateful to Prof. Riichiro Mizoguchi (Osaka University) and anonymous referees for insightful comments on this manuscript. The author also thanks Prof. Hiroshi Matsuno (Yamaguchi University) for valuable discussions.

REFERENCES
[1] Reisig, W. and Rozenberg, G. (eds.) (1998). Lectures on Petri Nets I: Basic Models. Lecture Notes in Computer Science 1491, Springer-Verlag.
[2] Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. and Hattori, M. (2004). The KEGG resources for deciphering the genome. Nucleic Acids Res. 32, D277-D280.
[3] Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biol. 3, 0032.
[4] Lee, D., Zimmer, R., Lee, S., Hanisch, D. and Park, S. (2004). Knowledge representation model for systems-level analysis of signal transduction networks. Genome Informatics 15, 234-243.
[5] Choi, C., Crass, T., Kel, A., Kel-Margoulis, O., Krull, M., Pistor, S., Potapov, A., Voss, N. and Wingender, E. (2004). Novel consistent modeling of signaling pathways and its implementation in the TRANSPATH database. Genome Informatics 15, 244-254.
[6] Takai-Igarashi, T. and Mizoguchi, R. (2004). Ontological integration of data models for cell signaling pathways. Genome Informatics 15, 255-265.
[7] Takai-Igarashi, T. and Mizoguchi, R. Ontological explication of cooperation functions of molecular complexes in signaling pathways. Manuscript submitted.
[8] Guarino, N. (1998). Some Ontological Principles for Designing Upper Level Lexical Resources. In: Proceedings of the First International Conference on Language Resources and Evaluation. Granada, Spain.
[9] Shi, Y. and Massagué, J. (2003). Mechanisms of TGF-β signaling from cell membrane to the nucleus. Cell 113, 685-700.
[10] Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
[11] Harris, M. A., et al.; Gene Ontology Consortium (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258-D261.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2006, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-130
Simulation-Based Validation of the p53 Transcriptional Activity with Hybrid Functional Petri Net
Atsushi Doi (a), Masao Nagasaki (a), Hiroshi Matsuno (b,∗) and Satoru Miyano (a)
(a) Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
(b) Faculty of Science, Yamaguchi University, 1677-1 Yoshida, Yamaguchi, 753-8512, Japan
ABSTRACT: MDM2 and p19ARF are essential proteins in cancer pathways that form a complex with protein p53 and thereby control its transcriptional activity. It has been confirmed that protein p53 loses its transcriptional activity by forming a functional dimer with protein MDM2. However, it is still unclear whether protein p53 keeps its transcriptional activity when it forms the trimer with proteins MDM2 and p19ARF. We have observed the mutual behaviors of genes p53, MDM2, p19ARF and their products on a computational model built with hybrid functional Petri net (HFPN) from information described in the literature. The simulation results suggest that protein p53 should have transcriptional activity in the form of the trimer of proteins p53, MDM2, and p19ARF. This paper also discusses the advantages of the HFPN based modeling method in terms of pathway description for simulations.
KEYWORDS: Hybrid functional Petri net, p53, biological pathway, simulation
∗ Corresponding author.

INTRODUCTION
Molecular interactions are usually summarized in a picture composed of figures of various shapes (e.g. circles and rectangles) and several types of arrows. The graphical images in such a picture are important since they reflect knowledge in biology and medicine. Biological pathway databases such as KEGG [Kanehisa and Goto, 2000] and TRANSPATH [Krull et al., 2003] (http://www.biobase.de/) have compiled many biological molecular interactions, providing invaluable information to researchers in the form of pictures. However, it is not easy to grasp quantitative information about molecular interactions from such databases, since they focus on providing qualitative information. On the other hand, computer simulation has received the attention of researchers in biology and medicine as a useful method for understanding the biological mechanisms of interest at the molecular level. It is natural to use computer simulations to obtain quantitative information about molecular interactions. However, it is impossible to conduct simulations with only the molecular interaction
maps stored in these pathway databases because of the lack of information for constructing simulatable computational models. We have conducted some simulations of biological phenomena including apoptosis signaling pathway [Matsuno et al., 2003b], cell cycles [Fujita et al., 2004; Matsui et al., 2004], and circadian rhythms [Matsuno et al., 2003b], etc. [Matsuno et al., 2000; Doi et al., 2003; Matsuno et al., 2003a; Doi et al., 2004]. Hybrid functional Petri net (HFPN) [Matsuno et al., 2003b] is adopted to construct these computational models for the simulations. These HFPN models are constructed, being based on pictures in the biological literature. Thereafter, parameters of reactions such as the transcription speeds of genes and degradation rates of proteins shall be tuned so that input/output concentration behaviors of substances such as mRNAs and proteins are matched with biological facts which have been obtained from experiments and/or written information in the literature. With this method, we can include information for simulating molecular reactions in the HFPN model while keeping graphical images of the original biological picture. Proteins p53, MDM2, and p19ARF are proteins closely related to cancer. The protein p53 is a protein which suppresses the formation of tumors, and the protein MDM2 promotes the formation of tumors by decreasing the activity of the protein p53. Understanding of control mechanism of these proteins connects to development of an effective medicine for suppressing the tumor. In this paper, we present a new HFPN model of a cancer pathway including a tumor suppressor gene p53. As the genes related to p53, genes MDM2 and p19ARF have been identified [Zhang and Xiong, 2001; Iwakuma and Lozano, 2003]. MDM2 works as an inhibitor for p53, and MDM2 is further inhibited by p19ARF. 
The interactions of these three genes are described in existing biological pathway databases, including KEGG [Kanehisa and Goto, 2000] and TRANSPATH [Krull et al., 2003] (http://www.biobase.de/). This paper presents an HFPN model of the interaction of p53, MDM2, and p19ARF, and gives simulation results obtained from the model with Cell Illustrator [Nagasaki et al., 2003] (http://genomicobject.net/~gon/p53/, http://www.fqspl.com.pl/?a=product view&id=20&lang=en). We also show that facts and data in the biological literature can be interpreted into the HFPN model to construct this dynamic pathway model, whereas conventional pathway databases screen out some information that is helpful for understanding system dynamics. Protein p53 is known to work as a transcription factor for many genes [el-Deiry, 1998], and its transcriptional activity is controlled by a complex formed with proteins MDM2 and p19ARF [Zhang and Xiong, 2001; Iwakuma and Lozano, 2003]. However, it is still unclear whether protein p53 keeps its transcriptional activity when it is part of the trimer of proteins p53, MDM2, and p19ARF. With our HFPN model, we simulated the interplay between genes p53, MDM2, and p19ARF and their products. The simulation results suggest that protein p53 retains transcriptional activity in the trimer of proteins p53, MDM2, and p19ARF.
HFPN MODEL OF PROTEIN INTERACTIONS OF p53, MDM2, AND p19ARF
Hybrid functional Petri net
A Petri net is a network consisting of places, transitions, arcs, and tokens. A place (depicted as a circle) can hold tokens as its content. A transition (depicted as a filled rectangle) is connected by arcs coming in from places and by arcs going out to places. A transition together with these arcs defines a firing rule in terms of the contents of the places to which the arcs are attached.
Fig. 1. Basically, Petri nets are constructed using three kinds of symbols for places, transitions, and arcs. In Cell Illustrator, both sets of places and transitions are classified into discrete and continuous types, and places and transitions can be replaced with pictures reflecting the biological images. This replacement makes the HFPN model of a biological pathway more comprehensible for biologists.
HFPN was defined by Matsuno et al. [Matsuno et al., 2003b] as an extension of the hybrid Petri net [Alla and David, 1998]. An HFPN has two kinds of places, discrete and continuous (the latter depicted as a double circle), and two kinds of transitions, discrete and continuous (the latter depicted as an unfilled rectangle). Discrete places and discrete transitions are the same as in the traditional discrete Petri net (see footnote 1). A continuous place holds a real number as its content. A continuous transition fires continuously at the speed given by the parameter assigned to it. The traditional symbols of these places and transitions are shown in Fig. 1.
Three types of arcs are used in HFPN, and each arc is assigned a specific value as its weight. When a normal arc (a solid arc in Fig. 1) with weight w is attached to a discrete or continuous transition, tokens are transferred through the arc only if the content of the place at its source exceeds w. A test arc (a dashed arc in Fig. 1) follows the same rule with respect to its weight, but firing does not consume the content of the place at its source. A test arc can therefore represent enzyme activity, since the enzyme itself is not consumed. An inhibitory arc (a line terminated with a small bar in Fig. 1) with weight w enables the transition to fire only if the content of the place at its source is less than or equal to w. An inhibitory arc can be used, for example, to represent repressive activity in gene regulation (inhibitory arcs are not used in this paper).
HFPN model construction based on the literature
Figure 2 shows an HFPN model constructed by compiling and interpreting the information on p53-MDM2 interactions in the literature [Barak et al., 1993; Miyashita and Reed, 1995; Honda et al., 1997; Kamijo et al., 1998; Pomerantz et al., 1998; Zhang et al., 1998; Tao and Levine, 1999; Zhang and Xiong, 1999]. We replaced the symbols for places and transitions with biological images. These changes have no effect on the mathematical meaning, but they help biologists understand the pathway. Each substance, such as a protein or an mRNA, corresponds to an HFPN place (originally a double circle, replaced by a picture reflecting the biological meaning of the place; see Fig. 1), which holds the concentration of the substance. In Fig. 2, each place is labeled with the name of the substance (e.g. p53 mRNA, p19ARF). A complex of two proteins A and B is named A B, where the places for proteins A and B are labeled A and B. A suffix (C) or (N) is appended to a substance name when we need to distinguish whether the substance is located in the cytoplasm or in the nucleus.

Footnote 1. A discrete place and a discrete transition are represented by a single circle and a filled rectangle, respectively. These symbols are not used in the HFPN of Fig. 2.

Fig. 2. HFPN model of interactions of genes p53, MDM2, and p19ARF and their products. For places and transitions, pictures reflecting biological images are used (see Fig. 1). Biological meanings of transitions T1, ..., T20 are summarized in Table 1.

Twenty-one biological events related to p53-MDM2 interactions, extracted from the literature [Barak et al., 1993; Miyashita and Reed, 1995; Honda et al., 1997; Kamijo et al., 1998; Pomerantz et al., 1998; Zhang et al., 1998; Tao and Levine, 1999; Zhang and Xiong, 1999], are summarized in the "Biological phenomena" column of Table 1. Each of these events is represented by an HFPN transition (originally an unfilled rectangle, replaced by a picture reflecting the biological meaning of the transition; see Fig. 1), to which the reaction speed of the event is assigned. Twenty of the events are assigned to the transitions Ti (i = 1, ..., 20), as shown in column #1 of Table 1. The type of each biological process and the supporting literature are given in the last two columns of Table 1. Table 2 lists the variable and the initial value of each place in Fig. 2. The transitions dj (j = 1, ..., 15) represent natural degradation of the corresponding substances. We define the speed of natural degradation as mX*0.01, where mX is the concentration of the corresponding substance.

Table 1. Biological facts extracted from the literature and their assignment to transitions in the HFPN model of Fig. 2.
No. | Biological phenomena in the literature (obtained by experiments) | #1 | #2 | Type of biological process | Literature
1 | p53(N) is bound to MDM2(N), forming complex p53 MDM2(N). | T1 | m1*m2*0.01 | binding | [Kamijo et al., 1998; Zhang et al., 1998]
2 | MDM2(N) is bound to p19ARF(N), forming complex MDM2 p19ARF. | T2 | m2*m4*0.01 | binding | [Kamijo et al., 1998; Pomerantz et al., 1998; Zhang et al., 1998]
3 | p53 MDM2(N) is bound to p19ARF(N), forming complex p53 MDM2 p19ARF. | T3 | m4*m5*0.01 | binding | [Kamijo et al., 1998; Zhang et al., 1998]
4 | MDM2 p19ARF is bound to p53(N), forming complex p53 MDM2 p19ARF. | T4 | m1*m6*0.01 | binding | [Kamijo et al., 1998; Zhang et al., 1998]
5 | Transcription of injected gene p53, producing p53 mRNA. | T5 | 1 | transcription | −
6 | p53 mRNA is translated to p53(C). | T6 | m10*0.1 | translation | [Kamijo et al., 1998; Pomerantz et al., 1998]
7 | p53 MDM2(N) is exported from the nucleus to the cytoplasm (p53 MDM2(C)). | T7 | m5*0.1 | nuclear export | [Tao and Levine, 1999; Zhang and Xiong, 1999]
8 | p53 is marked with ubiquitin (multiubiquitin chain) (p53[Ub]). | T8 | m7*m8*0.01 | ubiquitination | [Honda et al., 1997]
9 | Polyubiquitinated p53 (p53[Ub]) is destroyed by proteasome. | T9 | m9*0.5 | degradation | [Honda et al., 1997; Pomerantz et al., 1998]
10 | Protein MDM2 (MDM2(C)) is imported from the cytoplasm to the nucleus (MDM2(N)). | T10 | m12*0.1 | nuclear import | [Tao and Levine, 1999; Zhang and Xiong, 1999]
11 | Protein p53 (p53(C)) is imported from the cytoplasm to the nucleus (p53(N)). | T11 | m11*0.1 | nuclear import | [Tao and Levine, 1999; Zhang and Xiong, 1999]
12 | Transcription of injected gene MDM2, producing MDM2 mRNA. | T12 | 1 | transcription | [Kamijo et al., 1998; Pomerantz et al., 1998]
13 | MDM2 mRNA is translated to MDM2(C). | T13 | m13*0.1 | translation | [Kamijo et al., 1998; Pomerantz et al., 1998]
14 | Transcription of injected gene p19ARF, producing p19ARF mRNA. | T14 | 1 | transcription | [Kamijo et al., 1998; Pomerantz et al., 1998]
15 | p19ARF mRNA is translated to p19ARF(C). | T15 | m15*0.1 | translation | [Kamijo et al., 1998; Pomerantz et al., 1998]
16 | Protein p19ARF (p19ARF(C)) is imported from the cytoplasm to the nucleus (p19ARF(N)). | T16 | m16*0.1 | nuclear import | [Tao and Levine, 1999]
17 | Protein p53 (p53(N)) activates transcription of gene Bax, producing Bax mRNA. | T17 | m1*0.1 | transcription | [Miyashita and Reed, 1995]
18 | Protein p53 (p53(N)) activates transcription of gene MDM2, producing MDM2 mRNA (endogenous). | T18 | m1*0.1 | transcription | [Barak et al., 1993]
19 | Stabilizing p53 complex (p53 MDM2 p19ARF) activates transcription of gene Bax, producing Bax mRNA. | T19 | m3*0.1 | transcription | −
20 | Stabilizing p53 complex (p53 MDM2 p19ARF) activates transcription of gene MDM2, producing MDM2 mRNA. | T20 | m3*0.1 | transcription | −
21 | p19ARF cannot affect p53 transactivation without protein MDM2. | − | − | transcription | [Kamijo et al., 1998]
(#1: Corresponding transition in the HFPN. #2: Speed of the corresponding transition in the HFPN. mX (X = 1, ..., 16) is the concentration of the corresponding substance in Table 2.)

Table 2. Places in the HFPN model of Fig. 2.
Place name | Variable (mX) | Initial value
p53(N) | m1 | 0
MDM2(N) | m2 | 0
p53 MDM2 p19ARF | m3 | 0
p19ARF(N) | m4 | 0
p53 MDM2(N) | m5 | 0
MDM2 p19ARF | m6 | 0
p53 MDM2(C) | m7 | 0
Ubiquitin | m8 | 100
p53[Ub] | m9 | 0
p53 mRNA | m10 | 0
p53(C) | m11 | 0
MDM2(C) | m12 | 0
MDM2 mRNA | m13 | 0
Bax mRNA | m14 | 0
p19ARF mRNA | m15 | 0
p19ARF(C) | m16 | 0
(Variable mX (X = 1, ..., 16) indicates the concentration of each substance. Initial value is the initial content of a place.)

Using these transitions and the notation for molecules, the molecular interactions in the pathway can be described as follows. Protein p53 in the nucleus (p53(N)) forms a complex with MDM2(N) (T1), which migrates out of the nucleus (T7) to become p53 MDM2(C). With ubiquitin, p53 MDM2(C) then produces p53[Ub] (T8), which is decomposed by the proteasome (T9). Hence, formation of the complex p53 MDM2(N) decreases the concentration of protein p53 in the nucleus (p53(N)). In contrast, p19ARF(N) forms the trimer p53 MDM2 p19ARF with proteins p53(N) and MDM2(N), thereby preventing p53(N) from decreasing. There are two pathways to the trimer: in one, p19ARF(N) binds to the complex p53 MDM2(N) (T3) after p53(N) and MDM2(N) have formed that complex (T1); in the other, p53(N) binds to the complex MDM2 p19ARF (T4) after MDM2(N) and p19ARF(N) have formed that complex (T2). Consequently, p19ARF(N) prevents p53(N) from decreasing, because p53 MDM2 p19ARF cannot be transferred to the cytoplasm and is therefore not marked with ubiquitin. After protein p53 of the heterodimer p53 MDM2(C) is degraded by the proteasome (T8 and T9), the remaining MDM2(C) migrates back into the nucleus (T10) as MDM2(N). Gene p53 is transcribed (T5) and translated (T6) to produce protein p53(C), which then migrates into the nucleus (T11) as p53(N). The fact that p53(N) contributes to the transcription of gene MDM2 is expressed in this HFPN model by a test arc from place p53(N) to transition T18.
Transitions T12 and T14 represent the expression of the injected genes MDM2 and p19ARF, respectively. The translation of MDM2 mRNA and of p19ARF mRNA is represented by transitions T13 and T15, respectively. Transition T16 represents the nuclear import of protein p19ARF. Activation of gene Bax by protein p53(N) is represented by a test arc from place p53(N) to transition T17. Experimentally, the transcriptional activity of protein p53 is detected through the concentration of Bax mRNA, which is why gene Bax appears in Fig. 2.
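As a rough illustration of how the speeds in Table 1 and the degradation rule translate into concentration changes, the following Python sketch integrates only the p53 branch of the model (T5, T6, T11, and the natural degradation of p53 mRNA, p53(C), and p53(N)) with a fixed-step Euler scheme. Several points are assumptions rather than statements from the paper: the mRNA place is treated as attached to T6 by a test arc (not consumed by translation), p53(C) is consumed by T11, the step size and end time are arbitrary, and the function name is hypothetical. This is not the integration method of Cell Illustrator, and because MDM2 is left out it does not reproduce any panel of Fig. 3.

```python
def simulate_p53_branch(t_end=2000.0, dt=0.01):
    """Fixed-step Euler integration of T5, T6, T11 plus natural degradation."""
    m10 = 0.0  # p53 mRNA
    m11 = 0.0  # p53(C)
    m1 = 0.0   # p53(N)
    for _ in range(int(round(t_end / dt))):
        t5 = 1.0         # T5: transcription of the injected gene p53
        t6 = m10 * 0.1   # T6: translation (mRNA assumed not consumed)
        t11 = m11 * 0.1  # T11: nuclear import (p53(C) assumed consumed)
        # Natural degradation at speed mX*0.01 for each substance.
        d10, d11, d1 = m10 * 0.01, m11 * 0.01, m1 * 0.01
        # All right-hand sides use the values from the previous step.
        m10, m11, m1 = (
            m10 + dt * (t5 - d10),
            m11 + dt * (t6 - t11 - d11),
            m1 + dt * (t11 - d1),
        )
    return {"p53 mRNA": m10, "p53(C)": m11, "p53(N)": m1}


result = simulate_p53_branch()
```

Setting the three derivatives to zero gives the steady state of this isolated branch: m10 = 1/0.01 = 100, m11 = 10/0.11 ≈ 90.9, and m1 = 10 · m11 ≈ 909. These values hold only for the sub-network under the stated assumptions; in the full model p53(N) is also removed through T1 and T4.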
Our HFPN pathway model incorporates knowledge about protein subcellular localization, the process of forming protein complexes, and functional molecular interactions. Starting from a qualitative pathway model, we manually tuned the transition parameters and the initial conditions of the places in the HFPN model so that the model is consistent with the data in [Pomerantz et al., 1998]. The model thus also incorporates knowledge about system dynamics. The HFPN model in Fig. 2, including all of its parameters, is available from http://genomicobject.net/, http://genomicobject.net/~gon/p53/ and can be simulated on Cell Illustrator 2.0 (http://www.fqspl.com.pl/?a=product view&id=20&lang=en).
Transcriptional activity of p53-MDM2-p19ARF complex on MDM2 and Bax
For the transcriptional activity of the complex p53-MDM2-p19ARF, two opposite cases can be considered: the complex either can or cannot activate MDM2 and Bax. It has been confirmed that the complex of proteins p53 and MDM2 has no transcriptional activity on MDM2 and Bax [Oliner et al., 1993; Honda et al., 1997], while protein p53 itself has transcriptional activity on them [Barak et al., 1993; Miyashita and Reed, 1995]. This change in p53 transcriptional activity results from protein MDM2 binding to the site of protein p53 that is also the transactivation domain for downstream genes. From this fact, it is natural to assume that protein p53 loses its transcriptional activity by forming a complex with protein MDM2. However, the literature contains the following observations, which appear to contradict this assumption:
A. Protein p53 accumulates in the nucleus when both proteins p19ARF and MDM2 exist in the cell [Pomerantz et al., 1998; Zhang and Xiong, 1999; Zhang and Xiong, 2001].
B. When plenty of protein p53 exists in the nucleus, an increase in p53 transcriptional activity can be observed [Pomerantz et al., 1998].
On the assumption that, because the p53-MDM2 complex is translocated from the nucleus to the cytoplasm, neither p53 itself nor the complex p53-MDM2 accumulates enough to provide transcriptional activity, we may consider that in observations A and B protein p53 forms the complex p53-MDM2-p19ARF with proteins MDM2 and p19ARF. This may mean that the complex p53-MDM2-p19ARF has transcriptional activity on genes MDM2 and Bax. The next section presents simulations on the HFPN model of Fig. 2, whose results suggest that the complex p53-MDM2-p19ARF has transcriptional activity on genes MDM2 and Bax.
SIMULATION AND RESULTS
Figure 3 shows the results of simulations in which the concentration behaviors of p53(N), MDM2(N), p19ARF(N), p53 MDM2 p19ARF, and Bax mRNA are observed for the following combinations of expression of the three genes p53, MDM2, and p19ARF. We introduced gene Bax in the HFPN model in order to detect the expression level of gene p53. We suppose that the cell is rich in ubiquitin (the initial value of the place for ubiquitin is set to 100). We considered the following cases:
1. None of the genes p53, MDM2, and p19ARF is expressed (transitions T5, T12, and T14 are deleted).
2. Only gene p53 is expressed (transitions T12 and T14 are deleted).
3. Genes p53 and MDM2 are expressed (transition T14 is deleted).
4. All of the genes p53, MDM2, and p19ARF are expressed, and the complex p53-MDM2-p19ARF cannot activate genes MDM2 and Bax (transitions T19 and T20 are deleted).
5. All of the genes p53, MDM2, and p19ARF are expressed, and the complex p53-MDM2-p19ARF can activate genes MDM2 and Bax.

Fig. 3. Simulated concentration behaviors of p53(N), MDM2(N), p19ARF(N), p53 MDM2 p19ARF, and Bax mRNA for combinations of p53, MDM2, and p19ARF overexpression. +: the corresponding gene is overexpressed; –: the corresponding gene is not overexpressed. Transitions T5, T12, and T14 are incorporated in the HFPN model to represent the overexpression of p53, MDM2, and p19ARF, respectively. Each of these transitions is removed from the HFPN model when the corresponding gene is not overexpressed.

When none of the three genes p53, MDM2, and p19ARF is expressed, the concentrations of p53(N), MDM2(N), p19ARF(N), p53 MDM2 p19ARF, and Bax mRNA do not grow (Fig. 3(1)). When gene p53 is expressed, protein p53 in the nucleus (p53(N)) transiently reaches a high concentration and thereafter stays at a lower level (Fig. 3(2)). Low accumulation of MDM2(N) in the nucleus and low expression of Bax mRNA are also observed.
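The five simulation cases differ only in which transitions are removed from the model, so they can be written down as data. The Python sketch below records the deleted transitions exactly as listed for each case; the names `ALL_TRANSITIONS`, `DELETED`, and `active_transitions` are hypothetical, and the degradation transitions dj are not represented.

```python
# Transitions T1 ... T20 of the HFPN model (Table 1).
ALL_TRANSITIONS = frozenset(f"T{i}" for i in range(1, 21))

# Transitions deleted in each simulation case, as listed in the text.
DELETED = {
    1: {"T5", "T12", "T14"},  # no gene expressed
    2: {"T12", "T14"},        # only p53 expressed
    3: {"T14"},               # p53 and MDM2 expressed
    4: {"T19", "T20"},        # all expressed, trimer cannot activate
    5: set(),                 # all expressed, trimer can activate
}


def active_transitions(case):
    """Return the set of transitions kept in the model for a case."""
    return ALL_TRANSITIONS - DELETED[case]


for case in sorted(DELETED):
    kept = sorted(active_transitions(case), key=lambda name: int(name[1:]))
    print(case, kept)
```

Keeping the cases as sets of deleted transitions mirrors the way the paper describes them and makes it easy to check that cases 4 and 5 differ only in T19 and T20.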
Fig. 4. Interactions of p53, MDM2, and ARF described in KEGG. Simplified information between two proteins is presented.
Figure 3(3) shows that, when genes MDM2 and p53 are expressed, MDM2(N) accumulates more in the nucleus than in Fig. 3(2). In contrast, the concentration of protein p53(N) is lower than in Fig. 3(2). Because gene MDM2 is assumed not to be knocked out in our model, protein MDM2 accumulates in the nucleus even in Fig. 3(2), owing to the activation of MDM2 by p53 in the nucleus (T18). As shown in Fig. 3(2), although protein p53 accumulates in the nucleus, its concentration decreases rapidly after reaching a peak. This decrease is caused by protein MDM2, which ubiquitinates the protein p53 exported from the nucleus to the cytoplasm. The resulting low concentration of protein p53 causes the low expression level of gene Bax. In Fig. 3(3), protein p53 stays at a lower concentration in the nucleus than in Fig. 3(2). This reduction of protein p53 results from the expression of gene MDM2.
Figures 3(4) and 3(5) show the cases in which all of the genes p53, MDM2, and p19ARF are expressed, under two different assumptions about the transcriptional activity of the complex p53-MDM2-p19ARF.
– Under the assumption that the complex p53-MDM2-p19ARF has no transcriptional activity for genes MDM2 and Bax (Fig. 3(4)), high accumulation of the complex p53 MDM2 p19ARF is observed. In addition, both the accumulation of protein p53(N) and the expression of Bax mRNA stay at a low level.
– In contrast, when we assume that the complex p53-MDM2-p19ARF has transcriptional activity for genes MDM2 and Bax (Fig. 3(5)), lower accumulation of p53 MDM2 p19ARF is observed. In addition, both the accumulation of MDM2(N) and the expression of Bax mRNA grow rapidly and stay at a higher level than in Fig. 3(4).
OTHER DESCRIPTIONS ON p53, MDM2 AND p19ARF INTERACTIONS
Figure 4 is the part of the molecular interactions in KEGG [Kanehisa and Goto, 2000] (http://www.genome.jp/kegg/pathway/hsa/hsa04110.html) that shows the relationship between p53 and MDM2 (see footnote 2). In general, biological interactions that repress gene products are not restricted to one kind of effect: for example, a protein A may repress the expression of a gene B, or a protein A may repress the activity of a protein B. Figure 4 does not give any specific information about how MDM2 represses p53. Figure 5 is the part of the p53 pathway in TRANSPATH [Krull et al., 2003] (http://www.biobase.de/). Although this map describes the complex formation of MDM2 and ARF, it does not include ubiquitination of p53 by the complex of MDM2 and ARF, whereas it does include the effect of tetramer formation of protein p53 (see footnote 3). Figure 6 shows the relationships among proteins p53, MDM2, and p19ARF in the notation proposed by Kohn [Kohn, 1999] (http://discover.nci.nih.gov/mim/index.jsp). The arc terminated with a short bar in this map represents repression of p53 degradation by the complex MDM2-p19ARF. The same repression of p53 degradation is included in the HFPN model of Fig. 2, although it is not modeled with a dedicated arc symbol for repression. Instead, the HFPN model realizes this repression through transition T4, which has two input arcs from places p53(N) and MDM2 p19ARF and one output arc to place p53 MDM2 p19ARF. Given that p53 is degraded through complex formation with protein MDM2, it is easy to see that formation of the complex p53-MDM2-p19ARF protects protein p53 from degradation. Hence, the HFPN of Fig. 2 describes the interactions of proteins p53, MDM2, and p19ARF more precisely than Fig. 6. Note that the information interpreted into the HFPN model of Fig. 2 was extracted from the literature by human experts and covers the various biological facts mentioned in the section on the HFPN model. Such information is not fully included in these molecular interaction maps.

Footnote 2. To unify the symbols used in this paper, we write "MDM2" instead of "Mdm2" as written in KEGG and TRANSPATH. In addition, "ARF" in KEGG and TRANSPATH covers both p14ARF and p19ARF [Zhang and Xiong, 2001].
Footnote 3. It is not clear from [Pomerantz et al., 1998] whether protein p53 forms a tetramer or remains a monomer.

Fig. 5. Interactions of p53, MDM2, and ARF described in TRANSPATH. This map includes ubiquitination of protein p53 by protein MDM2 and degradation of protein p53 by proteasome. In addition, the following facts are described: acetylated and phosphorylated p53 proteins acquire transcriptional activity by forming a tetramer, and ARF-INK4a in complex with MDM2 localizes in the nucleolus.

DISCUSSION AND CONCLUSION
Through simulation, we examined whether the complex p53-MDM2-p19ARF has transcriptional activity for genes Bax and MDM2.
Fig. 6. Molecular interactions of proteins p53, MDM2, and p19ARF described by Kohn [Kohn, 1999] based on descriptions in [Pomerantz et al., 1998].
Pomerantz et al. [Pomerantz et al., 1998] showed that protein p53 accumulates in a HeLa cell after gene p19ARF is injected into the cell. In addition, they reported that p19ARF injection increases protein p53 and the amount of products activated by protein p53. Figure 3(4) shows an observation corresponding to that of Pomerantz et al. [Pomerantz et al., 1998]. In this figure, the concentration of the complex p53-MDM2-p19ARF stays high owing to complex formation between p53, MDM2, and p19ARF, which allows protein p53 to escape ubiquitination by protein MDM2. The increase of protein p53 (as part of the p53-MDM2-p19ARF complex) coincides with the experimental observation reported in [Pomerantz et al., 1998]. However, for gene Bax, which is activated by protein p53, the simulation shows low expression, whereas the expression of Bax is high in [Miyashita and Reed, 1995]. Recall that in this case we assumed that the complex p53-MDM2-p19ARF has no transcriptional activity for genes Bax and MDM2, so that in the HFPN simulation model only protein p53 in the nucleus (p53(N)) can activate Bax and MDM2. On the other hand, the simulation result of Fig. 3(5) shows high expression of gene Bax as well as a certain concentration of protein p53, consistent with the experimental observation in the literature [Pomerantz et al., 1998]. Note that the increase of the p53-MDM2-p19ARF complex promotes not only the expression of gene Bax but also the production of protein MDM2 in the nucleus. The increased MDM2 accelerates the decomposition of p53 exported from the nucleus to the cytoplasm. This induces the lower concentration in Fig. 3(5) compared with Fig. 3(4). The simulation results suggest that protein p53 retains transcriptional activity in the trimer of proteins p53, MDM2, and p19ARF.
FURTHER REMARKS
Most existing pathway databases present many biological maps of molecular interactions.
However, to conduct simulations based on these maps, more biological facts, such as the reaction speeds of complex formation and protein degradation, have to be added to them. This means that computational pathway maps for simulation have to be reconstructed after careful reading of the relevant papers. In other words, these pathway databases were not constructed on the assumption that the pathways in them would be used for simulations. The HFPN model of the complex p53-MDM2-p19ARF was constructed from biological knowledge extracted by careful reading of the literature. Of course, without proof from biological experiments, we cannot conclude that the complex p53-MDM2-p19ARF has transcriptional activity for genes Bax and MDM2. However, as demonstrated in this paper, it is hard to gain insight into the systematic behavior of the genes and proteins forming the complex p53-MDM2-p19ARF without the help of simulations. Simulations can reduce the number of biological experiments and thus save both expense and time. As demonstrated in this paper, describing biological pathways with HFPN allows them to be simulated directly, since HFPN includes dynamic elements (transitions), to which reaction speeds are assigned, as well as static elements (places), which represent the states of substances such as their concentrations. We have developed systems that automatically convert biological pathways in KEGG and TRANSPATH into HFPN models [Nagasaki et al., 2004]. Once dynamic parameters, such as the reaction speeds of translation and complex formation, are incorporated into the converted HFPN models from the knowledge of biologists and/or information in the literature, the models become simulatable on Cell Illustrator (http://www.fqspl.com.pl/?a=product view&id=20&lang=en). We have recently developed a new XML-based biological pathway description format called Cell System Markup Language (CSML) (http://www.csml.org/). Using CSML and this conversion system, we are now working on the construction of a simulatable pathway database.
ACKNOWLEDGEMENTS
This work is partially supported by the Grant-in-Aid for Scientific Research on Priority Areas 17014067 and 17017007 from the Ministry of Education, Culture, Sports, Science and Technology in Japan.
REFERENCES
• Alla, H. and David, R. (1998). Continuous and hybrid Petri nets. J. Circuits, Systems, and Computers 8, 159-188. • Barak, Y., Juven, T., Haffner, R. and Oren, M. (1993). mdm2 expression is induced by wild type p53 activity. EMBO J. 12, 461-468. • Doi, A., Nagasaki, M., Fujita, S., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: II. Modelling biopathways by hybrid functional Petri net with extension. Appl. Bioinformatics 2, 185-188. • Doi, A., Fujita, S., Matsuno, H., Nagasaki, M.
and Miyano, S. (2004). Constructing biological pathway models with hybrid functional Petri net. In Silico Biol. 4, 0023. • el-Deiry, W. S. (1998). Regulation of p53 downstream genes. Semin. Cancer Biol. 8, 345-357. • Fujita, S., Matsui, M., Matsuno, H. and Miyano, S. (2004). Modeling and simulation of fission yeast cell cycle on hybrid functional Petri net. IEICE Trans. Fundamentals E87-A, 2919-2928. • Honda, R., Tanaka, H. and Yasuda, H. (1997). Oncoprotein MDM2 is a ubiquitin ligase E3 for tumor suppressor p53. FEBS Lett. 420, 25-27. • Iwakuma, T. and Lozano, G. (2003). MDM2, An Introduction. Mol. Cancer Res. 1, 993-1000. • Kamijo, T., Weber, J. D., Zambetti, G., Zindy, F., Roussel, M. F. and Sherr, C. J. (1998). Functional and physical interactions of the ARF tumor suppressor with p53 and Mdm2. Proc. Natl. Acad. Sci. USA 95, 8292-8297. • Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30. • Kohn, K. W. (1999). Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol. Biol. Cell 10, 2703-2734. • Krull, M., Voss, N., Choi, C., Pistor, S., Potapov, A. and Wingender, E. (2003). TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 31, 97-100. • Matsui, M., Fujita, S., Suzuki, S., Matsuno, H. and Miyano, S. (2004). Simulated cell division processes of the Xenopus cell cycle pathway by Genomic Object Net. J. Integrative Bioinformatics, 0003. • Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349. • Matsuno, H., Murakami, R., Yamane, R., Yamasaki, N., Fujita, S., Yoshimori, H. and Miyano, S. (2003a). Boundary formation by notch signaling in Drosophila multicellular systems: experimental observations and a gene network modeling by Genomic Object Net. Pac. Symp. Biocomput. 8, 152-163.
• Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003b). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biol. 3, 0032. • Miyashita, T. and Reed, J. C. (1995). Tumor suppressor p53 is a direct transcriptional activator of the human bax gene. Cell 80, 293-299. • Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: I. A platform for modeling and simulating biopathways. Appl. Bioinformatics 2, 181-184. • Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2004). Integrating biopathway databases for large-scale modeling and simulation. In: Conferences in Research and Practice in Information Technology, Vol. 29 – Bioinformatics, Chen, Y.-P. P. (ed.), Australian Computer Society, pp. 43-52. • Oliner, J. D., Pietenpol, J. A., Thiagalingam, S., Gyuris, J., Kinzler, K. W. and Vogelstein, B. (1993). Oncoprotein MDM2 conceals the activation domain of tumor suppressor p53. Nature 362, 857-860. • Pomerantz, J., Schreiber-Agus, N., Liegeois, N. J., Silverman, A., Alland, L., Chin, L., Potes, J., Chen, K., Orlow, I., Lee, H. W., Cordon-Cardo, C. and DePinho, R. A. (1998). The Ink4a tumor suppressor gene product p19Arf interacts with MDM2 and neutralizes MDM2’s inhibition of p53. Cell 92, 713-723. • Tao, W. and Levine, A. J. (1999). P19ARF stabilizes p53 by blocking nucleo-cytoplasmic shuttling of Mdm2. Proc. Natl. Acad. Sci. USA 96, 6937-6941. • Zhang, Y., Xiong, Y. and Yarbrough, W. G. (1998). ARF promotes MDM2 degradation and stabilizes p53: ARF-INK4α locus deletion impairs both the Rb and p53 tumor suppression pathways. Cell 92, 725-734. • Zhang, Y. and Xiong, Y. (1999). Mutation in human ARF exon 2 disrupt its nucleolar localization and impair its ability to block nuclear export of MDM2 and p53. Mol. Cell 3, 579-591. • Zhang, Y. and Xiong, Y. (2001). Control of p53 ubiquitination and nuclear export by MDM2 and ARF. Cell Growth Differ. 12, 175-186.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2006, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-143
Analyzing Stationary States of Gene Regulatory Network Using Petri Nets Anna Gambin, Sławomir Lasota and Michał Rutkowski ∗ Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warszawa, Poland
ABSTRACT: We introduce and formally define the notion of a stationary state for Petri nets. We also propose a fully automatic method for finding such states. The procedure makes use of the Presburger arithmetic to describe all the stationary states. Finally we apply this novel approach to find stationary states of a gene regulatory network describing the flower morphogenesis of A. thaliana. This shows that the proposed method can be successfully applied in the study of biological systems. KEYWORDS: Petri net, stationary state, gene regulatory network, Presburger arithmetic, flower morphogenesis, Arabidopsis thaliana
∗ Corresponding author. E-mail: [email protected].

INTRODUCTION
Recent research in molecular biology yields large amounts of data. Analysis of these data has brought a much better understanding of the processes that govern the life cycle of cells. At the base of all these processes lie chemical reactions. A major drawback of laboratory experiments is that they are time-consuming and expensive. For this reason computer analysis is very promising.

Petri nets are a mathematical formalism used in modeling concurrent and distributed systems. The theory has been studied for over 40 years and is well developed. For a biologist Petri nets might be of interest because their graphical representation is very similar to that of chemical systems [see Peleg et al., 2005; Hardy and Robillard, 2004]. Moreover, a stochastic extension of Petri nets defines the same stochastic process as the one defined by the Chemical Master Equation, which is commonly used to describe the dynamics of chemical systems. This implies that, apart from numerical analysis and simulations, one might derive conclusions about a chemical system using the theory of Petri nets. A successful application of stochastic Petri nets is reported in [Goss and Peccoud, 1998].

One of the central goals of contemporary molecular biology is to understand the cellular processes described by time series of thousands of gene expression measurements. The interactions between genes can be represented in graphical form as a gene regulatory network. Previous work on applications of Petri nets to regulatory networks can be found in [Matsuno et al., 2000]. The crucial task in modeling gene regulation is the analysis of stationary states of the network. Multistationarity (i.e. the property of systems whose structure induces two or more distinct steady states) can account for
epigenetic differences, including those involved in cell differentiation [Thomas and Kaufman, 2001a & 2001b]. Another interesting approach to stationary analysis, based on structural properties of the interaction system, can be found in [Soulé, 2003]. In [Mendoza and Alvarez-Buylla, 1998; Mendoza et al., 1999], Boolean networks were used as a model, but no automatic procedure was proposed for the analysis of stationary states. A sophisticated algorithm for recovering the stationary states of Boolean networks was proposed in [Gat-Viks et al., 2004]. However, the applicability of this algorithm is restricted to very simple models. The Petri net approach was applied in [Voss et al., 2003], where the authors exploit the well-known notion of S/T-invariants.

In contrast to previous methods, we propose a fully automatic, effective procedure for analyzing the stationary states of the network. Our definition of stationarity reflects the chemical equilibrium of the system that underlies the cell differentiation process. In the rest of this section we define the notion of a Petri net and introduce some basic concepts used in the sequel. Next, we formally define the notion of a stationary state and introduce a concise Presburger arithmetic formula that describes all such states. Finally, we show how this formal approach can be applied to find stationary states of the gene regulatory network describing the flower morphogenesis of A. thaliana.

Basic definitions
A Petri net consists of places and transitions. Transitions specify the dynamics of the network, that is, how tokens move from one place to another. A special function, referred to as the weight function, defines the quantitative aspect of the network's behavior. Firing a transition changes the state of the network in accordance with the weight function. The process induced by the network consists of sequentially fired transitions. The symbols ℕ and ℕ+ denote the sets of non-negative and positive integers, respectively.
Definition 1. (Petri net): The network is defined as a five-element tuple N = (S, T, F, W, M0), where:
1. S is a set of places.
2. T is a set of transitions.
3. F ⊆ (S × T) ∪ (T × S) defines the network's topology.
4. W : F → ℕ+ is the weight function.
5. M0 : S → ℕ is the initial marking (state).
Petri nets, when applied to modeling biochemical systems, can be viewed in the following way: the set of places S specifies the entities involved (e.g., chemical molecules), the set of transitions T specifies the possible interactions (e.g., chemical reactions) between these entities, and the relation F together with the weight function W specifies which and how many entities are consumed and produced by these interactions. Let us consider a simple chemical reaction of the form nA + mB + kC → yD + xE. The entities involved here are molecules of types A, B, C, D and E. There is only one possible interaction, the chemical reaction stated above; let us call it R. Figure 1 depicts the corresponding Petri net. There are three arcs ending in R and two that originate in R. The arcs that end in R lead from the places representing substrates, whereas the arcs that originate in R lead to places representing reaction products. These arcs make up the relation F. The labels on these arcs are the values assigned to them by the weight function W and are equal to the coefficients of the reaction. We have thus constructed a network for our reaction: places represent reactants and a transition represents the reaction. These elements make up the structure of the net; they define the
Fig. 1. A Petri net representing a chemical reaction of the form nA + mB + kC → yD + xE. Circles and squares represent places and transitions, respectively.
static aspects of the system. We still need a means to express the dynamical properties of the system. For that purpose we introduce the notion of a network marking. A marking is a function that defines the state of the system, that is, the quantity of each of the reactants. The initial marking M0 defines the initial state, and that state changes as the interactions take place.

Definition 2. (Network marking): For a given network N, a function M : S → ℕ will be called a marking of the network.

The definitions that follow formally describe the mechanics of the network. In a given state, transitions fall into two separate categories, active and inactive. Only an active transition can be fired. Firing a transition changes the current state. To express that change easily we introduce the notion of a transition function.

Definition 3. For a given network N and an element e ∈ T ∪ S, the following two sets are introduced:
1. •e = {x : (x, e) ∈ F}, the set of predecessors.
2. e• = {x : (e, x) ∈ F}, the set of successors.

Definition 4. (Active transition): For a given network N and its marking M, a transition t ∈ T is considered active if the following condition is met:
\[ \forall s \in {}^{\bullet}t:\quad W(s,t) \le M(s), \]
denoted by M[t⟩.

Definition 5. (Transition function): For every transition t one can define a corresponding transition function t : S → ℤ as follows:
\[
t(s) =
\begin{cases}
-W(s,t) & s \in {}^{\bullet}t \setminus t^{\bullet} \\
W(t,s) & s \in t^{\bullet} \setminus {}^{\bullet}t \\
W(t,s) - W(s,t) & s \in {}^{\bullet}t \cap t^{\bullet} \\
0 & s \notin {}^{\bullet}t \cup t^{\bullet}
\end{cases}
\]
Fig. 2. An example of a Petri net. Circles and squares represent places and transitions, respectively. Dots within the circles specify the marking of the network. The current marking is M0 = (2, 1, 1, 1, 0). An arc, unless labeled otherwise, is assumed to have a label of 1.
Definition 6. (Firing a transition): For a given network N and its marking M, firing an active transition t changes the marking of the network to a new marking M′ (denoted by M[t⟩M′), obtained from the old one according to the formula M′ = M + t (+ denotes vector addition in ℤ^S; the result lies in ℕ^S because t is active). We also write M[t⟩ when the successor marking M′ is irrelevant.

In the example in Fig. 1, the transition R is active only if the marking M satisfies the following condition:
\[ M(A) \ge n, \quad M(B) \ge m, \quad M(C) \ge k, \]
that is, when there are enough substrates for the reaction to take place. The transition function for R is defined as follows: R(A) = −n, R(B) = −m, R(C) = −k, R(D) = y, and R(E) = x.
A more complex example is depicted in Fig. 2. A dotted line marks the active transitions, that is, t1, t2 and t3. Figure 3 shows the net from Fig. 2 after transition t2 has fired.
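Definitions 4 to 6 can be sketched in a few lines of Python. The sketch below encodes the reaction net of Fig. 1 with hypothetical coefficients n = 2, m = 1, k = 3, y = 1, x = 2 (the text leaves them symbolic); the names PRE, POST, is_active, transition_function and fire are illustrative and do not come from the paper.

```python
# Minimal sketch of Definitions 4-6 for the single-transition net of Fig. 1.
# Assumption: the coefficients n=2, m=1, k=3, y=1, x=2 are chosen for illustration.

PLACES = ["A", "B", "C", "D", "E"]
# W restricted to arcs (s, t): tokens consumed by each transition.
PRE = {"R": {"A": 2, "B": 1, "C": 3}}
# W restricted to arcs (t, s): tokens produced by each transition.
POST = {"R": {"D": 1, "E": 2}}


def is_active(marking, t):
    """Definition 4: every input place holds at least W(s, t) tokens."""
    return all(marking[s] >= w for s, w in PRE[t].items())


def transition_function(t):
    """Definition 5: net token change W(t, s) - W(s, t) for every place."""
    return {s: POST[t].get(s, 0) - PRE[t].get(s, 0) for s in PLACES}


def fire(marking, t):
    """Definition 6: M' = M + t, defined only for an active transition."""
    if not is_active(marking, t):
        raise ValueError(f"transition {t} is not active")
    delta = transition_function(t)
    return {s: marking[s] + delta[s] for s in PLACES}


m0 = {"A": 4, "B": 1, "C": 3, "D": 0, "E": 0}
m1 = fire(m0, "R")
print(m1)                  # {'A': 2, 'B': 0, 'C': 0, 'D': 1, 'E': 2}
print(is_active(m1, "R"))  # False: B and C are exhausted
```

Storing consumed and produced weights separately keeps self-loops (places that are both input and output of a transition) representable, which the net balance alone would hide.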
Fig. 3. The Petri net from Fig. 2 after transition t2 was fired. The new marking is M = (3, 1, 0, 1, 0).
Cases
Some transitions are independent of each other and, instead of being fired sequentially, could be fired concurrently. Moreover, even if they affect the same places, there are markings that allow them to be fired concurrently. We introduce the notion of a case to define this intuition formally.

Definition 7. (Non-conflicting transitions): For a given network N and its marking M, a set of transitions A = {t1, ..., tm} ⊆ T is said to be non-conflicting in marking M if, for each permutation σ of {1, ..., m},
\[ \forall\, 0 \le i < m:\quad (M + t_{\sigma(1)} + \dots + t_{\sigma(i)})\,[t_{\sigma(i+1)}\rangle, \]
where for i = 0 the sum is empty, so the first transition fired must be active in M.
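Definition 7 can be checked literally by trying every firing order. The following Python sketch does this for a hypothetical two-transition net in which t1 and t2 compete for tokens in a shared place p; the net and all function names are assumptions made for illustration only. The check has factorial cost in the size of the set, so it is only practical for very small sets.

```python
from itertools import permutations

# Hypothetical net: t1 and t2 each consume one token from the shared place p.
PRE = {"t1": {"p": 1}, "t2": {"p": 1}}
POST = {"t1": {"q": 1}, "t2": {"r": 1}}


def is_active(marking, t):
    """A transition is active if every input place holds enough tokens."""
    return all(marking.get(s, 0) >= w for s, w in PRE[t].items())


def fire(marking, t):
    """Return the marking reached by firing t (assumed active)."""
    new = dict(marking)
    for s, w in PRE[t].items():
        new[s] = new.get(s, 0) - w
    for s, w in POST[t].items():
        new[s] = new.get(s, 0) + w
    return new


def is_non_conflicting(marking, transitions):
    """Definition 7, checked literally: every firing order must be feasible."""
    for order in permutations(transitions):
        current = marking
        for t in order:
            if not is_active(current, t):
                return False
            current = fire(current, t)
    return True


print(is_non_conflicting({"p": 1, "q": 0, "r": 0}, ["t1", "t2"]))  # False
print(is_non_conflicting({"p": 2, "q": 0, "r": 0}, ["t1", "t2"]))  # True
```

With a single token in p the two transitions are each active but cannot both fire, so the set is conflicting; with two tokens either order succeeds.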
Intuitively, non-conflicting transitions can be fired in any order. This is important when modeling chemical reactions, because in real-world situations many chemical reactions take place concurrently.

Fact 1. (Uniqueness): Let N be a net with a marking M and let A be a non-conflicting set. Then the marking M′ reached by sequentially firing the transitions from A is independent of the order in which these transitions are fired.
Proof: Directly from the definition of a non-conflicting set and from the fact that finite addition is commutative.

Definition 8. (Case in marking M): For a given net N and its marking M, every maximal non-conflicting subset of transitions will be called a case in M.

Fact 2. Every case in M contains only transitions active in M.

Definition 9. (Case firing): Let P = {t1, ..., tk} be a case of net N in marking M. Firing of case P, denoted M[P⟩M′, is defined as firing all transitions from P in an arbitrary order.

Having formally defined the notion of a case, we can use the case semantics of Petri nets. The case semantics is a more coarse-grained view of the behavior of the network, which now consists of sequential firing of cases. From now on, when referring to the Petri net semantics, unless stated otherwise, we mean the case semantics. The case semantics is introduced to model the concurrency of chemical reactions. A case is a set of reactions that are potentially independent and can take place at the same time. It should be noted that the cases of a network are not a static property of the network, because they depend on the current marking. The net shown in Fig. 2 has exactly one case, namely P = {t1, t2, t3}. Firing case P leaves the marking of the net unchanged.

METHODS
One central issue that arises when studying gene regulatory networks is identifying their stationary states. A stationary state can be defined in a variety of ways; in this article, however, the term means a steady state, i.e. the “attractor” of the system (similar concepts are considered in [Thomas and Kaufman, 2001a; Soulé, 2003]). That is, when a system is in a stationary state, potentially many chemical reactions may take place, but the concentrations of the different substrates stay unchanged. Such a situation occurs when the products of one set of reactions are immediately consumed as substrates in another set of reactions.
The notion of a stationary state is of great importance in biology. Cells achieve stationary states as a result of their differentiation process. The stationary state determines the type and the function of the cell. Changes in the cell's chemical environment may lead to a change of its stationary state. A known example is the behavior of E. coli when its environment is supplied with lactose [Stryer, 1995]. The presence of lactose causes a rapid change in the cell's stationary state, making it possible for E. coli to exploit lactose as an energy source. Since the stationary states of a cell may change as an effect of external factors, we must assume that the cell's behavior changes over time.

A way of determining the possible stationary states of a chemical system might yield answers to fundamental questions in medicine and biology. It would be very promising if one could check whether a cell can transform into a dangerous type (e.g. cancer), or if one could modify the cell's chemical system in such a way that it would never reach a dangerous stationary state. Moreover, when studying new organisms, a lot of useful information could be obtained if one could identify the possible stationary states of the cells. In this paper we propose a method that might be very useful in this field. Below we formally define the notion of a stationary state in terms of Petri nets. Moreover, we propose a method for finding such states based on a static analysis of the structure of the net.
In our model, a transition represents a chemical reaction. In chemical systems, usually more than one reaction takes place over a given period of time. To model the concurrency of a chemical system, we use the notion of a case as introduced in Definition 8. So instead of firing one transition at a time we fire one case at a time. In this approach a stationary state can be viewed as a special marking that is invariant with respect to the cases of that marking. Definition 10 formalizes this notion.

Definition 10. (Stationary state): A marking M of a given net N is said to be stationary if for every case P in this marking we have M[P⟩M.

One must realize that the concept of a stationary state as introduced above is only a mathematical model of a real-world situation. The complexity of the model influences the computational complexity: the more realistic the model is, the harder it is to analyze. Definition 10 is only an approximation of what we observe in reality; however, we think it might be useful. The intuition behind Definition 10 is as follows: transitions correspond to chemical reactions and cases correspond to maximal groups of reactions that can take place simultaneously. We assume that reactions always take place in groups (case firing). With this in mind, a stationary state is a state that is left unchanged no matter what group of reactions takes place. This reflects the concept of a chemical equilibrium: reactions take place, but the quantities of reactants are left unchanged.

Presburger arithmetic
Once we have defined the notion of a stationary state in terms of Petri nets, we need a tool to find and test such states. We propose an approach based on Presburger arithmetic formulae. For a given network we describe a way to construct a formula with free variables s1, s2, ... that correspond to the places of the network. Such a formula is satisfied if and only if the values assigned to the variables form a stationary state.
Hence, the satisfying valuations of the variables s1, s2, ... correspond to all the stationary states. We do not present a specific algorithm for constructing the formula; only the idea is given. However, the steps described here can be easily translated into an efficient automated procedure. Presburger arithmetic has been chosen because its validity problem is decidable (see, for instance, [Oppen, 1978]). Furthermore, the set of all valuations satisfying a given formula is a semi-linear set [Ginsburg and Spanier, 1966] and hence admits a finite representation. In addition, this semi-linear set can be effectively constructed. In our setting this means that the set of all stationary states can be effectively computed by combining the construction of the Presburger formula presented below with solving this formula. We give some basic definitions first, and then we describe the formula for a given network. At the end of this section we introduce a variation of the semantics of Petri nets and show that it is easy to extend the formula to match the modified semantics.

Definition 11. (Presburger arithmetic): Presburger arithmetic is the first-order theory of the natural numbers with addition. The signature is (ℕ, +, 0, 1, =), where “+” is a two-argument function, “0” and “1” are constants and “=” is the equality predicate. The formulae are built by first-order quantification (∀x φ, ∃x φ) and Boolean connectives (∧, ∨, ¬, etc.) from the atomic equality predicates.

Remark 1. The inequality relation can be defined in Presburger arithmetic as: x < y ⇔ ∃z (y = x + z + 1),
and the non-strict inequality as: x ≤ y ⇔ ∃z (y = x + z).
Remark 2. Presburger arithmetic contains no multiplication (adding this operator would make the validity problem undecidable). But we can still speak about multiplication by a constant. For c ∈ ℕ, we will use the shorthand:
\[ c * x = \underbrace{x + \dots + x}_{c \text{ times}}. \]
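As a small worked example of Remarks 1 and 2 (the constants 2 and 3 are chosen purely for illustration), an activation condition of the form W(s, t) ≤ M(s) with W(s, t) = 2, and a comparison involving multiplication by the constant 3, unfold into formulae that use only addition, the constants 0 and 1, and equality:

\[
2 \le s \;\Longleftrightarrow\; \exists z\,(s = 1 + 1 + z),
\qquad
3 * x \le y \;\Longleftrightarrow\; \exists z\,(y = x + x + x + z).
\]

Every weight that appears in the formulae of this section is a fixed constant of the net, so each product of a weight and a variable can be unfolded in the same way.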
Logical description of stationary states
Let N = (S, T, F, W, M0) be a fixed Petri net in the sequel. Without loss of generality we assume S = {s1, ..., sm} and T = {t1, ..., tn}, where m = |S| and n = |T| are the cardinalities of S and T, respectively. Below we show how to construct a formula with free variables s1, ..., sm such that a given valuation v satisfies the formula if and only if v corresponds to a stationary state. Depending on the context, we allow ourselves to use the same symbol si ∈ S either to denote a place of N or as the variable in the formula that corresponds to that place, and similarly for transitions tj ∈ T. We hope that this overloading of symbols improves readability without causing confusion.

A straightforward approach is to translate the definition of a stationary state directly into a Presburger arithmetic formula. Unfortunately, such an approach yields a formula of exponential size. A more sophisticated approach, proposed below, allows one to construct a formula of polynomial size.

Definition 12. For a given place s ∈ S the following two sets are defined:
• s− = {t ∈ T | t(s) < 0},
• s+ = T \ s−.
Set s− contains the transitions with a negative balance with respect to the place s, and s+ contains the remaining transitions.

Lemma 1. For a given net N and its marking M, a set A ⊆ T is non-conflicting if and only if the following condition is met for each s ∈ S and t ∈ A (with W(s, t) taken as 0 when (s, t) ∉ F):
\[ M(s) + \sum_{t' \in (s^- \cap A)\setminus\{t\}} t'(s) \;\ge\; W(s,t). \]
Proof: For a non-conflicting set A, the transitions can be fired sequentially, regardless of the order of firing. In particular, if we choose some t ∈ A and a place s ∈ S, we can enforce the following order of transition firing:
(s− ∩ A)\{t}, t, (A\s−)\{t}.
In other words, we first fire the transitions with a negative balance (except for the chosen one) in any order, then the chosen one, and finally the remaining transitions in any order. The condition from the lemma states that, when we are about to fire t, we can do so with respect to s. But since A is non-conflicting,
this condition is met. Since the choice of s and t was arbitrary, the condition is met for all pairs of s and t. Now suppose that the condition from the lemma is met. In that case, for any chosen s and t, the following is true:
\[ M(s) + \sum_{t' \in (A \cap s^-)\setminus\{t\}} \big( W(t',s) - W(s,t') \big) \;\ge\; W(s,t). \tag{1} \]
The value of the left-hand side of Eq. (1) is the smallest marking of s that can be reached before t is fired when firing the transitions from A in marking M. Since the choice of s was arbitrary, t will be fireable regardless of the firing order of the transitions in A. And since the choice of t was also arbitrary, this reasoning applies to all transitions in A. Therefore A is non-conflicting.

To avoid the exponential growth of the size of the formula, we use the condition from Lemma 1 to express that a set is non-conflicting. But instead of writing a separate sub-formula for every possible subset of T, we quantify over variables t1, ..., tn, which correspond to the transitions in T. There are 2^n possible valuations of these variables in {0, 1}, and each valuation represents a different subset A ⊆ T, that is, ti = 1 ⇔ ti ∈ A. Note that, depending on the context, ti and sj denote either variables or elements of the net. The condition for a marking to be stationary is expressed by the following formula stationary_state(s1, ..., sm):
\[ \forall t_1,\dots,t_n \Big( (t_1 \le 1 \wedge \dots \wedge t_n \le 1) \wedge \mathrm{case}(s_1,\dots,s_m,t_1,\dots,t_n) \Rightarrow \mathrm{balance}(t_1,\dots,t_n) \Big). \tag{2} \]
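The criterion of Lemma 1 can be sketched in Python and compared against the permutation-based reading of Definition 7. The three-transition net below is hypothetical, as are all function names; a missing arc is treated as weight 0, which is an assumption of this sketch. Lemma 1 needs one inequality per pair of place and transition, whereas the direct check tries every firing order.

```python
from itertools import combinations, permutations, product

# Hypothetical net: t1 and t2 each consume one token from p; t3 moves q to p.
PLACES = ["p", "q", "r"]
TRANSITIONS = ["t1", "t2", "t3"]
PRE = {"t1": {"p": 1}, "t2": {"p": 1}, "t3": {"q": 1}}
POST = {"t1": {"q": 1}, "t2": {"r": 1}, "t3": {"p": 1}}


def delta(t, s):
    """Transition function t(s) = W(t, s) - W(s, t); a missing arc counts as 0."""
    return POST[t].get(s, 0) - PRE[t].get(s, 0)


def non_conflicting_lemma1(marking, subset):
    """Lemma 1: one inequality per (place, transition) pair."""
    for s in PLACES:
        negative = [t for t in subset if delta(t, s) < 0]  # s^- intersected with A
        for t in subset:
            worst = marking[s] + sum(delta(u, s) for u in negative if u != t)
            if worst < PRE[t].get(s, 0):
                return False
    return True


def non_conflicting_by_definition(marking, subset):
    """Definition 7: try every firing order (factorial cost)."""
    for order in permutations(subset):
        current = dict(marking)
        for t in order:
            if any(current[s] < w for s, w in PRE[t].items()):
                return False
            for s in PLACES:
                current[s] += delta(t, s)
    return True


def agree_everywhere(max_tokens=2):
    """Compare both tests on all markings with at most max_tokens per place."""
    for values in product(range(max_tokens + 1), repeat=len(PLACES)):
        marking = dict(zip(PLACES, values))
        for size in range(len(TRANSITIONS) + 1):
            for subset in combinations(TRANSITIONS, size):
                if (non_conflicting_lemma1(marking, subset)
                        != non_conflicting_by_definition(marking, subset)):
                    return False
    return True


print(non_conflicting_lemma1({"p": 1, "q": 1, "r": 0}, ("t1", "t2")))  # False
print(non_conflicting_lemma1({"p": 1, "q": 1, "r": 0}, ("t1", "t3")))  # True
print(agree_everywhere())  # True
```

The exhaustive comparison covers only this small net and small markings; it illustrates the lemma and is not a substitute for its proof.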
Below we use
\[ \bigwedge_{1 \le i \le m} p(s_i) \quad\text{and}\quad \bigwedge_{1 \le j \le n} p(t_j) \]
as a shorthand for p(s1) ∧ ... ∧ p(sm) and p(t1) ∧ ... ∧ p(tn), respectively. For instance, Eq. (2) can be rewritten as:
\[ \forall t_1,\dots,t_n \Big( \bigwedge_{1 \le j \le n} t_j \le 1 \;\wedge\; \mathrm{case}(s_1,\dots,s_m,t_1,\dots,t_n) \Rightarrow \mathrm{balance}(t_1,\dots,t_n) \Big). \tag{3} \]
The subformula balance(t1, ..., tn) stands for:
\[ \bigwedge_{1 \le i \le m} \big( t_1(s_i) * t_1 + \dots + t_n(s_i) * t_n = 0 \big). \tag{4} \]
The intuition for balance(t1, ..., tn) is that the transitions belonging to the set determined by the valuation of t1, ..., tn must have a total balance equal to 0 at every place. The size of the formula is proportional to |S||T|. Note that tj(si) may be negative. However, this is not a problem, because the terms with negative coefficients can be moved to the right-hand side of the equality predicate. The formula case(s1, ..., sm, t1, ..., tn) is more complicated. Its intended meaning is that, in the marking determined by the valuation of the variables s1, ..., sm, the set of transitions determined by the valuation of the variables t1, ..., tn is a case. To guarantee that the latter set is maximal, we introduce additional variables t′1, ..., t′n ∈ {0, 1} such that ti ≤ t′i for i ∈ {1, ..., n}. The valuation of the newly
introduced variables determines a subset of T that includes the subset determined by the valuation of t1, ..., tn. The formula for case(s1, ..., sm, t1, ..., tn) is given by:
\[
\mathrm{non\_conflicting}(s_1,\dots,s_m,t_1,\dots,t_n) \;\wedge\;
\forall t'_1,\dots,t'_n \Big( \Big( \bigwedge_{1 \le j \le n} t'_j \le 1 \;\wedge \bigwedge_{1 \le j \le n} t_j \le t'_j \;\wedge \bigvee_{1 \le j \le n} t_j < t'_j \Big) \Rightarrow \neg\,\mathrm{non\_conflicting}(s_1,\dots,s_m,t'_1,\dots,t'_n) \Big). \tag{5}
\]
To present the formula non_conflicting(s1, ..., sm, t1, ..., tn) we need a special ordering of the variables corresponding to transitions. For every place s ∈ S we fix a permutation σs of {1, ..., n} such that, in the induced ordering of transitions tσs(1), ..., tσs(n), the transitions with a negative balance with respect to s precede the remaining transitions. Formally, this condition can be expressed by:
\[ t_{\sigma_s(1)}, \dots, t_{\sigma_s(|s^-|)} \in s^-. \tag{6} \]
The formula for non_conflicting(s1, ..., sm, t1, ..., tn) is based on Lemma 1, as follows:
\[
\bigwedge_{1 \le i \le m}\ \bigwedge_{1 \le j \le n}
\Big( s_i + t_{\sigma_{s_i}(1)}(s_i) * t_{\sigma_{s_i}(1)} + \dots + t_{\sigma_{s_i}(|s_i^-|)}(s_i) * t_{\sigma_{s_i}(|s_i^-|)}
\underbrace{{}- t_j(s_i) * t_j}_{\text{if } t_j \in s_i^-}
\;\ge\; W(s_i,t_j) * t_j \Big). \tag{7}
\]
It is easy to see that the length of Eq. (7) is proportional to |S||T|^2. This parsimony was achieved thanks to Lemma 1. The size of the whole stationary_state(s1, ..., sm) formula is therefore proportional to |S||T|^2 as well.

Petri nets with bounds
The above reasoning holds for general Petri nets. In some cases, however, the use of modified Petri nets is necessary. In particular, this is the case in modeling gene regulatory networks, where the level of gene expression is bounded and discrete. To model such situations we slightly modify the classical semantics of Petri nets by introducing the notion of bounds. We also show that it is easy to alter the previously introduced formulae to allow for the changes in the semantics. The necessary change in semantics is accomplished by introducing the bounding function L : S → ℕ, which specifies the upper bounds imposed on places. At any time, the following condition must be met by each place s:
\[ M(s) \le L(s). \tag{8} \]
Once the bounds are introduced, the question arises of how the net behaves when the bound is reached in one or more output places of a case. One option is to consider the case inactive, while the alternative is
Fig. 4. Gene regulatory network of A. thaliana flower morphogenesis. Pointed arrows reflect an activation relationship, whereas the blunt arrows reflect a repression relationship between genes. For instance EMF1 activates TFL1, but at the same time it is a repressor of LFY.
to ignore the possible overflow. We adopt the latter option. To be more specific, when firing a case, we fire all the transitions from that case as if there were no bound, and after that we remove the overflow. All the definitions from the previous sections also hold for the newly introduced semantics of the net. However, we must slightly modify Eq. (4) and add an extra condition on the variables s1, ..., sm, so that the possible markings do not exceed the bounds. In ordinary networks the state was unaffected by the firing of a case only when its balance was equal to 0. This is also true for nets with bounds, but the state can also remain unaffected when the balance is non-zero and the place bounds are reached. The modified version of Eq. (4), referred to as balance(s1, ..., sm, t1, ..., tn), is as follows:
\[
\bigwedge_{1 \le i \le m} \Big( \big( t_1(s_i) * t_1 + \dots + t_n(s_i) * t_n = 0 \big) \;\vee\; \big( t_1(s_i) * t_1 + \dots + t_n(s_i) * t_n > 0 \,\wedge\, s_i = L(s_i) \big) \Big). \tag{9}
\]
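The bounded semantics can be illustrated by a brute-force search over all markings within the bounds. The Python sketch below is not the Presburger-based procedure of this section: it enumerates markings, computes the cases of each marking directly from Definitions 7 and 8, fires them with overflow removal, and keeps the markings that every case leaves unchanged. The toy net (gene a activates b and represses c, with a bound of 1 on every place) and its arc structure are assumptions made for illustration only.

```python
from itertools import combinations, permutations, product

# Hypothetical toy net (not the A. thaliana net): gene a activates b and
# represses c. The arc structure below is an assumption of this sketch.
PLACES = ["a", "b", "c"]
BOUND = {"a": 1, "b": 1, "c": 1}  # bounding function L
PRE = {"act": {"a": 1}, "rep": {"a": 1, "c": 1}}
POST = {"act": {"a": 1, "b": 1}, "rep": {"a": 1}}
TRANSITIONS = list(PRE)


def delta(t, s):
    """Transition function t(s) = W(t, s) - W(s, t)."""
    return POST[t].get(s, 0) - PRE[t].get(s, 0)


def non_conflicting(marking, subset):
    """Definition 7: every firing order of the subset is feasible."""
    for order in permutations(subset):
        current = dict(marking)
        for t in order:
            if any(current[s] < w for s, w in PRE[t].items()):
                return False
            for s in PLACES:
                current[s] += delta(t, s)
    return True


def cases(marking):
    """Definition 8: maximal non-conflicting subsets of transitions."""
    good = [set(c) for r in range(len(TRANSITIONS) + 1)
            for c in combinations(TRANSITIONS, r)
            if non_conflicting(marking, c)]
    return [c for c in good if not any(c < other for other in good)]


def fire_case(marking, case):
    """Fire all transitions as if unbounded, then remove the overflow."""
    return {s: min(marking[s] + sum(delta(t, s) for t in case), BOUND[s])
            for s in PLACES}


def is_stationary(marking):
    """Definition 10 under the bounded semantics."""
    return all(fire_case(marking, c) == marking for c in cases(marking))


stationary = [m for m in product(*(range(BOUND[s] + 1) for s in PLACES))
              if is_stationary(dict(zip(PLACES, m)))]
print(stationary)
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0)]
```

In the marking (1, 1, 0) the only case is {act}; its balance at b is positive, but b already equals its bound, which is exactly the second disjunct of Eq. (9). The enumeration is exponential in the number of places and transitions, so it serves only to check small examples.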
RESULTS
In the previous sections we introduced the notion of a stationary state and proposed a way to describe and analyze it mathematically. In this section we use the example of A. thaliana to demonstrate the practical application of these methods.

Flower morphogenesis of A. thaliana
The gene regulatory mechanisms that lie behind the flower morphogenesis of Arabidopsis thaliana have been studied extensively by biologists. The mechanisms that control the differentiation of the cells are well understood. Extensive research has allowed the construction of a gene regulatory network that depicts the mechanism governing this process (see Fig. 4). For this reason the model of A. thaliana is commonly used to test new mathematical formalisms for identifying stationary states [see Mendoza and Alvarez-Buylla, 1998; Mendoza et al., 1999; Espinosa-Soto et al., 2004]. During flower morphogenesis the cells undergo differentiation. This biochemical process ends when a chemical equilibrium is reached. The type of the equilibrium determines the type of the cell. The A. thaliana flower is built of four concentric whorls: sepals, petals, stamens and carpels. Research has shown that cells building up different parts of the flower have different types of chemical equilibrium.
Fig. 5. The ABC model. The expression of gene AP1 is responsible for the activity A, expression of genes AP3, PI is responsible for the activity B and expression of gene AG is responsible for the activity C.
Fig. 6. Gene A activates or represses gene B, as shown by the arrow in the upper part of the left- and right-hand side diagram, respectively. The bottom parts show how this is expressed in terms of Petri nets. Genes are denoted by places and the activation relation is denoted by a proper transition.
The ABC model is widely used to describe these different equilibria. According to the model there are three types of activities (A, B and C), and their combination determines the type of the cell. Each activity has genes associated with it. The activity is observable if these genes are active in the equilibrium. Figure 5 illustrates the connection between activities, genes and flower parts (compare with Mendoza et al., 1999).

From gene regulatory network to a Petri net
Figure 4 shows the graphical representation of the gene regulatory network underlying the flower morphogenesis of A. thaliana. The vertices represent genes and the arrows represent the regulatory relations. In the corresponding Petri net, genes are represented by places and regulatory relations by transitions. Figure 6 presents the rules for translating an ordinary regulatory network into a Petri net. After applying these rules to the network from Fig. 4 we obtain the Petri net shown in Fig. 7.

Stationary state analysis
This section presents an analysis of the stationary states of A. thaliana flower cells using Petri nets that have a uniform bound of 1 on all places (see section “Petri nets with bounds”). This models genes being either active (1) or inactive (0). First we discuss the genes involved in the morphogenesis, and later we show two different approaches to the analysis of the gene regulatory network. Following [Mendoza and Alvarez-Buylla, 1998; Mendoza et al., 1999] we assume that the topology of the network is given. The first approach is to translate the network into a Petri net and then find all the stationary states. In the second approach we restrict the set of genes under consideration, translate the restricted network into a smaller Petri net and find all the stationary states of that net. The first approach requires no preprocessing, but heavy post-processing is needed. The second approach requires one to restrict the set of analyzed genes, but yields results that require virtually no post-processing.

Moreover, the first approach can lead to the loss of some significant information because it lacks information on gene activity times. This is not the case in the second approach, where the set of genes is by definition restricted to non-temporary ones. In this refined approach we consider the network model “from the developmental biology point of view” and highlight only the subset of interactions occurring in given
Fig. 7. The result of translating the network shown in Fig. 4 into a Petri net according to the rules shown in Fig. 6.
nuclei in the particular time frame. Such an analysis is contrasted with “the functional genomics point of view”, in which the regulations responsible for all the developmental processes are considered at the same time. The most studied example of a developmental gene regulatory network, the one that controls the specification of endoderm and mesoderm in the sea urchin embryo, is presented in [Davidson et al., 2002].

The authors of [Mendoza and Alvarez-Buylla, 1998; Mendoza et al., 1999] use the following classification of the genes involved in the morphogenesis, with respect to the times of their activity:
1. Genes active in the cells that are not part of the flower. During the morphogenesis they are active only in the very early stage. The genes are EMF1 and TFL1.
Table 1. Stationary states for the full regulatory network of A. thaliana flower morphogenesis (see Fig. 7)

EMF1  TFL1  LFY  AP1  AG  LUG  UFO  AP3  PI  SUP
0     0     0    0    1   0    0    1    1   0
0     0     0    0    1   0    0    0    0   ?
0     0     0    0    1   0    1    1    1   0
?     1     0    0    0   0    0    1    1   0
?     1     0    0    0   0    0    0    0   ?
?     1     0    0    0   0    1    1    1   0
0     0     0    0    0   1    0    1    1   0
0     0     0    0    0   1    0    0    0   ?
0     0     0    0    0   1    1    1    1   0

Symbol ‘?’ allows for both 0 and 1.

Table 2. Stationary states from Table 1 (only genuine ones)

EMF1  TFL1  LFY  AP1  AG  LUG  UFO  AP3  PI  SUP
0     0     0    0    1   0    0    1    1   0
0     0     0    0    1   0    0    0    0   0
1     1     0    0    0   0    0    1    1   0
1     1     0    0    0   0    0    0    0   0
2. As the flower morphogenesis begins, the expression of genes LFY and AP1 can be observed. The latter is also active in sepal cells of the flower.
3. Next, the genes LUG and UFO come into play, but their activity is only a part of the process and is not observed in the differentiated cells.
4. As the activity of the above-mentioned genes fades away, the new genes responsible for the flower type (AG, AP3 and PI) become active.
5. At the end of the process the gene SUP is briefly active.

The genes can be divided into two groups: genes that are temporary (LFY, LUG, UFO and SUP) and genes that are an essential part of the final equilibria (EMF1, TFL1, AP1, AG, PI and AP3). This classification allows one to distinguish between genuine and false stationary states, as it is unlikely for temporary genes to be observed in genuine stationary states.

The straightforward approach

Table 1 shows the stationary states found for the Petri net from Fig. 7. For this type of network one can find the stationary states either manually, or automatically using our translation to Presburger formulae, as described in Sections “Logical description of stationary states” and “Petri nets with bounds”. Table 2 shows only those states from Table 1 that are considered biologically significant (see the beginning of this Section). The identified stationary states can be easily matched to the ones observed in the different flower cells (compare with Mendoza et al., 1999). The stationary state [0000100110] represents activities B and C (expression of genes AG, AP3 and PI), and is observed in the stamen cells. The stationary state [0000100000] represents activity C (only gene AG is expressed) and is observed in the carpel cells. [1100000000] is the last known stationary state, which is found in non-flowering cells.
The stationary state [1100000110] has never been observed in reality; however, it might be of biological significance, as mentioned at the beginning of the section “Stationary state analysis” (see also Mendoza et al., 1999).
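The selection of genuine states can be illustrated with a minimal Python sketch. It expands the ‘?’ entries of Table 1 and discards every state in which a temporary gene (LFY, LUG, UFO or SUP) is active. The names `TABLE_1`, `expand` and `is_genuine` are hypothetical. The second filter, which drops states with TFL1 active but EMF1 inactive, is an assumption taken from the remark on insignificant states in the refined approach. With both filters the sketch yields the four states of Table 2; it is an illustration, not the authors' actual procedure.

```python
from itertools import product

# Gene order used for the state strings in the text.
GENES = ["EMF1", "TFL1", "LFY", "AP1", "AG", "LUG", "UFO", "AP3", "PI", "SUP"]
TEMPORARY = {"LFY", "LUG", "UFO", "SUP"}

# Columns of Table 1 written in the gene order above; '?' means 0 or 1.
TABLE_1 = [
    "0000100110",
    "000010000?",
    "0000101110",
    "?100000110",
    "?10000000?",
    "?100001110",
    "0000010110",
    "000001000?",
    "0000011110",
]


def expand(pattern):
    """Yield every concrete 0/1 string matched by a pattern containing '?'."""
    choices = ["01" if symbol == "?" else symbol for symbol in pattern]
    for combination in product(*choices):
        yield "".join(combination)


def is_genuine(state):
    """Return True if the state passes both filters described in the lead-in."""
    active = {gene for gene, bit in zip(GENES, state) if bit == "1"}
    # Temporary genes are unlikely to be active in a genuine stationary state.
    if active & TEMPORARY:
        return False
    # Assumption: TFL1 active without EMF1 is treated as insignificant.
    if "TFL1" in active and "EMF1" not in active:
        return False
    return True


genuine = sorted({s for pattern in TABLE_1 for s in expand(pattern) if is_genuine(s)})
print(genuine)
```

Without the second filter, six candidate states remain, so the temporary-gene criterion alone does not reproduce Table 2.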
Fig. 8. The simplified A. thaliana flower morphogenesis gene regulatory network.
Fig. 9. Petri net corresponding to the gene regulatory network in Fig. 8.
An advantage of the straightforward approach is that no preprocessing of the network is necessary. However, it produces many false positives, and many genuine states are not identified, for instance the states [0001000000] (equivalent to activity A, which is observed in sepal cells) and [0001000110] (equivalent to activities A and B, which are observed in petal cells). This implies that substantial post-processing is necessary.

The refined approach

The refined approach overcomes the weakness of the straightforward approach, although at the cost of some preprocessing. In addition, the resulting network is simpler, and therefore the Presburger arithmetic formulae are much shorter. The idea is to remove the temporary genes from the analyzed network, so that the simplified network consists only of the genes EMF1, TFL1, AP1, AG, AP3 and PI, as shown in Fig. 8. Figure 9 shows the corresponding Petri net.

Table 3. Stationary states found for the network in Fig. 9. Each column is one stationary state.

Gene   1  2  3  4  5  6
EMF1   0  0  0  0  ?  ?
TFL1   0  0  0  0  1  1
AP1    0  0  1  1  0  0
AG     1  1  0  0  0  0
AP3    1  0  1  0  0  1
PI     1  0  1  0  0  1

The formula below is an instance of Eq. (3) for the subnetwork consisting of genes PI and AP3. The variable $t_{API}$ denotes the activation of PI by AP3 and the variable $t_{IPA}$ denotes the activation of AP3 by PI.

\[
\begin{aligned}
&\underbrace{(PI \le 1 \land AP3 \le 1)}_{\text{place bounds}} \;\land\;
\forall\, t_{API}, t_{IPA}\; \Bigl(\Bigl(
\underbrace{(t_{API} \le 1 \land t_{IPA} \le 1)}_{\text{possible transitions}} \;\land\;
\underbrace{\mathit{non\_conflicting}(PI, AP3, t_{API}, t_{IPA})}_{\text{non-conflicting set}} \\
&\qquad \land\; \forall\, t'_{API}, t'_{IPA}\; \Bigl(\bigl(
\underbrace{t'_{API} \le 1 \land t'_{IPA} \le 1}_{\text{primed variables denote the superset}} \;\land\;
\underbrace{t_{API} \le t'_{API} \land t_{IPA} \le t'_{IPA} \land (t_{API} < t'_{API} \lor t_{IPA} < t'_{IPA})}_{\text{superset definition}}\bigr) \\
&\qquad\qquad \Rightarrow\;
\underbrace{\lnot\,\mathit{non\_conflicting}(PI, AP3, t'_{API}, t'_{IPA})}_{\text{the superset cannot be non-conflicting}}\Bigr)\Bigr) \\
&\qquad \Rightarrow\; \Bigl(
\underbrace{(t_{API} = 0 \lor (t_{API} > 0 \land PI = 1))}_{\text{balance for } PI} \;\land\;
\underbrace{(t_{IPA} = 0 \lor (t_{IPA} > 0 \land AP3 = 1))}_{\text{balance for } AP3}\Bigr)\Bigr)
\end{aligned}
\]

The bound on every place is 1. The subformula $\mathit{non\_conflicting}(PI, AP3, t_{API}, t_{IPA})$ is given by:

\[
(AP3 \ge t_{API}) \land (PI \ge t_{IPA}) \land (PI \ge 0) \land (AP3 \ge 0).
\]
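Because every place and transition variable is bounded by 1, the formula for the PI and AP3 subnetwork can be checked by plain enumeration. The following Python sketch does so under stated assumptions: the relation symbols are read as "at most 1" for the bounds and as AP3 ≥ t_API and PI ≥ t_IPA in the non-conflicting subformula, and both balance conditions are read as disjunctions. The function names are hypothetical, and this brute-force check stands in for, rather than reproduces, a Presburger decision procedure.

```python
from itertools import product


def non_conflicting(pi, ap3, t_api, t_ipa):
    """Each fired transition needs a token on its input place."""
    return ap3 >= t_api and pi >= t_ipa and pi >= 0 and ap3 >= 0


def is_stationary(pi, ap3):
    """Check the balance conditions for every maximal non-conflicting set."""
    for t_api, t_ipa in product((0, 1), repeat=2):
        if not non_conflicting(pi, ap3, t_api, t_ipa):
            continue
        # Skip sets that have a strictly larger non-conflicting superset.
        has_larger = any(
            non_conflicting(pi, ap3, u_api, u_ipa)
            for u_api, u_ipa in product((0, 1), repeat=2)
            if u_api >= t_api and u_ipa >= t_ipa
            and (u_api > t_api or u_ipa > t_ipa)
        )
        if has_larger:
            continue
        balance_pi = t_api == 0 or (t_api > 0 and pi == 1)
        balance_ap3 = t_ipa == 0 or (t_ipa > 0 and ap3 == 1)
        if not (balance_pi and balance_ap3):
            return False
    return True


# Enumerate all markings (PI, AP3) allowed by the place bounds.
stationary = [
    (pi, ap3) for pi, ap3 in product((0, 1), repeat=2) if is_stationary(pi, ap3)
]
print(stationary)
```

Under these readings only the markings with PI = AP3 are stationary, which agrees with the AP3 and PI rows of Table 3 being identical.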
Table 3 shows the result of the analysis of the simplified network. All significant states were identified, and only two insignificant ones were found (gene TFL1 is active, but EMF1 is not). This shows empirically that the refined approach is more efficient than the straightforward one.

DISCUSSION

In this article we proposed a mathematical formalism for the identification of stationary states in Petri nets. We also showed that this approach yields very promising results when applied to a real-world example such as the regulatory network of A. thaliana flower morphogenesis. The proposed method for identifying stationary states uses Presburger arithmetic. The complexity of present algorithms for testing these formulae is proportional to $2^{2^{2^{pn}}}$ [see Oppen, 1978] and the problem has a lower bound of $2^{2^{pn}}$, as shown in [Fischer and Rabin, 1974]. However, the formulae we have constructed have a relatively simple structure, with only two universal quantifiers. This makes it highly probable that the complexity of the computations can be reduced to a satisfactory level in our setting. Moreover, in the case of bounds equal to one, we get essentially a quantified boolean formula (QBF). It is an interesting question whether symbolic techniques for effectively representing and solving such formulae, for instance BDDs [Bryant, 1992], can be applied. Another possibility is to use existing QBF solvers (e.g., see Benedetti, 2005). These are the most promising ideas for further investigation, and could result in a very effective procedure for computing stationary states. Apart from the logical description of stationary states, we also proposed two different approaches to studying gene regulatory networks, a straightforward one and a refined one. The latter was found to be more efficient, but requires the intervention of a human expert. The optimization of the algorithms for finding stationary states is an issue for further research.
ACKNOWLEDGEMENTS

This work was supported by Polish Ministry of Education and Science grant KBN-8 T11F 021 28.

REFERENCES

• Bryant, R. E. (1992). Symbolic Boolean Manipulation with Ordered Binary Decision Diagrams. ACM Computing Surveys 24, 293-318.
• Benedetti, M. (2005). sKizzo: a Suite to Evaluate and Certify QBFs. In: Proc. 20th International Conference on Automated Deduction (CADE05), Nieuwenhuis, R. (ed.), Springer LNCS 3632, pp. 369-376.
• Davidson, E. H., Rast, J. P., Oliveri, P., Ransick, A., Calestani, C., Yuh, C. H., Minokawa, T., Amore, G., Hinman, V., Arenas-Mena, C., Otim, O., Brown, C. T., Livi, C. B., Lee, P. Y., Revilla, R., Rust, A. G., Pan, Z. J., Schilstra, M. J., Clarke, P. J. C., Arnone, M. I., Rowen, L., Cameron, R. A., McClay, D. R., Hood, L. and Bolouri, H. (2002). A genomic regulatory network for development. Science 295, 1669-1678.
• Espinosa-Soto, C., Padilla-Longoria, P. and Alvarez-Buylla, E. R. (2004). A gene regulatory network model for cell-fate determination during Arabidopsis thaliana flower development that is robust and recovers experimental gene expression profiles. Plant Cell 16, 2923-2939.
• Gat-Viks, I., Tanay, A. and Shamir, R. (2004). Modeling and analysis of heterogeneous regulation in biological networks. J. Comput. Biol. 11, 1034-1049.
• Ginsburg, S. and Spanier, E. (1966). Semigroups, Presburger formulas and languages. Pacific J. Math. 16, 285-296.
• Goss, P. J. and Peccoud, J. (1998). Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proc. Natl. Acad. Sci. USA 95, 6750-6755.
• Fischer, M. J. and Rabin, M. O. (1974). Super-exponential complexity of Presburger arithmetic. Proc. SIAM-AMS Symposium in Applied Mathematics 7, 27-41.
• Hardy, S. and Robillard, P. N. (2004). Modeling and simulation of molecular biology systems using petri nets: modeling goals of various approaches. J. Bioinform. Comput. Biol. 2, 595-613.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
• Mendoza, L. and Alvarez-Buylla, E. R. (1998). Dynamics of the genetic regulatory network for Arabidopsis thaliana flower morphogenesis. J. Theor. Biol. 193, 307-319.
• Mendoza, L., Thieffry, D. and Alvarez-Buylla, E. R. (1999). Genetic control of flower morphogenesis in Arabidopsis thaliana: a logical analysis. Bioinformatics 15, 593-606.
• Oppen, D. C. (1978). A $2^{2^{2^{pn}}}$ upper bound on the complexity of Presburger arithmetic. J. Comput. System Sciences 16, 323-332.
• Peleg, M., Rubin, D. and Altman, R. B. (2005). Using Petri Net tools to study properties and dynamics of biological systems. J. Am. Med. Inform. Assoc. 12, 181-199.
• Soulé, C. (2003). Graphic Requirements for Multistationarity. ComPlexUs 1, 123-122.
• Stryer, L. (1995). Biochemistry, Freeman, New York.
• Thomas, R. and Kaufman, M. (2001a). Multistationarity, the basis of cell differentiation and memory. I. Structural conditions of multistationarity and other nontrivial behavior. Chaos 11, 170-179.
• Thomas, R. and Kaufman, M. (2001b). Multistationarity, the basis of cell differentiation and memory. II. Logical analysis of regulatory networks in terms of feedback circuits. Chaos 11, 180-195.
• Voss, K., Heiner, M. and Koch, I. (2003). Steady state analysis of metabolic pathways using Petri nets. In Silico Biol. 3, 0031.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-160
Cell Illustrator 4.0: A Computational Platform for Systems Biology Masao Nagasaki∗ , Ayumu Saito, Euna Jeong, Chen Li, Kaname Kojima, Emi Ikeda and Satoru Miyano Human Genome Center, Institute of Medical Science, University of Tokyo, Japan
ABSTRACT: Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri net for modeling and simulating biopathways. It is intended for biological scientists working at bench. The latest version of Cell Illustrator 4.0 uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and, export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri net in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions. KEYWORDS: Biopathway, simulation, Petri net, modeling, pathway database, ontology, CSML, CSO, CI, ODE, Cell Illustrator, Java, JWS
INTRODUCTION

Systems Biology requires computational tools that enable us to understand and analyze complex biopathways. There is a strong need for biology-oriented software with which biological scientists (users) can intuitively model and simulate complex dynamic interactions and processes in biopathways comprising hundreds of entities within and among cells, e.g., gene regulatory networks, metabolic pathways, signal transduction pathways, and cell-cell interactions. To this end, we started developing a software tool in 1999; the first version was published as Genomic Object Net 1.0 in 2002 [1,2], and it was later released under the name Cell Illustrator 1.0. This paper presents the new technologies and tools introduced in the latest version, Cell Illustrator 4.0, and discusses their impact on biopathway modeling and simulation. For instance, Cell Illustrator 4.0 uses Java Web Start [3] and includes new pathway layout algorithms [4–7], formats (Cell System Markup Language (CSML) [8] and Cell System Ontology (CSO) [9]) and tools related to pathway modeling and simulation. Cell Illustrator employs the concept of a hybrid Petri net [10,11] as the modeling method. We extended this concept in an object-oriented style and developed its core architecture, hybrid functional
Corresponding author. Tel.: +81 5449 5615; Fax: +81 5449 5442; E-mail:
[email protected].
M. Nagasaki et al. / Cell Illustrator 4.0: A Computational Platform for Systems Biology
Petri net with extension (HFPNe) [10,12], which is optimized to be suitable for biopathway modeling and simulation. HFPNe was introduced to handle any type of objects to match with the original Petri net concept. In HFPNe, the new elements generic entity and process are introduced to handle these objects (Fig. 1). Additionally, by using discrete and continuous entities and processes, HFPNe can handle the discrete and continuous events at once and any kind of functions can be assigned to the delay, weight and speed parameters of these elements. ODE can be easily modelled using the subset of HFPNe elements, i.e., continuous entity, continuous process, and setting all input connector weight parameters to be “nocheck”. For detailed formal definition and properties of HFPNe, readers are referred to [12,13]. With Cell Illustrator, we can model and simulate any biological objects in biopathways, not only biochemical reactions, molecular density, and 3D localization information, but also sequence-level information (Fig. 2), e.g., translated product with frame-shift and translation with codon table. Some modeling applications, including E-CELL [14], Gepasi [15] and BioSPICE [16], require some skills in mathematics and programming. The concept of Cell Illustrator, in contrast, does not require any prior knowledge in differential equations and programming. To achieve this, we developed biologically sophisticated GUIs and related tools described in the following sections. Prerequisites of Cell Illustrator are advertised as “interest in biology, ability to operate a cell phone, and the mathematical ability of standard middle school student or better.” Since 1999, new versions of Cell Illustrator have been released almost every two years. 
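The remark above that an ODE can be modelled with continuous entities and continuous processes can be illustrated with a minimal Python sketch. It is not Cell Illustrator's engine or API: the function `simulate`, the tuple layout of a process, and the rate constants are hypothetical, and forward Euler integration is an assumption chosen for brevity. Input connectors here never test a threshold, which loosely mirrors the “nocheck” setting mentioned in the text.

```python
def simulate(initial_values, processes, dt, steps):
    """Integrate a network of continuous entities with forward Euler.

    initial_values: dict mapping entity name -> starting amount
    processes: list of (inputs, outputs, speed) where speed(state) -> rate
    """
    state = dict(initial_values)
    for _ in range(steps):
        change = {name: 0.0 for name in state}
        for inputs, outputs, speed in processes:
            rate = speed(state)
            # A continuous process drains its inputs and fills its outputs.
            for name in inputs:
                change[name] -= rate * dt
            for name in outputs:
                change[name] += rate * dt
        for name in state:
            state[name] += change[name]
    return state


K_SYN = 1.0  # hypothetical constant synthesis speed
K_DEG = 0.5  # hypothetical degradation constant

# Encodes d(mRNA)/dt = K_SYN - K_DEG * mRNA with two continuous processes.
processes = [
    ([], ["mRNA"], lambda s: K_SYN),              # transcription
    (["mRNA"], [], lambda s: K_DEG * s["mRNA"]),  # degradation
]

final_state = simulate({"mRNA": 0.0}, processes, dt=0.01, steps=5000)
print(final_state["mRNA"])
```

In this sketch the amount of mRNA approaches the steady state K_SYN / K_DEG, as the corresponding ODE predicts.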
Although there is a policy in the community to distribute software applications as open source, Cell Illustrator is distributed as commercial software in order to be able to attend to every user’s needs and to make continuous improvements quickly. Cell Illustrator Player (CI Player), a full viewer of Cell Illustrator models without the simulation engine, is freely distributed. Thus, users can share and view complete models, similarly to what Adobe Reader does for PDF documents. The textbook [13] and [17] present the use of Cell Illustrator and its applications in detail. A considerable number of users have conducted biopathway modeling and simulation with Cell Illustrator for networks of interest and have proven its practicality in their research. For example, Troncale et al. [18] modeled the regulation of hematopoiesis and investigated the role of interleukin-6 in human early hematopoiesis by simulations with an HFPNe model constructed with Cell Illustrator. Koh et al. [19] conducted parameter estimation by applying an evolutionary technique to their HFPNe model of Akt and MAPK signaling pathways and investigated their working hypothesis on crosstalk interaction. Hardy and Robillard [20] simulated the Ca2+/calmodulin-dependent protein kinase II (CaMKII) regulation network with Cell Illustrator, and then analyzed the dynamics of signal propagation in the CaMKII regulation pathway. Sato et al. [21] modeled the olfactory transduction pathway and suggested that increased PDE1C dosage extends the longevity of the depolarization signals of the olfactory receptor neuron. Wu and Voit [22] demonstrated how the canonical GMA and S-system models in BST can be directly implemented in the HFPNe framework using Cell Illustrator. In addition, they described how to account, in Cell Illustrator, for different types of time delays as well as for discrete, stochastic, and switching effects [23].
The studies mentioned above have shown that pathway modeling based on the Petri net concept is accepted in practice by biological scientists. We believe that Cell Illustrator will foster more biological research using biopathway modeling and simulation. In the Methods section, we address new functions of Cell Illustrator 4.0 for graph layout, exchange of formats from/to CSML, CSO with visualized icons, and SaaS (Software as a Service) technology and modules. Finally, we discuss limitations of Cell Illustrator 4.0 and further functionality that should be built for large-scale Systems Biology.
Fig. 1. Petri net elements in HFPNe and their icons on Cell Illustrator. (a) A Petri net consists of three elements: place, transition, and arc. In HFPNe, to bridge the gap between computer science and biology researchers, these terms are renamed as more intuitive terms: entity, process, connector, respectively. In HFPNe, entity and process have three types – discrete, continuous, and generic – and connector has three types – process, associate, and inhibitory. In Cell Illustrator, entity and process can be displayed with more intuitive icons with the annotation by one of the biological pathway ontologies named CSO. (b) The connection rules of elements in HFPNe.
Fig. 2. Transcription simulation of sequence level on Cell Illustrator. The upper part shows a sequence level simulation model using generic entities and a generic process (http://www.csml.org/download/model/csml30/generic transcription 30.xml). In the lower part, a concentration level simulation model is displayed that uses only one continuous entity and process.
METHODS

Automatic graph layout

When the total number of elements in a biopathway model is fewer than fifty, automatic pathway layout is less important; it is enough to place and arrange the elements manually. Accordingly, no automatic layout function was implemented in Cell Illustrator 1.0, released in 2002. With the progress of Systems Biology, a strong requirement has arisen for handling larger pathway models and pathway models written in other XML formats that lack graphical layout information, and automatic layout functionality was keenly demanded to meet this requirement. The first simple solution was to use a known graph layout library: later versions of Cell Illustrator adopted the JGraph library with its Circle, Moen, Sugiyama, and organic layout algorithms [24]. Unfortunately, these layout algorithms were not sufficient for most biopathway models. Therefore, new grid-based layout algorithms have been developed [4–6] and implemented in Cell Illustrator. Figure 3 shows all of the graph layout algorithms, including BLK [4,25], SCCB [6], and Grid Eades [5]. These grid layout algorithms position the elements on grid points. With this function, Cell Illustrator succeeded in laying out pathway models while taking into account cellular location information that has a complicated structure, e.g., a need to position some elements on the internal region of the torus
Fig. 3. Graph Layout dialog. Six grid layout algorithms (BLK, CB, SCCB, Eades, Random Grid, Adjustment) are implemented. Subcellular localization information can be used for layout. Layout parameters can be set up using the “Option” button.
shape. In Fig. 4, elements in the nucleus, e.g., the transcription process and the pri-micro RNA entity, are arranged on the nucleus cell component, while elements in the cytoplasm, e.g., the translation process, are placed on the cytoplasm cell component. It is difficult to generate a layout that respects such complicated cellular location information using force-directed layout algorithms. This is a unique feature of Cell Illustrator 4.0 that is superior to other such software applications.

CSML 3.0 and import from and export to other formats

The native XML format of Cell Illustrator 1.0 was CSML 1.9, which aimed to represent the HFPNe simulation model with custom graphical information. Until now, various pathway databases were built
Fig. 4. Layout result of the ASE cell fate model in C. elegans, computed with the grid graph layout algorithm (SCCB) from the random layout shown at the bottom left. The model has 76 nodes (entities: 24; processes: 52) and 82 connectors.
up with their own XML formats. It is therefore important to develop an XML format with enough expressive power to cover most of them, so that data can be imported without loss of information while keeping their contents, e.g., the biological meanings, the simulation model, and the layout information. To deal with this situation, CSML version 3.0 [8] was developed as a highly optimized XML format for biopathway modeling and simulation that almost achieves this objective (3.0 is the latest version as of September 2009). Its major features are as follows:
1. Full compatibility with the pathway modeling and simulation ontology format Cell System Ontology 3.0 (CSO) [9]. With this feature, it is possible to exchange data with other ontology-based formats, e.g., BioPAX [26].
2. Ability to contain fact-based information that has no effect on the simulation model but has very important meaning for the biopathway, e.g., indirect regulation from one gene to another.
3. Capacity to represent not only high-level Petri net models but also ODE-based models.
Cell Illustrator 4.0 faithfully implements the major CSML 3.0 specifications: other modeling and simulation XML formats, such as SBML [27] and CellML [28], can be imported exactly into Cell Illustrator 4.0. BioPAX can also be imported into Cell Illustrator 4.0 via the CSO 3.0 format, with kinetics complemented by template simulation models. Other pathway databases, e.g., KEGG [29,30] and
Fig. 5. XML format transformation. CellML and SBML are transformed with CellML2CSML and SBML2CSML to CSML 1.9 and BioPAX is transformed with BioPAX2CSO to CSML 3.0 without losing any information. Any models in CSML 1.9, CSML 3.0, and CSO 3.0 run on CIO 4.0. All transformation tools such as CellML2CSML, SBML2CSML, and BioPAX are included in CIO 4.0 and user can directly import any models to CIO 4.0.
TRANSPATH [31], can also be imported into CSML 3.0 format. It should be noted that these functions in Cell Illustrator 4.0 employ and extend the results of several studies [32–35]. The detailed internal conversion data flow of Cell Illustrator 4.0 is summarized in Fig. 5. Cell Illustrator 4.0 is able to export a CSML 3.0 model into a CSO 3.0 model without loss of information since CSML 3.0 is fully compatible with CSO 3.0. Other well-known XML formats, SBML V3L3, CellML 1.1, and BioPAX L2, are not rich enough to hold the full CSML 3.0 model; thus, exporting from CSML 3.0 to these formats results in the loss of important information that they cannot represent. For example, none of these formats has a unified graphical representation, so a model developed in one application cannot be loaded in another application with the same view. Although SBGN L1 has recently been proposed [36], its format is still under development and offers only a limited graphical representation of the biological components. As another example, BioPAX L2 cannot handle signal transduction reactions and also cannot deal with information related to pathway simulation. From the user’s point of view, more export functionality to other formats, e.g., Cytoscape [37], will provide better usability. This will be a task for a future release.

Cell System ontology and standard icons

The native XML format of Cell Illustrator 4.0 is CSML 3.0, which is based on Cell System Ontology (CSO) 3.0 [32]. CSO allows ontology-based representation of signal transduction pathways, gene regulatory networks, metabolic pathways, and cell-cell interactions, together with kinetics and graphical information. Formally, the schema is defined using the Web Ontology Language (OWL) [38]. The major features of CSO are as follows:
1. CSO 3.0 can be applied to a pathway model both with and without simulation; and
2. the core vocabulary of a biopathway is prepared in CSO 3.0 as entities, processes, and cellular locations (92, 275, and 114 terms, respectively), and all terms in the vocabulary are equipped with standard icons.
Feature 1 allows representation of a pathway model in the absence of kinetics, e.g., KEGG [29] or Reactome [39], without loss of information. Feature 2 not only enables more exact data exchange with other applications (the original purpose of the ontology concept) but also makes data exchange among users more intuitive, since standard icons completely remove the ambiguity of a graphical pathway model, e.g., the process icon always means phosphorylation in any application that supports CSO 3.0. In Cell Illustrator 4.0, all standard icons are collected in the Biological Element dialog (Fig. 6). Thus, only by repeating drag and drop (D&D) operations of those icons from the dialog with a filtering
step, e.g., keyword search, users can create a graphical pathway model with an ontology background. The ontology information is used effectively in some applications. Automatic pathway layouts that consider cellular locations with this ontology are implemented in CIO 4.0. In [40] and [41], semi-automatic pathway validation and model checking with the ontology information are applied to generate qualified pathway models from given pathway models. The research outputs of those pathway validations and models will be available in a future release.

SaaS technology and Cell Illustrator Online

Various types of analysis requests come from Systems Biology research and development. Some of these requests require a huge supercomputer system for tasks such as optimal parameter search, very large databases that are expensive to own, or very specific analyses focused on a particular research topic. No single piece of software can cover all of these capabilities, and software customization is very expensive or impossible. Thus, inevitably, we require Software as a Service (SaaS) technology [42] for the Systems Biology computational platform. SaaS is a software application delivery model that is usually associated with software businesses and is considered a low-cost way to obtain the same benefits as licensed software without the associated complexity and high initial cost. The technology is introduced in Cell Illustrator 4.0 and its modules (described in the next section) and is provided by the CIO servers. Users can select the desired modules on demand. The following six modules, including one beta version, are served on the server side:
1. CSMLDB Search Module
2. Project Management Module
3. High-performance Simulation Module
4. Pathway Parameter Search Module
5. Pathway Model to Multiple Program Languages Export Module (Java, FORTRAN, C++, C, Perl, and Python)
6. CSML to SVG Module and HTML Module (beta)
Moreover, third parties can develop their own modules and deliver them from the server side using APIs. To distinguish Cell Illustrator delivered via Java Web Start (JWS) [3] from the original (standalone) version, the former is named Cell Illustrator Online (CIO). Since JRE 1.6.0_12, released in 2009, Sun Microsystems, Inc. also supports a 64-bit version of JWS. This allows users to allocate more than 1.4 gigabytes of memory to CIO 4.0, which is very useful since pathways with more than 3,000 elements with biological annotations sometimes require more memory than can be handled within a 32-bit JRE. In our experience, more than 2 terabytes of memory can be allocated, and the limit depends on the total memory of the client machine. With the JWS technology, users can easily publish their CSML model on a website by creating URL links with the rules listed in Table 1. If a user creates a link to launch Cell Illustrator Online Player (CIO Player), the read-only version of CIO that can be launched without user registration, the linked CSML model can be freely accessed by any user with Java 6.0 installed on their machine. If the linked CSML file is generated with logging information from a simulation (called a CIL file), CIO Player can also replay the simulation by using the logged information (Fig. 7). Thus, without using CIO itself, the overview of the pathway structure, the kinetics of reactions, and the simulation behavior of the CSML model are available via the Internet.
Fig. 6. Biological Elements Dialog. Three tabs-“Entity”, “Processes”, and “Cell Component”-are shown. The “Filter” box allows easy search of target elements.
Table 1. URLs and options for CIO Application by Java Web Start

(a) Base URLs

Usage       Application  URL
Academic    CIO Player   https://cionline.hgc.jp/cifileserver/launchCIOPlayer?
Academic    CIO          https://cionline.hgc.jp/cifileserver/launchCIO?
Commercial  CIO Player   https://cio.bioillustrator.com/cifileserver/launchCIOPlayer?
Commercial  CIO          https://cio.bioillustrator.com/cifileserver/launchCIO?

(b) Allowed options (suffixes)

Option     Allowed value                                Status    Default
model      https://xxx/xxx.csml or http://xxx/xxx.csml  required  –
antialias  on or off                                    optional  –
mode       GN or BP                                     optional  –
XMX        integer value                                optional  512

Notes on the options:
model: Usually, .csml, .csml.gz, .cil or .cil.gz will be the suffix; it is possible to specify multiple CSML models separated by “,”.
antialias: If this value is on/off, antialiasing/non-antialiasing is forced on the displayed image; when this option is not given, CIO starts without changing its setting.
mode: If GN, CIO is forced to launch in gene network mode; if BP, CIO is forced to launch in Petri net mode. When this option is not given, CIO starts without changing its setting.
XMX: The value specifies the maximum memory (MB) of the launched application. If a large gene network model is to be loaded in CIO, the size should be larger. On a 32-bit machine the maximum value can be 1400; on a 64-bit machine, the maximum value is almost unlimited.

(a) Select server and Java Web Start application. (b) Options for applications. If a model is located at http://www.aaa.bbb/file.csml, the user can make the URL https://cionline.hgc.jp/cifileserver/launchCIOPlayer?model=http://www.aaa.bbb/file.csml and can view the model in file.csml with CIO Player.
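The URL rules of Table 1 can be sketched as a small Python helper. The function `build_launch_url` and the constant name are hypothetical. Two details are assumptions that the table does not state: several options are joined with “&”, and the model URL is passed verbatim, as in the single-option example given in the table note.

```python
ACADEMIC_CIO_PLAYER = "https://cionline.hgc.jp/cifileserver/launchCIOPlayer?"


def build_launch_url(base_url, models, antialias=None, mode=None, xmx=None):
    """Compose a Java Web Start launch URL from the options in Table 1."""
    if not models:
        raise ValueError("the 'model' option is required")
    # Multiple CSML models are separated by ',' as described in the table.
    options = ["model=" + ",".join(models)]
    if antialias is not None:
        if antialias not in ("on", "off"):
            raise ValueError("antialias must be 'on' or 'off'")
        options.append("antialias=" + antialias)
    if mode is not None:
        if mode not in ("GN", "BP"):
            raise ValueError("mode must be 'GN' or 'BP'")
        options.append("mode=" + mode)
    if xmx is not None:
        # Maximum memory in MB; the table gives 512 as the default.
        options.append("XMX=" + str(int(xmx)))
    # Assumption: options are joined with '&' (not stated in the source).
    return base_url + "&".join(options)


url = build_launch_url(ACADEMIC_CIO_PLAYER, ["http://www.aaa.bbb/file.csml"])
print(url)
```

For a single model and no further options, the helper reproduces the example URL from the table note.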
Fig. 7. CIO Player is replaying the simulation of circadian rhythms in the Drosophila melanogaster model. CIO Player can replay the simulation by loading the CIL file, a CSML file with log information. This image contains the loaded URL https://cionline.hgc.jp/cifileserver/launchCIOPlayer?model= http://www.csml.org/download/model/csml30/circadian drosophila 30.cil.gz.
Fig. 8. CSMLDB Search dialog. Shown are the results of a search for “sam” in CSMLDB 8.4 (CSMLDB Professional). By default, the search results are sorted by ID.
MODULES OF CIO 4.0
CSMLDB Search Module
The “CSMLDB Search Module” stores each CSML pathway model in an XML database and lets users search the pathway content through a GUI (Fig. 8). As of September 2009, TRANSPATH Academic and TRANSPATH Professional are fully supported. CSMLDB Academic originates from the academic version of TRANSPATH (version 7.4), converted with Transpath2CSML technology [31], and will be available to academic users via BIOBASE GmbH (a trial version is available for one month after registration). CSMLDB Professional originates from the commercial version of TRANSPATH (version 8.4), also converted with Transpath2CSML technology, and contains more than 100,000 reactions discovered in mammals (Table 2). The CSMLDB GUI provides three tabs, “Entity”, “Process” and “Fact”, and can search over the name, ID and synonyms of entity, process and fact elements (Fig. 8). The matched result can be placed by
Table 2. Total number of entries in CSMLDB 7.4 and CSMLDB 8.4

| Database | Element type | Number of entries |
|---|---|---|
| CSMLDB 7.4 | Entity | 89,469 |
| CSMLDB 7.4 | Process | 140,868 |
| CSMLDB 8.4 | Entity | 117,967 |
| CSMLDB 8.4 | Process | 182,383 |
drag-and-drop (D&D) onto the main canvas, either by merging it into the current model of the active canvas or by inserting it onto a new canvas, with or without automatic graph layout (Fig. 9). It should be noted that the source CSML models of CSMLDB Academic and Professional are simulatable models, since they are stored in the CSMLDB as entity and process elements. In other words, no content is stored in the Fact tab of the dialog in the current version. Facts that have no effect on simulation models, e.g., indirect reactions, will be provided as fact elements in a future release. The module reduces the modeling step to two steps: (i) search the more than 100,000 known reactions for the genes and proteins of interest, and (ii) filter the matched results and D&D them onto the canvas. Moreover, the created model is a template ready for simulation.
Project Management and CSML Pathway Library Modules
As already mentioned, CIO 4.0 is launched with Java Web Start (JWS) technology after user authentication. With this feature, the CIO 4.0 server can identify each user and provide services according to the user's level. The “Project Management Module” (the top rectangular region in Fig. 10a) allows users to create their own projects on the server side and stores the CSML models and any files related to those projects, e.g., pdf, ppt, doc, xls, and txt. With this module, users can launch CIO 4.0 on any computer and access their projects on the server side. Dragging and dropping a CSML model onto the main canvas opens the model there automatically. Moreover, users can share project contents with other users at the project level, with read or read-write permissions (Fig. 10b). The “CSML Pathway Library Module” (the bottom rectangular region in Fig. 10a) provides CSML pathway libraries from the public domain (public library) and from commercial sources (commercial library).
The public library contains all CSML models available at http://www.csml.org/ in three categories: signal transduction pathways, gene regulatory networks and metabolic pathways. The library also contains all CSML models from the textbook [13]. The commercial library registers more than 1,000 well-established biopathways that originate from TRANSPATH [43] of BIOBASE and were converted with the Transpath2CSML application [31]. All of those CSML models can be loaded, edited, saved, and simulated with CIO 4.0. These modules make it easy for users to access and share their CSML models and to reuse CSML libraries from the public or commercial domains.
High-performance Simulation Module
The native simulation engine in CIO 4.0 is tightly integrated with a script engine named Pnuts [44], whose scripts can be compiled into Java byte code. The native simulation engine has two modes: the simple math engine and the complex math engine. The engine switches between the two modes automatically, depending on the complexity of the reactions in the pathway model. In more detail, if the reaction rules consist of simple math, e.g., the four arithmetic operations, the simple math engine is used. If the reaction rules use other constructs that can be described with the Pnuts script language (e.g., “if . . . then . . .”, pow, log, or Java methods themselves), then
Fig. 9. Importing models from CSMLDB. The canvas initially holds the model p53 + Mdm → p53:Mdm. A model for p53 → MDM2 (activation of transcription of MDM2 by p53) is then imported onto the canvas with combinations of the checkbox options “Merge”, “Auto-Layout” and “Create New Canvas” (six patterns in total). If “Merge” is checked, entities with the same ID on a canvas are merged into one entity. If “Auto-Layout” is checked, automatic layout is applied to all elements on the canvas. If “Create New Canvas” is checked, the newly inserted model (without the “Merge” option) or the merged model (with the “Merge” option) is placed on a new canvas.
complex math engine is used. The selected mode is shown in the status bar at the bottom as Optimize on or off (“on” means that the simple math engine is in use). The simulation performance of the simple math engine is on average ten times better than that of the complex math engine, and both engines perform acceptably for models with about 100 reactions (200 to 300 elements in total) on a normal machine. Most simulation models fall into this range and run acceptably for most users, but some simulation models grow to more than a thousand reactions (3,000 to 4,000 elements in total, depending on the complexity of the scripts). For example, a Caenorhabditis elegans vulval development model that describes the cell-cell regulation of six cells and the signal transduction
Fig. 10. Project Manager Dialog. (a) The Project Manager dialog consists of two sections, “User area” (top) and “Library area” (bottom). (b) Sharing step of a project (here “our shared project”) in the “User area.” Each project can be shared with “Read”, “Write”, or both permissions.
regulatory network in each cell contains 1,649 elements (the complexity of each script is high) and takes several hours to simulate on a normal machine (12,410 seconds on an Intel Core 2 Extreme X9650, 3 GHz) [40,45]. In CIO 4.0, under the SaaS concept, users can activate and use the high-performance simulation module on demand. The module translates the CSML model into pure Java source code and compiles it with javac, which is freely distributed with the Java Development Kit (JDK) [46] (JDK 6.0 or higher is required). Using this module, the above C. elegans model runs within one minute (31 seconds on the same machine), i.e., about 400 times faster. Thus, the High-performance Simulation Module can handle models with thousands of elements.
Pathway Parameter Search Module
Parameter search for dynamic pathway models is one of the most crucial problems in Systems Biology. Attempts have been made at automatic parameter estimation for HFPNe models using a technique called data assimilation (DA), which rationally blends simulation models and observational data [47,48]. This data assimilation method is better suited to high-performance computing systems with petaFLOPS computing ability. These efforts and developments are anticipated to create
Fig. 11. Example of externalization of coefficient. The top model has a mass action m1*0.1 on the process p1. The coefficient of the mass action, i.e., 0.1, can be externalized with an entity e3 and the mass action on the process becomes m1*k1 as in the bottom model. Once externalized, the parameter can be estimated with the “Pathway Parameter Search Module.”
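The externalization in Fig. 11 and the kind of grid search it enables can be sketched as follows. This is a toy illustration under stated assumptions, not the module's implementation: simulate is a hypothetical stand-in for a CIO run that applies the mass-action speed m1*k1 with a fixed time step, and the names grid, dt and steps are invented for the example. Because k1 is passed in like any other value, it can be varied in the same way as an initial value.

```python
from itertools import product


def grid(start, stop, step):
    """Values from start to stop (inclusive) in increments of step."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def simulate(m1, k1, dt=0.1, steps=100):
    """Toy model: process p1 consumes m1 at the mass-action speed m1 * k1.

    k1 plays the role of the externalized coefficient in Fig. 11.
    (Hypothetical stand-in for a CIO simulation run.)
    """
    for _ in range(steps):
        m1 -= m1 * k1 * dt
    return m1


# Six simulations: the initial value of m1 runs from 0 to 10 in steps of 2.
initial_values = grid(0, 10, 2)
results = {m1: simulate(m1, k1=0.1) for m1 in initial_values}

# Ten candidate values for each of three entities give 10**3 combinations.
conditions = list(product(range(10), repeat=3))
```

The number of runs grows as the product of the per-entity counts, which is why a fast simulation back end matters for this kind of search.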
groundbreaking modeling platforms for Systems Biology. In geophysics, the DA approach is applied to the prediction of the El Niño-Southern Oscillation (ENSO) phenomenon [49], which is known as the strongest climate variation on seasonal to inter-annual timescales. Since DA requires high-performance computers to achieve acceptable performance, we decided not to include this function in the CIO 4.0 modules. As an alternative, CIO 4.0 provides the “Pathway Parameter Search Module” for normal computers. This module executes multiple simulations at once over ranges of initial conditions, e.g., six simulations with the initial value running from zero to ten in steps of two, and displays the results as 2D or 3D plots. If the user searches ten different conditions for each of three entities, then one thousand simulations in total (10^3) must be executed at once. The module cannot achieve acceptable performance without the technology of the “High-performance Simulation Module”. The module shows how the behavior of the system changes with the initial values. Additionally, with minor updates to the target model, the module can be used to investigate the effect of the reaction speed coefficient of a process or the threshold value of a connector. In those cases, the coefficient (or threshold) itself must be represented by an entity (what we call externalization of the coefficient or threshold). Once the coefficient (or threshold) is externalized, the “Pathway Parameter Search Module” is used in the same way. An example of externalizing the coefficient of a mass action is shown in Fig. 11.
Pathway Model to Multiple Program Languages Export Module
As an advanced usage, users may want to use their CSML models as simulation models in other applications. For this purpose, the “Pathway Model to Multiple Program Languages Export Module” is provided. The module can export one CSML model as one simulation model in Java, FORTRAN, C++, C, Perl or Python. The exported program can be compiled and executed directly with a suitable compiler for each programming language, e.g., javac, gfortran95, g++ or gcc, or, in the case of Perl or Python, executed without a compilation step. The generated model can be compiled without any modification if the scripts of the input model contain only the predefined kinetics provided in CIO 4.0, i.e., mass, stochasticmass, stochasticlognormalmass and Michaelis Menten, or custom kinetics, i.e., connectorrate, connectorcustom and custom, that are limited to the four arithmetic operations, IfTime(simulator, compare time), getElapsedTime(simulator), getSamplingInterval(simulator) and the ternary operator, e.g., ? x : y;. (Note that the ternary operator is special scripting syntax for modules (3) and (4); it can be used in place of the “if” and “else” statements that are available in normal simulation mode, where if-else syntax is supported by the Pnuts script language.) In other words, if the model contains other advanced syntax of the Pnuts script language, the exported model has to be updated by hand, with some effort. The result of exporting a CSML model to several programming languages is shown in Fig. 12. The Java export function of this module is used in one of the processing steps of the “Pathway Parameter Search Module”.
CSML to SVG Module and CSML to HTML Module
In CIO 4.0, without using any module functionality, a model can be saved in raster image formats, i.e., png or jpeg, which are usually suitable for display purposes only. For editing purposes, a vector image format is better, e.g., pdf, ai or SVG [50]. The “CSML to SVG Module” was developed for this purpose and exports the CSML model as a file in SVG format.
SVG was selected among those vector image formats because it is the sole XML format for representing vector images and is also officially supported as a vector image format in CSML 3.0. For the same reason, all predefined biological terms with icons in CSML 3.0 are distributed in SVG format. Many viewers and editors for SVG are available, e.g., Inkscape [51], Adobe Illustrator, and CorelDraw, though with minor differences among their implementations of the format. Thus, the result exported by this module might not be displayed correctly on some platforms, and the module is therefore currently in beta status. A snapshot of CIO 4.0 showing the circadian rhythms in a Mus musculus model, together with the exported result loaded in Inkscape, is shown in Fig. 13. For reporting on CSML models, the “CSML to HTML Module” is provided. As shown in Fig. 14, this module takes one CSML model as its input and generates HTML files with png images.
RESULTS AND DISCUSSION
Since the first release of Cell Illustrator 1.0, many developments and improvements have been made. First, the native format has been extended with the new formats CSML 3.0 and CSO 3.0 to be more suitable for biological pathway modeling, visualization, and simulation with an ontology background. CSML 3.0 can represent not only Petri net models but also ODE-based simulation models. Standard icons are defined in CSO 3.0, so the user can create a graphical pathway model with an ontology background simply by D&D operations from the Biological Element dialog, which provides all of the predefined icons. The created model is a template that is ready for simulation, since it is represented with HFPNe. Because CSML 3.0 and CSO 3.0 are highly optimized for representing biopathways, pathway databases in other formats, e.g., KEGG, Reactome [39] with BioPAX [26], the CellML repository [28], or BioModels [52], can be imported into CIO 4.0 directly without loss of information.
Fig. 12. Program Language Export Module. The HFPNe model at the top is converted into Java source code with the Program Language Export Module.
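To give a feel for the kind of standalone program such an export could correspond to, here is a hand-written Python sketch for a single process p1 that moves mass from m1 to m2 with the mass-action speed m1*0.1. It is not the actual output of the module: the function run and the parameters sampling_interval, end_time and start_after are hypothetical names, and the fixed-step update is an assumption. It stays within the subset named in the text (the four arithmetic operations, elapsed time, and a conditional expression in place of an if/else statement).

```python
def run(m1=10.0, m2=0.0, sampling_interval=0.1, end_time=5.0, start_after=1.0):
    """Hand-written stand-in for an exported model with one process p1: m1 -> m2."""
    steps = int(round(end_time / sampling_interval))
    for i in range(steps):
        elapsed = i * sampling_interval
        # Conditional expression instead of an if/else statement: the process
        # stays inactive until `start_after` time units have elapsed.
        speed = m1 * 0.1 if elapsed >= start_after else 0.0
        delta = speed * sampling_interval
        m1 -= delta
        m2 += delta
    return m1, m2


final_m1, final_m2 = run()
```

Because the loop uses nothing beyond arithmetic and one conditional expression, the same structure carries over line by line to compiled languages such as Java or C.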
As to SaaS technology, Cell Illustrator 4.0 was developed with Java Web Start technology and server-side authentication, so each user can select his or her own optimal combination of Cell Illustrator modules. The “High-performance Simulation Module” helps users whose focus is on running heavy simulations with thousands of elements. The “Pathway Parameter Search Module” allows users to
Fig. 13. Result of CSML to SVG Module. Circadian rhythms in Mus musculus model (http://www.csml.org/models/csml-models/circadian-rhythms-in-mouse/) (top) converted to an SVG image and loaded on Inkscape (bottom).
Fig. 14. Result of CSML to HTML Module. Circadian rhythms in Mus musculus model (http://www.csml.org/models/csml-models/circadian-rhythms-in-mouse/) converted to an HTML report format with the “CSML to HTML Module” and loaded in a web browser. Entities and processes are arranged in the header with links to the detailed descriptions, which contain the ID, name, simulation properties, biological properties and external URL links.
find better parameter sets for their models. The “Pathway Model to Multiple Program Languages Export Module” will be useful for more advanced users who need to connect a simulation model from CIO 4.0 to their own applications at the source code level. The “CSMLDB Search Module” will be helpful for users who need to create mammalian pathway models from scratch, since more than 100,000 reliable reactions are registered in CSMLDB. The websites [8,53] and the textbook [17] are useful for users who
are interested in developing their skills for building biological pathways with Cell Illustrator. The forthcoming Cell Illustrator Online (CIO 5.0) will fully implement CSML 3.0. With this feature, an entity-fact based pathway model (static pathway) and an entity-process based pathway model (simulatable pathway) can be mixed in one model. CIO 4.0 requires the entity-fact and entity-process based pathway models to be built in different modes, named “gene network mode” and “normal mode”. Moreover, the user can create multiple sub-views of the main model by filtering its contents with rules, e.g., by gene layer, protein layer, nucleus layer, cytoplasmic layer, or the expression level of each element. Those sub-views have no effect on simulation, since the main model (the model before filtering) is what is simulated. As network sizes grow, e.g., to several thousand elements, the sub-view concept will become indispensable for grasping the characteristic features of those pathways. In CSML 3.0, any language can be specified for the simulation script of each kinetic and initial value; namely, <script language=""> is used in that format. However, CIO 4.0 currently supports only the Pnuts language [44]. In CIO 5.0, the scripting languages JavaScript and Jython [54] and the compiled language Java will be allowed. Furthermore, CIO 5.0 will allow several script languages to be mixed in one model, e.g., one reaction speed written in JavaScript and another in Java. In CIO 4.0, the simulation result of ODE-compatible modeling as mentioned before (a model with only continuous elements and “nocheck” assigned as the weight parameter of the arcs) is similar to the result of numerical integration with the Euler method. To improve compatibility with high-precision ODE-based simulators without violating the Petri net formalism, higher-order numerical integration methods, e.g., Runge-Kutta, will be selectable in the next release.
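The gap between the Euler method and a higher-order method can be seen on a small example. The sketch below compares one-step Euler with the classical fourth-order Runge-Kutta scheme on the mass-action decay dm/dt = -k*m, whose exact solution is m0*exp(-k*t). The test equation, the step size and all function names are chosen for illustration only; this is not the integrator used in CIO.

```python
import math


def euler_step(f, t, y, h):
    """One explicit Euler step of size h."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step, f, y0, h, n):
    """Apply `step` n times starting from y0 at t = 0."""
    t, y = 0.0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y


k = 0.1


def decay(t, m):
    """Mass-action decay: dm/dt = -k * m."""
    return -k * m


# Integrate from t = 0 to t = 10 with step size 1, starting at m = 10.
exact = 10.0 * math.exp(-k * 10.0)
euler_error = abs(integrate(euler_step, decay, 10.0, 1.0, 10) - exact)
rk4_error = abs(integrate(rk4_step, decay, 10.0, 1.0, 10) - exact)
```

With the same step size, the Runge-Kutta error is several orders of magnitude smaller than the Euler error, at the cost of four evaluations of the rate function per step.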
In CIO 4.0, the biological elements mRNA, protein, and their modified forms and complexes are available as the 92 standard icons of the CSO core vocabulary and the 100,000 icons of the “CSMLDB Search Module”. However, CIO 4.0 supports a smaller vocabulary for chemical compounds, and a future release should address this weakness, given the high user demand.
ACKNOWLEDGEMENTS
We are grateful to many people. First and foremost, we would like to thank the current and former members of the Cell System Markup Language projects: Hiroko Nishihata, Kazuyuki Numata, Atsushi Doi, Yayoi Sekiya, Yoshinori Tamada, Simamura Teppei, Ruy Yamaguchi, Seiya Imoto, Kazuko Ueno of the Human Genome Center, University of Tokyo; Hanji Hioka, Yuto Ikegami, Hironori Kitakaze, Yoshimasa Miwa, Daichi Saihara, Tomoaki Yamamotoya, Hiroshi Matsuno of Yamaguchi University. We would also like to thank the users of Cell Illustrator who develop excellent models on this platform and give insightful feedback for the development of Cell Illustrator.
REFERENCES
[1] Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: I. A platform for modeling and simulating biopathways. Appl. Bioinformatics 2, 181-184.
[2] http://www.genomicobject.net/.
[3] http://java.sun.com/javase/technologies/desktop/javawebstart/index.jsp.
[4] Kato, M., Nagasaki, M., Doi, A. and Miyano, S. (2005). Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform. 16, 22-31.
[5] Kojima, K., Nagasaki, M., Jeong, E., Kato, M. and Miyano, S. (2007). An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics 8, 76.
[6] Kojima, K., Nagasaki, M. and Miyano, S. (2008). Fast grid layout algorithm for biological networks with sweep calculation. Bioinformatics 24, 1433-1441.
[7] Hashimoto, T. B., Nagasaki, M., Kojima, K. and Miyano, S. (2009). BFL: a node and edge betweenness based fast layout algorithm for large scale networks. Bioinformatics 10, 19.
[8] http://www.csml.org/.
[9] Jeong, E., Nagasaki, M., Saito, A. and Miyano, S. (2007). Cell System Ontology: Representation for modeling, visualizing, and simulating biological pathways. In Silico Biol. 7, 0055.
[10] Alla, H. and David, R. (1998). Continuous and Hybrid Petri Nets. Journal of Circuits, Systems, and Computers 8, 159-188.
[11] Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 341-352.
[12] Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2004). A versatile Petri net based architecture for modeling and simulation of complex biological processes. Genome Inform. 15, 180-197.
[13] Nagasaki, M., Saito, A., Doi, A., Matsuno, H. and Miyano, S. (2009). Foundations of Systems Biology: Using Cell Illustrator and Pathway Databases. Springer, Berlin Heidelberg.
[14] Tomita, M. (2001). Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol. 19, 205-210.
[15] Mendes, P. (1993). GEPASI: a software for modeling the dynamics, steady states and control of biochemical and other systems. Comput. Appl. Biosci. 9, 563-571.
[16] http://biospice.sourceforge.net/.
[17] Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2005). Computational modeling of biological processes with Petri Net-based architecture. In: Bioinformatics Technologies, Chen, Y.-P. P. (ed.), Springer, pp. 179-242.
[18] Troncale, S., Tahi, F., Campard, D., Vannier, J.-P. and Guespin, J. (2006). Modeling and simulation with Hybrid Functional Petri Nets of the role of interleukin-6 in human early haematopoiesis. Pac. Symp. Biocomput. 11, 427-438.
[19] Koh, G., Teong, H. F., Clément, M. V., Hsu, D. and Thiagarajan, P. S. (2006). A decompositional approach to parameter estimation in pathway modeling; a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22, e271-e280.
[20] Hardy, S. and Robillard, P. N. (2008). Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways. Bioinformatics 24, 209-217.
[21] Sato, Y., Hashiguchi, Y. and Nishida, M. (2009). Evolution of multiple phosphodiesterase isoforms in stickleback involved in cAMP signal transduction pathway. BMC Syst. Biol. 3, 23.
[22] Wu, J. and Voit, E. (2009). Hybrid modeling in biochemical systems theory by means of functional petri nets. J. Bioinform. Comput. Biol. 1, 107-134.
[23] Wu, J. and Voit, E. (2009). Integrative biological systems modeling: challenges and opportunities. Front. Comput. Sci. China 3, 92-100.
[24] http://www.jgraph.com/jgraph.html.
[25] Li, W. and Kurata, H. (2005). A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics 21, 2036-2042.
[26] http://www.biopax.org/.
[27] http://www.sbml.org/.
[28] http://www.cellml.org/.
[29] http://www.kegg.org/.
[30] Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. and Kanehisa, M. (1999). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27, 29-34.
[31] Nagasaki, M., Saito, A., Li, C., Jeong, E. and Miyano, S. (2008). Systematic reconstruction of TRANSPATH data into Cell System Markup Language. BMC Syst. Biol. 2, 53.
[32] Jeong, E., Nagasaki, M. and Miyano, S. (2007). Conversion from BioPAX to CSO for System Dynamics and Visualization of Biological Pathway. Genome Inform. 18, 225-236.
[33] http://www.csml.org/tools/sbml2csml/.
[34] http://www.csml.org/tools/cellml2csml/.
[35] Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Recreating biopathway databases towards simulation. In: Computational Methods in Systems Biology, Miyano, S., Wolkenhauer, O., Degano, P., Danos, V., Lincoln, P. and Cho, K.-H. (eds.). Lecture Notes in Computer Science 2602, 168-169.
[36] Le Novere, N., et al. (2009). The Systems Biology Graphical Notation. Nat. Biotechnol. 27, 735-741.
[37] Killcoyne, S., Carter, G. W., Smith, J. and Boyle, J. (2009). Cytoscape: a community-based framework for network modeling. Methods Mol. Biol. 563, 219-239.
[38] http://www.w3.org/TR/owl-features/.
[39] http://www.reactome.org/.
[40] Li, C., Nagasaki, M., Ueno, K. and Miyano, S. (2009). Simulation-based model checking approach to cell fate specification during Caenorhabditis elegans vulval development by hybrid functional Petri net with extension. BMC Syst. Biol. 3, 42.
[41] Jeong, E., Nagasaki, M. and Miyano, S. (2008). Rule-based reasoning for system dynamics in cell systems. Genome Inform. 20, 25-36.
[42] Hoch, F., Kerr, M. and Griffith, A. (2001). Software as a Service: Strategic Backgrounder, SIIA eBusiness Division, Software & Industry. http://www.siia.net/estore/pubs/SSB-01.pdf.
[43] Schacherer, F., Choi, C., Götze, U., Krull, M., Pistor, S. and Wingender, E. (2001). The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics 17, 1053-1057.
[44] http://pnuts.org/.
[45] http://www.csml.org/models/csml-models/vulvaldev/.
[46] http://java.sun.com/.
[47] Nagasaki, M., Yamaguchi, R., Yoshida, R., Imoto, S., Doi, A., Tamada, Y., Matsuno, H., Miyano, S. and Higuchi, T. (2006). Genomic data assimilation for estimating hybrid functional Petri net from time-course gene expression data. Genome Inform. 17, 46-61.
[48] Tasaki, S., Nagasaki, M., Oyama, M., Hata, H., Ueno, K., Yoshida, R., Higuchi, T., Sugano, S. and Miyano, S. (2006). Modeling and estimation of dynamic EGFR pathway by data assimilation approach using time series proteomic data. Genome Inform. 17, 226-238.
[49] Chen, D., Zebiak, S. E., Busalacchi, A. J. and Cane, M. A. (1995). An improved procedure for El Niño forecasting: Implications for predictability. Science 268, 1699-1702.
[50] http://www.w3.org/Graphics/SVG/.
[51] http://www.inkscape.org/.
[52] http://www.biomodels.net/.
[53] http://genome.ib.sci.yamaguchi-u.ac.jp/~gon/.
[54] http://www.jython.org/.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-182
Modeling of Cell-to-Cell Communication Processes with Petri Nets Using the Example of Quorum Sensing
Sebastian Janowski a,∗, Benjamin Kormeier a, Thoralf Töpel a, Klaus Hippe a, Ralf Hofestädt a, Nils Willassen b, Rafael Friesen a, Sebastian Rubert a, Daniela Borck a, Peik Haugen b and Ming Chen c
a Department of Bioinformatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany
b Department of Molecular Biotechnology, Faculty of Medicine, University of Tromsø, Tromsø, Norway
c Department of Bioinformatics, College of Life Science, Zhejiang University, Hangzhou, P.R. China
ABSTRACT: Understanding the molecular mechanism of cell-to-cell communication is fundamental for systems biology. Up to now, the main objectives of bioinformatics have been the reconstruction, modeling and analysis of metabolic, regulatory and signaling processes, based on data generated by high-throughput technologies. Cell-to-cell communication or quorum sensing (QS), the use of small molecule signals to coordinate complex patterns of behavior in bacteria, has been the focus of many reports over the past decade. Based on the quorum sensing process of the organism Aliivibrio salmonicida, we aim at developing a functional Petri net that allows cell-to-cell communication processes to be modeled and simulated. Using a new editor-controlled information system called VANESA (http://vanesa.sf.net), we present how to combine different fields of study, such as life science, database consulting, modeling, visualization and simulation, for a semi-automatic reconstruction of the complex quorum sensing signaling network. We show how cell-to-cell communication processes and information flow within a cell and across cell colonies can be modeled using VANESA and how those models can be simulated with Petri net network structures in a sophisticated way.
KEYWORDS: Quorum sensing, cell-to-cell communication, cellular rhythm, dynamic modeling, Petri nets, database integration, VANESA, biological network editor, simulation
INTRODUCTION
Every living cell is governed by a vast network of interacting proteins, RNA, DNA, metabolites and other molecules. There are many biochemical processes involved in a cell. Naturally, those processes with their elements and interactions can be captured by network representations. During research enormous amounts of biological data are produced daily. Information from different fields of studies is brought together in order to examine and analyze quantities and relationships. The approach of extracting, analyzing and modeling meaningful biological data of heterogeneous data sets as a biomedical network
∗ Corresponding author. E-mail: [email protected].
S. Janowski et al. / Modeling of Cell-to-Cell Communication Processes with Petri Nets
is one of the main tasks in integrative bioinformatics. To trim down data to a manageable yet relevant size and to analyze and identify new as well as altered versions of interaction patterns, we have implemented a new editor-controlled information system called VANESA (http://vanesa.sf.net). VANESA provides new bioinformatics methods and visualization approaches to analyze dynamic interacting networks. The idea of VANESA is to extend any network based on molecular data with new targets and interacting elements. Using VANESA, we aim at developing sophisticated network structures for the modeling and simulation of coordinated cell actions based on the quorum sensing system of the bacterium Aliivibrio salmonicida. Coordinated cell actions and basic cellular activities are controlled by cell signaling and communication processes. The study of individual parts of cell signaling pathways has become a major objective in bioinformatics. Bacterial cells are able to adapt their behavior to the environment and its conditions [Schauder et al., 2001]. In biology and medicine, investigating how cells perceive and respond to their microenvironment, adapting processes such as development, growth, tissue repair, virulence production and other complex actions, can lead to a better understanding of molecular interactions and the causes of diseases [Visick et al., 2005]. It is necessary to understand the gene-controlled cell differentiation processes in order to modify metabolic behavior, which will be the elementary operation of synthetic biology. Until now, the methods of biotechnology could not control the cell differentiation process, which is based on fundamental gene regulation events. One aspect related to cell differentiation is cell-to-cell communication. Although cell-to-cell communication is not stringently connected to cell differentiation, it plays an important role in gene regulation processes.
Therefore it is necessary to study and to understand cell-to-cell communication processes. Based on the quorum sensing process of the organism Aliivibrio salmonicida, our goal is to develop sophisticated cell differentiation models. The exploration of the quorum sensing system involves many different fields of study. In addition to the actual experimental laboratory work, integrative bioinformatics methods are necessary to extend the molecular knowledge. In systems biology, and especially in the case of cell-to-cell communication, the path from an initial question to a proven answer consists of many individual steps such as laboratory work, literature research, life-science database consulting, modeling, simulation and visualization, among others. In this paper we present how to combine the aforementioned fields of study to construct complex Petri nets for the simulation of cell-to-cell communication processes. We show how the quorum sensing network of the bacterium Aliivibrio salmonicida has been reconstructed with VANESA, using experimental data and literature. The main idea of VANESA is to offer a powerful network editor for reconstructing biological systems. Presently, none of the Petri net simulation software applications provides scientists with strong network editing functions and network prediction tools. However, there is an urgent need for a strong network editor with additional analysis functions to model experimental results that can be expanded with database information. Since none of the Petri net simulation tools offers such functionality, the software application VANESA was conceived. Based on the experimental data of the project and the integrated databases, we began our work by exploring and reconstructing the quorum sensing system in VANESA. To simulate the cell-to-cell communication processes, we made use of the Petri net language.
Therefore, we implemented an additional software feature in VANESA, which allows the automatic transformation of the quorum sensing network into the language of a Petri net. Using VANESA and complex Petri net structures we now present new communication models for the transmission and information flow within and across cells.
S. Janowski et al. / Modeling of Cell-to-Cell Communication Processes with Petri Nets
BASICS
This section presents the conceptual basics of the interdisciplinary subject. First, the quorum sensing system of the organism Aliivibrio salmonicida is described. The section gives an insight into the initial question about the dynamic biological system and the underlying research project. Subsequently, a Petri net definition is given. Special consideration is given to the Petri net language as it serves as the simulation technique for the quorum sensing system.
Quorum sensing
Over time, natural science and its studies on living organisms have answered many essential questions. Still, many biochemical processes are far beyond our understanding. Research groups around the world direct their attention to unanswered questions and biological phenomena. One question that has not yet been answered is how the biological system of quorum sensing works in detail. Understanding the intracellular molecular machinery that is responsible for the complex collective behavior of cellular and multicellular populations is one of the main tasks in natural science. Quorum sensing is a sophisticated cell-to-cell signaling system and an important issue in the study of bacterial behavior dynamics. The intracellular circuitry of signal transduction and gene expression has become the focus of important research activities. Many bacterial species use quorum sensing to coordinate their gene expression according to the local density of their population. The bacterial population benefits from quorum sensing as it gains the ability to restrict certain behavioral traits. By this process, the system makes it possible to control the gene expression of the entire bacterial community and to influence the bacterial behavior [Schauder et al., 2001]. Accordingly, bacterial populations are able to control and limit activities that would not have any impact at small population densities.
With regard to fish infection by pathogenic bacteria, the exploration of the quorum sensing network is very important. The quorum sensing regulated virulence factor production is of great importance in the development of more efficient vaccines and vaccination strategies and is of broad interest to a large scientific community. Vibrio infection, which occurs in many species including shellfish, fish and humans, has been reported throughout the world. Fish infection with Vibrio bacteria is one of the major bacterial threats in marine aquaculture [Enger et al., 1989]. Breeding at high densities under artificial conditions imposes considerable risks of losses from outbreaks of infectious diseases. Therapeutic treatments may harm the environment, and modern intensive farming practices are increasingly confronted with husbandry diseases. Intensive fish farming has unfortunately suffered from disease problems, although effective vaccines have reduced the economic losses dramatically. Farming of other marine species (cod, turbot, halibut etc.) has, however, faced other disease problems, which require new vaccines, vaccination strategies or use of alternative therapeutics, individually adjusted depending on the fish pathogen and its host. Motivated by the goal of having a better understanding of Vibrio infections, we began to examine the quorum sensing network of the organism Aliivibrio salmonicida. Aliivibrio salmonicida, a moderately halophilic and psychrophilic bacterium, is the causative agent of cold-water vibriosis. In contrast to other bacterial infections in fish, which usually occur during summer months, outbreaks of cold-water vibriosis occur predominantly during winter months when water temperatures are below 10 °C. Surprisingly, no exotoxins or enzymes with cytolytic activity have been identified in Aliivibrio salmonicida, although the infection results in tissue degradation, haemolysis and sepsis in vivo [Totland et al., 1988].
Infections due to Aliivibrio salmonicida have a 50-fold greater occurrence in salmon compared to cod, suggesting that this pathogen requires very specific host properties [Schrøder et al., 1992]. The whole genome
of Aliivibrio salmonicida has been sequenced and the quorum sensing system predicted [Hjerde et al., 2008]. An abstraction of the quorum sensing model would be comparable to an ‘on’ and ‘off’ system under the influence of molecular noise [Goryachev et al., 2006]. Bacteria that use quorum sensing constantly produce small, freely diffusible signaling molecules (also called autoinducers or pheromones). The autoinducer is detected by specific receptors which recognize external signaling molecules. The likelihood of a bacterium detecting its own signaling molecules is very low and can be neglected. When the inducer binds to the receptor, the transcription of certain genes is activated. Among these are the genes for the synthesis of the autoinducer, which is thereby stimulated. At low density, when the bacteria exist only as a few individuals, diffusion of the autoinducer into the environment reduces the concentration to almost zero. Until a certain threshold concentration of the autoinducer is reached in the environment, the intracellular network remains in the ‘off’ state, which corresponds to a stand-by level of autoinducer production. When the population grows and cell density increases, the concentration of the inducer increases and eventually passes the threshold level. Once this threshold is reached, the incoming signal activates the quorum sensing network through intracellular and extracellular receptors, which turns on the expression of the phenotype-specific genes. In many cases the activation strongly increases the production of autoinducer, a so-called positive feedback loop, and the intracellular network switches into the ‘on’ state. The network becomes fully activated and activates phenotype-specific genes.
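The ‘off’/‘on’ behavior described above can be illustrated with a minimal sketch. It is not the model used in this work: the function name simulate_switch, all rate constants, the time step and the threshold value are hypothetical and were chosen only so that the two regimes become visible. The autoinducer (AI) concentration in the environment is produced by every cell at a basal rate, is lost by diffusion, and production jumps to an induced rate once the threshold is passed (positive feedback).

```python
def simulate_switch(cell_density, steps=500, dt=0.1, basal_rate=0.1,
                    induced_rate=2.0, loss_rate=1.0, threshold=25.0):
    """Toy quorum sensing switch; every parameter value is an illustrative assumption.

    Returns the final autoinducer concentration and the network state.
    """
    ai = 0.0  # autoinducer concentration in the environment
    for _ in range(steps):
        # Positive feedback: above the threshold each cell produces more AI.
        per_cell = induced_rate if ai >= threshold else basal_rate
        # Explicit Euler step: production by all cells minus loss by diffusion.
        ai += dt * (cell_density * per_cell - loss_rate * ai)
    return ai, ("on" if ai >= threshold else "off")


low_density = simulate_switch(cell_density=10)    # few cells: AI stays near zero
high_density = simulate_switch(cell_density=400)  # many cells: threshold is passed
print(low_density[1], high_density[1])
```

With these assumed values the sparse population settles at a stand-by AI level far below the threshold, while the dense population crosses the threshold and remains in the ‘on’ state because the induced production keeps the concentration high.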
The intracellular circuitry of the cell will change in order to initiate processes such as motility, biofilm formation, virulence factor production and secretion or other processes to achieve advantages for adaptation to the environment and survival. In general, vibrios have the ability to perform quorum sensing in a similar fashion [Müller et al., 2006]. Communication via LuxI/LuxR-type signaling circuits appears to be the standard mechanism by which many gram-negative bacteria communicate. It was in V. fischeri that the quorum sensing transcriptional regulators LuxR and LuxI (the autoinducer synthase) were first discovered [Antunes et al., 2007]. The LuxI-produced signaling molecule, an acyl homoserine lactone (acyl-HSL), diffuses out of the cell and, under conditions of high cell density, returns into the cell. Within the cell it activates LuxR and thus the lux bioluminescence operon, which is under LuxR control [Visick et al., 2005]. Studies of quorum sensing in Aliivibrio salmonicida subsequently revealed a much more complex system for the LuxI/LuxR-type pathway, including two LuxR transcriptional regulators. Figure 1 shows a proposed quorum sensing network of Aliivibrio salmonicida under low and high population density, which is based on experimental data [Hjerde et al., 2008]. It is fascinating to the scientific community how the proposed network appears and how the intracellular circuitry of signal transduction and gene expression adapts to environmental perturbations. There are still many unanswered questions and unexplored directions in this circuitry. The aim is to answer those questions and to expand the relevant paths in the network with predicted information, quantitative experimental data and literature in order to construct a sophisticated cell-to-cell communication model.
Petri nets
A Petri net is a mathematical modeling language with an exact mathematical definition of its execution semantics.
Petri nets offer a graphical notation for stepwise processes that include choice, iteration, and concurrent execution. The advantage of Petri nets is that they provide a balance between modeling power and analyzability. Using Petri nets, complex systems can be modeled and simulated in a sophisticated
Fig. 1. Proposed quorum sensing system in Aliivibrio salmonicida*.
*A colored version of the figure/chart is available at In Silico Biol. 10, 0003, 1 February 2010.
way. Furthermore, properties of concurrent systems can be determined automatically for Petri nets, although some of them are difficult and expensive to determine [Petri and Reisig, 2008]. For the modeling and simulation of cell-to-cell communication processes we make use of Petri nets. The various modeling possibilities and analytic power offer a well-developed basis for the description of chemical processes and, in addition, a mathematical theory for process analysis. A Petri net PN = (P, T, F, W, m0) consists of a finite set of places (P) and a finite set of transitions (T), which are connected by directed arcs (F). F ⊆ (P × T) ∪ (T × P) is a finite set of arcs. The triple (P, T, F) is called a net and W is called the weight function (W : F → ℕ). Regarding the graphical representation, places are drawn as circles, transitions are drawn as rectangles and arcs are drawn as directed arrows. Places may contain tokens, which are drawn as black dots. At the beginning, the start configuration, which is called m0, assigns tokens to places. The definition of the basic Petri net is given in [Reisig, 1982]. The first application of Petri nets for modeling simple biochemical reactions was published by Reddy et al. [Reddy et al., 1993]. The idea presented in that paper was that Petri nets
Fig. 2. The enzyme-catalyzed process of glucose into glucose-6-phosphate is modeled using a simple Petri net. The left side shows one abstract glucose molecule, one abstract enzyme hexokinase and energy (ATP). The right side shows the result of this biochemical reaction*.
Fig. 3. This Petri net represents an abstract gene-controlled biochemical reaction. Fundamental processes like protein synthesis and gene regulation can be modeled and integrated*.
are useful for the representation of qualitative reactions. Figure 2 shows a Petri net representation of a simple enzyme-controlled reaction. The disadvantage of this application is that neither complex metabolic processes nor quantitative processes can be modeled. With the more complex Petri net language devised by Hofestädt [Hofestädt, 1994], it was shown that the Petri net concept can also model gene-controlled metabolic networks (Fig. 3) and cell communication processes. It was important to include the modeling of kinetic effects. Therefore the functional Petri net was defined. The parameters of the Petri net language were extended [Hofestädt and Thelen, 1998], which now allows the kinetic simulation of metabolic networks by assigning specific functions to the arcs. Based on the definition of the self-modifying Petri net [Valk, 1978], a functional Petri net is a 5-tuple FPN = (P, T, F, VF, m0), where (P, T, F) is a net and VF is a mapping which assigns to each f from F a mapping VF(f), and VF(f) is an element of {g(x1, …, xn) | g : PN × … × PN → ℕ, n ∈ ℕ}.
Fig. 4. This figure shows the fundamental symbols of HFPN [Matsuno et al., 2003]*.
PN represents any number in ℕ or the number of tokens held by the place PN in the current configuration of the FPN. The advantage of the FPN is that kinetic effects of biological networks can be simulated. Therefore, any qualitative Petri net model of a biological network can be extended by using the functional Petri net and including quantitative experimental data (for example quantitative proteomic data or kinetic data of enzymes). In molecular biology the amount of quantitative data is growing exponentially. Therefore, a more realistic simulation of biological networks requires further extensions. The usage of real numbers instead of tokens is one important aspect of the Hybrid Functional Petri Net (HFPN) [Matsuno et al., 2003]. The HFPN is an extension of the Hybrid Petri Net (HPN) [Alla and David, 1998]. The idea of the HPN was the representation of two kinds of places and transitions that allow calculating both discrete and continuous molecular values. Therefore, discrete places (discrete transitions) and continuous places (continuous transitions) were defined (Fig. 4). The idea of the continuous place is that nonnegative real numbers can be used, which can be interpreted as the concentration of metabolites.
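To make the definitions above concrete, the following minimal sketch implements a place/transition net with the usual firing rule and allows an arc weight to be either a constant (W) or a function of the current marking (in the spirit of VF). The class PetriNet, its method names and both example nets are illustrative assumptions and do not reproduce the implementation of any tool discussed here. The first net is one possible reading of the hexokinase reaction of Fig. 2, with ADP added as a by-product and the enzyme returned after firing; the second net uses a hypothetical marking-dependent weight.

```python
from typing import Callable, Dict, Tuple, Union

Marking = Dict[str, int]
Weight = Union[int, Callable[[Marking], int]]  # constant W(f) or functional VF(f)


class PetriNet:
    """Net (P, T, F, W, m0); arc weights may be functions of the marking."""

    def __init__(self, places, transitions,
                 arcs: Dict[Tuple[str, str], Weight], m0: Marking):
        self.places = set(places)
        self.transitions = set(transitions)
        self.arcs = dict(arcs)  # F with weights: (source, target) -> weight
        self.marking = {p: m0.get(p, 0) for p in self.places}

    def _weight(self, arc):
        w = self.arcs[arc]
        return w(self.marking) if callable(w) else w

    def _inputs(self, t):
        return {src: self._weight((src, dst)) for (src, dst) in self.arcs if dst == t}

    def _outputs(self, t):
        return {dst: self._weight((src, dst)) for (src, dst) in self.arcs if src == t}

    def enabled(self, t):
        """A transition is enabled if every input place holds enough tokens."""
        return all(self.marking[p] >= w for p, w in self._inputs(t).items())

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        # Evaluate all weights against the current marking before changing it.
        consumed, produced = self._inputs(t), self._outputs(t)
        for p, w in consumed.items():
            self.marking[p] -= w
        for p, w in produced.items():
            self.marking[p] += w
        return dict(self.marking)


# Qualitative net: glucose + ATP -> glucose-6-phosphate + ADP, catalyzed by hexokinase.
hexokinase_net = PetriNet(
    places=["glucose", "ATP", "hexokinase", "G6P", "ADP"],
    transitions=["phosphorylation"],
    arcs={
        ("glucose", "phosphorylation"): 1,
        ("ATP", "phosphorylation"): 1,
        ("hexokinase", "phosphorylation"): 1,
        ("phosphorylation", "G6P"): 1,
        ("phosphorylation", "ADP"): 1,
        ("phosphorylation", "hexokinase"): 1,  # the enzyme is returned
    },
    m0={"glucose": 1, "ATP": 1, "hexokinase": 1},
)
after_reaction = hexokinase_net.fire("phosphorylation")

# Functional net: the (hypothetical) weight is half of the substrate, at least 1.
half = lambda m: max(1, m["substrate"] // 2)
functional_net = PetriNet(
    places=["substrate", "product"],
    transitions=["convert"],
    arcs={("substrate", "convert"): half, ("convert", "product"): half},
    m0={"substrate": 8},
)
after_convert = functional_net.fire("convert")
print(after_reaction, after_convert)
```

Replacing the integer markings by nonnegative real numbers would move this sketch towards the continuous places of the HFPN; that step is omitted here to keep the example short.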
METHODS
Due to the complexity of pathway interactions and the large number of components involved in signal transduction, cellular rhythms and cell-to-cell communication, it is quite difficult to intuitively understand the behavior of cellular networks. We still do not understand many fundamental laws of biology. Laboratory experiments for testing hypotheses are costly, laborious and slow. Computer modeling and simulation techniques have proved useful for testing hypotheses in silico. Experiments that are infeasible in vivo, such as knock-outs of vital genes, can be performed in silico. Dynamic computer models are able to monitor cellular rhythm, signal transduction, cellular metabolism, and changes and influences within a system. Furthermore, computer-based models can suggest novel experiments. Besides the traditional modeling and simulation approaches based on ordinary differential equations (ODEs), partial differential equations (PDEs) and stochastic differential equations, Petri nets and logic-based descriptions are widely used to analyze biochemical networks [Gilbert et al., 2006]. The advantage of Petri nets lies in structural analysis and temporal logic: they support sophisticated model analysis and make it possible to relate predicted behavior to observations.
Table 1
Comparison of Petri net-based simulation tools

Feature                            GEPASI/COPASI  Jarnac  Dbsolve  E-CELL  Virtual Cell  Cell Designer  Cell Illustrator
Pathway DB retrievability          w              w       w        s       w             w              s
Pathways graphic editor            n              s       s        w       m             s              s
Kinetic types                      s              s       s        m       w             w              s
Virtual cell model                 w              w       w        m       m             w              m
Simulation graphic display         m              s       s        m       w             w              s
Mathematical model modification    s              s       s        s       w             w              m
2D and 3D plots                    s              w       w        w       w             w              m
3D spatial model                   n              n       n        n       m             n              n
SBML compatibility                 m              s       s        s       s             m              m
User interface                     m              m       m        m       m             s              s

Listed are the major features of the compared Petri net simulation tools [Chen et al., 2009]. Notation: s = strong, m = moderate, w = weak, n = none.
With regard to the quorum sensing network, many biological questions have not yet been answered. The quorum sensing system, with its topology and dynamics, is very complex and difficult to represent with sophisticated mathematical equations. Much improvement is needed before mathematical equations, such as stochastic differential equations, can be applied to whole-cell simulations [Meng et al., 2004]. Another problem is the lack of actual laboratory data. In many cases it is not possible to formulate a mathematical system. We therefore made use of the Petri net language to model and simulate cell-to-cell communication processes and to construct a sophisticated base for further experiments, analysis and simulation techniques.
Petri net simulation tools
Presently, several biological simulators exist that use the Petri net language as modeling and simulation technique. A list of well-known simulation software packages that provide biological network modeling and simulation is given at the Systems Biology Markup Language website (http://sbml.org/SBML_Software_Guide/SBML_Software_Summary). In view of the number of available tools, special attention was given to the quality of the architectural design and system. Requirements such as performance, security and reliability were examined first to select the most suitable software applications. For our research aims, software packages worth mentioning are GEPASI [Mendes, 1993] and COPASI [Hoops et al., 2003], E-CELL [Tomita et al., 1999] and Cell Illustrator [Nagasaki et al., 2003] for qualitative and quantitative simulation of biological networks. Table 1 shows a comparison of the most suitable systems. In Table 1, the Petri net tool Cell Illustrator receives very good ratings. The software application allows users to model, visualize and simulate various biological systems using the Cell System Markup Language (CSML). The software application is powerful, intuitive and easy to use.
The most important advantages are the strong pathway database retrievability, the various kinetic types for modeling and the mathematical model modification. It also provides the CSML 3.0 exchange format for visualizing, modeling and simulating biological pathways. Additionally, it supports the Systems Biology Markup Language (SBML) for dynamic simulations. For further details, see the contribution of Nagasaki et al. to this Special Issue [Nagasaki et al., 2010]. The only disadvantage of the software solutions presented in Table 1 is their lack of access to external databases. For our research project, however, databases such as KEGG, BRENDA, ENZYME among
others had to be consulted. Additionally, we needed a strong network editor, which would be able to formulate a complete biological system with all its details based on experimental and database information. Because of the limitations of the existing software applications we had to develop a new software application, which we called VANESA.
Modeling the quorum sensing pathway using VANESA
Software solutions which provide biological network modeling and analysis services are in high demand among scientists. It is not surprising that many groups have contributed to the task of developing software frameworks which are able to formulate and visualize biological systems. But each software solution focuses on a particular problem, and none is so powerful that it would be able to address all problems. As a result, different branches of software solutions with different tasks have come into existence. On the one hand, software solutions exist which try to simulate real biological systems and their processes. Their main task is the representation of a biological system by a mathematical model. The mathematical language is designed for a precise description of complicated systems. However, the dynamic processes in biology, with their large number of interactions and competing tendencies, can make it difficult to see the whole picture at once. Another problem is the lack of actual laboratory data; in many cases it is therefore not possible to formulate a mathematical system. On the other hand, different network modeling frameworks for biology and medicine were developed besides the mathematical software solutions. Those frameworks contribute to the task of creating and visualizing biological systems and networks. Their main focus lies on model management, data integration and data analysis. Those software solutions provide the possibility to build up biological network models by common graph representations.
Furthermore, they make use of biomedical data sources to extract and integrate information which is specific to the topic. The analysis by those software solutions is based on the network model and the integrated data sets. Presently it is quite difficult to formulate a sophisticated biological quorum sensing system with the existing software frameworks. The existing frameworks are not able to capture all relevant elements in the quorum sensing system. There was a need to develop a software solution that is able to represent a sophisticated quorum sensing model accurately. This was the motivation for the development of the software solution VANESA (http://vanesa.sf.net). Using VANESA we started to explore and to extend the quorum sensing network of the model organism A. salmonicida. VANESA provides new bioinformatics methods and visualization approaches to analyze dynamic interacting networks. VANESA is built upon the CardioVINEdb data warehouse [Kormeier et al., 2009], which provides some of the most important life science databases, such as UniProt, KEGG, OMIM, GO, ENZYME, BRENDA, PDB, MINT, SCOP, EMBL-Bank, and PubChem [Hariharaputran et al., 2007]. The CardioVINEdb data warehouse was exclusively constructed for the EU project CardioWorkBench. Nowadays, the CardioVINEdb data is integrated into the Data Warehouse Information System for Metabolic Data (DAWIS-M.D.). DAWIS-M.D. is a platform-independent data warehouse system that integrates heterogeneous data sources into a local database and provides a comprehensible updating strategy to ensure that the integrated data are as transparent and up to date as possible. Besides the common web-based user interface (http://agbi.techfak.uni-bielefeld.de/DAWISMD/) there is a visualization component that allows interactive graphical exploration of the integrated data.
The data from the CardioVINEdb system have also proven useful for large-scale analysis and for biologically meaningful visualization of the quorum sensing system. An important aspect
of visualization is the consideration of multi-dimensional data annotations in a way suitable for the information discovery process. The mentioned data sources provide an established basis for the modeling and characterization of a biomedical system. The data integration is a powerful feature of VANESA that provides many possibilities. Furthermore, graph comparison and graph theory functions support the user in gaining a better understanding of biological circumstances. Highlighting and comparison functions point out important facts in a set of different models. In order to make the graphical representation and the analysis of the networks more legible, graph layout transformations and animation algorithms are considered as well. Another important feature of VANESA is that information is visualized in a clear and understandable manner to meet the purposes of the underlying research activities. With an intuitive graphical user interface, the user is enabled to record research results and thoughts in the form of a digital network model. The user is not limited to any kind of biological model; rather, it is possible to create an individual system that meets the requirements of each research activity. The reconstructed quorum sensing network was built using 11 different life science data sources (UniProt, KEGG, OMIM, GO, ENZYME, BRENDA, PDB, MINT, SCOP, EMBL-Bank, and PubChem) and experimental information. Due to the availability of information from these data sources and the use of VANESA we were able to construct and extend the proposed quorum sensing network (Fig. 5). The network creation was started with the proposed quorum sensing system represented in Fig. 1. The network was created step by step, adding all relevant biological elements.
In line with the research activities, the following elements were added to the network: enzymes, proteins, receptors, transcription factors, sRNAs, small molecules and pathway maps. As a whole, the mentioned elements characterize the quorum sensing system and form the fundamentals of the network. During the development of the network all elements were put into relation with each other to represent the involved biological processes. The biological elements are connected through activation, dissociation, reaction or binding/association edges. The type of each edge is derived from database information and experimental data. After creating the network, the integrated biological databases were automatically checked for useful information. In particular, the databases KEGG and BRENDA were queried for information that might be related to the network and its elements. Thanks to the BRENDA and KEGG database information it was possible to extend the network with further biological elements and to compare it to other related organisms. Additionally, the elements within the system were linked to database information. In the last step we transformed the model into the language of Petri nets to simulate cell-to-cell communication structures. In order to obtain a qualitative quorum sensing Petri net, the quorum sensing network modeled in VANESA was transformed into the Petri net language. For this purpose, an additional export function for the software application Cell Illustrator (http://cellillustrator.com) was implemented in VANESA. By making use of the CSML export file format version 1.9 it is possible to transform the quorum sensing network model constructed in VANESA into the Petri net language.
RESULTS
As a first step, an initial draft of the quorum sensing model was considered. Figure 5 shows the modeled quorum sensing network which has been used for the Petri net representation. Exporting the
Fig. 5. Reconstruction of the Aliivibrio salmonicida quorum sensing system in VANESA. Triangles represent the important elements in the system, which have a fundamental role in the quorum sensing communication processes. Circles/octahedrons represent the other fundamentals in the system. Squares represent biological processes, such as motility, biofilm formation, and others. The color is related to pathway activities, which are based on experimental data*.
model into Cell Illustrator yielded the Petri net model in Fig. 6. However, the modeled system had to be extended by further elements and simulation techniques to reflect the real biological behavior. The elements and techniques that were added to the system are described in the following. Those enzymes which synthesize the autoinducer (AI), e.g. LuxS or AinS, were connected with a transition to an input connector to simulate a continuous activation of the AI and the enzymes. Some connections were revised, e.g. the connection between the two AI C-HSL and 3-oxo-C6-HSL. A further refinement of the model had to be performed because LuxR1 and LuxR2 are only activated once the AI binds to them. The gene activation was modeled with association connectors to simulate a continuous activation of biological processes such as biofilm formation, among others. To model a bacterial state change within the signal transduction circuitry, the dynamic quorum sensing behavior at high and low cell density had to be improved. Several association and inhibitory connectors were added to the system, e.g. the one from AI-2 to LuxU. Furthermore, several transitions can only fire once a certain AI threshold is reached. Those transitions were connected with the AI by an association connector. On the other hand, several transitions are not able to fire at a given threshold level, e.g. the one from AI-2 to LuxU. Those transitions were connected with the AI by an inhibitory connector.
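The role of the two connector types can be sketched as follows. This is a simplified illustration and not the Cell Illustrator implementation: the Transition class, the place names, the threshold value of 25 and the convention that an inhibitory connector blocks a transition as soon as the source place reaches the threshold are assumptions made for this example. An association connector only tests the content of a place and does not consume it, whereas an inhibitory connector disables the transition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Transition:
    name: str
    inputs: Dict[str, float] = field(default_factory=dict)      # consumed on firing
    outputs: Dict[str, float] = field(default_factory=dict)     # produced on firing
    tests: Dict[str, float] = field(default_factory=dict)       # association connectors
    inhibitors: Dict[str, float] = field(default_factory=dict)  # inhibitory connectors


def enabled(t: Transition, marking: Dict[str, float]) -> bool:
    """Check ordinary inputs, association thresholds and inhibitory thresholds."""
    return (all(marking[p] >= w for p, w in t.inputs.items())
            and all(marking[p] >= th for p, th in t.tests.items())
            and all(marking[p] < th for p, th in t.inhibitors.items()))


def fire(t: Transition, marking: Dict[str, float]) -> Dict[str, float]:
    """Return the marking after firing; the marking is unchanged if t is disabled."""
    new_marking = dict(marking)
    if not enabled(t, marking):
        return new_marking
    for p, w in t.inputs.items():
        new_marking[p] -= w
    for p, w in t.outputs.items():
        new_marking[p] += w
    return new_marking  # places read via tests/inhibitors keep their content


AI_THRESHOLD = 25  # illustrative value

# Fires only at high AI levels; the AI is tested, not consumed.
high_density = Transition("high_density_response",
                          outputs={"phenotype_genes": 1},
                          tests={"AI": AI_THRESHOLD})
# Fires only while the AI level is below the threshold.
low_density = Transition("low_density_response",
                         outputs={"stand_by_signal": 1},
                         inhibitors={"AI": AI_THRESHOLD})

few_cells = {"AI": 10, "phenotype_genes": 0, "stand_by_signal": 0}
many_cells = {"AI": 30, "phenotype_genes": 0, "stand_by_signal": 0}
print(fire(low_density, few_cells), fire(high_density, many_cells))
```

Under these assumptions exactly one of the two transitions is enabled for any AI level, which is the mutually exclusive low-density/high-density behavior the connectors are meant to express.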
Fig. 6. The Aliivibrio salmonicida quorum sensing system in the language of Petri nets*.
In order to simulate the quorum sensing network, initial parameters had to be assigned to the Petri net. Particularly for cell-to-cell communication processes, where the lack of laboratory data is a common problem, it is quite difficult to formulate a sophisticated system. For this reason, the Petri net simulation was initialized with values from the literature and partially with data from laboratory experiments. To give insight into the simulated system, we briefly describe which values were set to model and simulate the quorum sensing system under low and high population density. The level of autoinducer (AI) outside the cell is set to an abstract concentration of 30, and the level of AI inside the cell is set to an abstract concentration of 10. To reflect the real biological characteristics and time response, each transition that fires a token to the enzymes that synthesize the AI has been assigned a time delay parameter of 3. The cellular rhythm of the organism changes once the concentration of autoinducers inside the cell exceeds 25. Once the AI concentration is higher than 25, the cascade inside the cell is activated and the system performs biological processes such as bioluminescence, secretion, biofilm formation and motility. A lower concentration inhibits the aforementioned processes. Some further characteristics of the simulated system are presented in the following charts. Chart 1 presents the AI C6-HSL level inside the cell in red, the C6-HSL level outside the cell in blue, the LuxU concentration in purple and the AinR concentration in green. Because of the diffusion gradient, the AI diffuses into the cell until the AI levels inside and outside the cell are balanced. The intracellular circuitry of the signal transduction is activated as soon as the AI concentration within the cell reaches the threshold for the biological state change. Once the threshold is reached, the LuxU synthesis is increased and AinR is activated.
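The interplay of the initial values given above can be explored with a minimal two-compartment sketch. It takes the abstract concentrations 30 (outside) and 10 (inside), the delay of 3 and the threshold of 25 from the text; everything else is assumed for illustration, in particular the function name simulate_cell, the diffusion fraction, the amount of AI synthesized per delayed firing, and the simplification that synthesized AI is added directly inside the cell. The sketch is not the Petri net of Fig. 6.

```python
def simulate_cell(steps, ai_out=30.0, ai_in=10.0, threshold=25.0, delay=3,
                  diffusion=0.2, production=1.0):
    """Toy diffusion/synthesis loop; diffusion and production are assumed values.

    Returns one tuple (step, ai_in, ai_out, cascade_active) per simulation step.
    """
    history = []
    for step in range(1, steps + 1):
        # Diffusion along the gradient; a positive flow moves AI into the cell.
        flow = diffusion * (ai_out - ai_in)
        ai_out -= flow
        ai_in += flow
        # Delayed synthesis: the producing transition fires every `delay` steps.
        if step % delay == 0:
            ai_in += production
        history.append((step, ai_in, ai_out, ai_in > threshold))
    return history


diffusion_only = simulate_cell(100, production=0.0)
with_synthesis = simulate_cell(200)
print(diffusion_only[-1], with_synthesis[-1])
```

In this sketch, diffusion alone balances both levels at 20, which stays below the threshold of 25, so the cascade is only activated once continued AI synthesis raises the overall level. Whether the same holds for the full model depends on its actual connectors and parameters.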
Chart 1: AI C6-HSL level inside the cell in red, the C6-HSL level outside the cell in blue, the LuxU concentration in purple and the AinR concentration in green in the quorum sensing system. The y-axis represents the abstract concentration of the elements. The x-axis represents the simulation steps to perform a fire state within the Petri net. (Colours are visible in the online version of the article at www.iospress.nl.)
The second chart presents the AI 3-oxo-C6-HSL concentrations outside (blue) and inside the cell (red). After the diffusion process the AI level increases equally inside and outside the cell. Beyond the threshold level, the intracellular network switches into the “on” state. Additionally, the lux operon bioluminescence (purple) is illustrated. The concentration level of LuxR1 (blue)/LuxR2 (green) varies within the system. This behavior is related to the fact that the place LitR fires a token either to LuxR1 or to LuxR2 according to a stochastic distribution. The third chart presents the different gene activation mechanisms by LuxR1/LuxR2 and LitR. The secretion, the motility, the biofilm formation and other operons (in blue) rise equally as soon as the LitR place is active. The concentration of the lux operon bioluminescence (in red) is lower than the
Chart 2: AI 3-oxo-C6-HSL outside the cell in blue, inside the cell in red; lux operon bioluminescence (purple), LuxR1 (blue) and LuxR2 (green) in the quorum sensing system. The y-axis represents the abstract concentration of the elements. The x-axis represents the simulation steps to perform a fire state within the Petri net. (Colours are visible in the online version of the article at www.iospress.nl.)
Chart 3: The different gene activation mechanisms by LuxR1/LuxR2 and LitR. The secretion, the motility, the biofilm formation and other operons in blue. The concentration of the lux operon bioluminescence in red. (Colours are visible in the online version of the article at www.iospress.nl.)
one of LuxR1/LuxR2 within the system. This behavior reflects the biological characteristics. The lux operon-bioluminescence expression can only increase as long as LuxR1 and LuxR2 are both activated. Based on the aforementioned Petri net model, we used a reduced Petri net representation to keep the networks readable (Fig. 7). The resulting Petri net contains the fundamental elements of the quorum sensing system and can be used for modeling a single cell or an entire population of cells. In the following steps, this Petri net, as a combined network, serves as a basis for the cell-to-cell communication processes within bacterial colonies. For simulation we made use of the detailed Petri net model indicated in Fig. 6. Based on the developed Petri net for the quorum sensing network, we attempt to find the most suitable Petri net representations for the modeling of cell-to-cell communication processes. The major idea behind our work is to combine existing fundamental Petri net concepts. Therefore, we defined an elementary cell representation, which is presented in Fig. 7. This model is used for the demonstration of the communication processes, but is replaced by the detailed Petri net indicated in Fig. 6 for simulation purposes. The simplest way to connect cells and their environment disregards space. In the following model, all cells are connected to a global place that represents the autoinducer in the environment and is affected by diffusion, as shown in Fig. 8. In this model, the concentration of produced autoinducer
Fig. 7. A tiny Petri net model for a bacteria cell performing quorum sensing (AI, autoinducer)*.
is determined by the number of bacteria but can be balanced by adjusting the parameterization of the transitions. Figure 9 gives a more detailed view of the communication system within a colony. All relevant Petri net structures and processes are included and simulated. In vivo, single bacteria in a population do not always show the same behavior at the same phase of development. If space is a constraint of interest, the cells can be arranged in a grid. In nature, bacteria are not neatly arranged but appear scattered in space. However, in the model they can be arranged in hexagonal, square or other grid forms. In the following examples a square grid is shown. The grid can even have a third dimension (z-dimension) to model and simulate space in nature. A three-dimensional grid is the intuitive approach to modeling a bacterial population, and it makes the cell-to-cell communication more realistic. Diffusion can be modeled with simple transitions without respect to the location of the transitions, because the diffusion mechanism takes place everywhere in the same manner. A disadvantage is that the model cannot be displayed clearly on two-dimensional screens and the simulation needs much more computing power. An advantage of a one-dimensional Petri net concatenation like that in Fig. 10 is that the net is small and easy to simulate. It is more complicated to simulate diffusion if behavior in three-dimensional space is simulated, because a bacterial population is denser at the center. A two-dimensional model is able to capitalize on the advantages of both the one-dimensional and the three-dimensional model. It can be displayed on a two-dimensional screen and requires acceptable computational power. A part of such a Petri net is shown in Fig. 11. Figure 12 gives an overview of a complete quorum sensing system consisting of 4 cells. An important factor of quorum sensing is the size of a population.
Fig. 8. Some Petri nets are connected to a global place representing autoinducer (AI) in the environment*.
*A colored version of the figure/chart is available at In Silico Biol. 10, 0003, 1 February 2010.
To switch from a concentration below the threshold to a concentration above the threshold, the population has to grow. The growth of a bacterial population can be modeled by adding ‘mass’ and ‘alive’ places as indicated in Fig. 13. The possible bacteria positions are created by the Petri subnets for the cells. A bacterium only exists at this position if the ‘alive’ place contains tokens. The example shows a one-dimensional array, but three-dimensional arrays are modeled in the same manner. Further modeling methods for this kind of growth are mentioned in [Gronewold and Sonnenschein, 1998]. Figure 14 gives a more detailed view with respect to cell growth and cell division. Once a new cell has formed, the production of autoinducers is activated as a consequence of increasing cell density.
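The square-grid arrangement and the ‘alive’ places described in this section can be illustrated with a small continuous abstraction. The Python sketch below is an assumption-laden illustration and not the Petri net of Figs. 11 to 14: the function name `grid_step`, the parameters `production` and `rate`, and the use of real-valued concentrations instead of tokens are all hypothetical. Every grid position holds an autoinducer level, only positions flagged as alive produce autoinducer, and diffusion is the same exchange rule at every position.

```python
def grid_step(ai, alive, production=1.0, rate=0.2):
    """One synchronous update of autoinducer levels on a square grid.

    ai    -- list of rows with the current autoinducer level per position
    alive -- list of rows of booleans; True where a bacterium exists
    """
    rows, cols = len(ai), len(ai[0])
    new = [row[:] for row in ai]
    for r in range(rows):
        for c in range(cols):
            # Only positions whose 'alive' flag is set produce autoinducer.
            if alive[r][c]:
                new[r][c] += production
            # Diffusion: the same exchange with each of the four neighbours.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    new[r][c] += rate * (ai[nr][nc] - ai[r][c])
    return new


if __name__ == "__main__":
    levels = [[0.0] * 3 for _ in range(3)]
    levels[1][1] = 10.0
    cells = [[False] * 3 for _ in range(3)]
    cells[1][1] = True  # a single living cell in the centre
    for row in grid_step(levels, cells):
        print(row)
```

Because each exchange term appears with opposite sign in the two neighbouring positions, diffusion conserves the total amount, and the total grows only through production by living cells. Growth could be mimicked by switching further `alive` flags to True between updates.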
DISCUSSION
At present, the exploration, modeling and simulation of signaling networks is one of the most important areas of bioinformatics and systems biology. Signal transduction, cellular rhythms and cell-to-cell communication are three major subjects which have been intensely studied over the last decade. The area of dynamic system modeling is increasingly being applied to biochemical problems like the aforementioned
Fig. 9. Detailed communication processes within a cell colony. Illustrated are two cells exchanging autoinducer (AI) and performing bacterial behavior. Colored edges emphasize the elements involved in the cell communication. The Petri nets are in relation to all known and relevant autoinducers (AI) to perform quorum sensing processes*.
Fig. 10. A one-dimensional array of sub Petri nets simulating the quorum sensing system*.
Fig. 11. A two-dimensional grid of sub Petri nets simulating the quorum sensing system*.
Fig. 12. An overview of a complete quorum sensing system based on the Petri net language consisting of 4 cells*.
Fig. 13. A one-dimensional array of sub Petri nets for including growth in the model*.
and a central motif for the future. Dynamic models provide a powerful framework for hypothesis generation and testing, and for the identification of inconsistencies in a model. In general, dynamic modeling permits a range of analytic techniques that give insight into system-level features that emerge from the model's elementary interactions. This field of study is fascinating and challenging. Regarding signaling networks, it is a great undertaking, and not only because of the non-linear network topology. Additionally, the different types of interactions that the elements of the biochemical networks can undergo are of particular interest and importance: protein associations, enzymatic catalysis and reversible or irreversible protein modification, to name only the most common types. For the development of systems biology and synthetic biology, the understanding of cell-to-cell communication processes is important. Understanding how this signaling system coordinates individuals in a cell population is fundamental for understanding molecular mechanisms in general. This understanding is also essential in designing and building complex synthetic biological systems. Therefore, we developed and implemented a new information system, VANESA (http://vanesa.sf.net), that allows the modeling and editing of pathways and networks. The major idea of this system is to offer a powerful and easy-to-use molecular network editor to members of the laboratory. The biological partner can thus edit the project data/knowledge, including information from written sources. Furthermore, the biological partner can use the extended functions of VANESA to realize the semi-automatic, lab-validated extension of this network. To simulate cell-to-cell communication processes, an automatic translation of the VANESA network into the language of hybrid Petri nets for the software application Cell Illustrator (http://cellillustrator.com/home) was implemented.
Fig. 14. A one-dimensional array of sub Petri nets for including growth and cell division in the model*.
The work presented in this paper demonstrates the use of Petri net models for the analysis of cell-to-cell communication processes. Besides the traditional modeling and simulation approach based on ordinary differential equations (ODEs), the approach of using Petri nets and logic-based descriptions is becoming more and more important. For cell-to-cell communication processes, where the lack of laboratory data is a common problem, it is particularly difficult to formulate a sophisticated mathematical system. We showed that the cell-to-cell communication process and the information flow within a cell and across cell colonies can be modeled and simulated with Petri net structures. The models are examples of how the behavior of cells modeled by a Petri net can be simulated. Ideas for laboratory experiments can be gained and tested by changing the basic Petri net. One approach is a theoretical knockout experiment. For this task, knocked-out genes are modeled by deleting the corresponding transitions. The changed Petri nets can again be combined as mentioned above, and the simulation results can be compared. A change of gene sequence that influences the catalytic efficiency can be modeled by changing the speed parameters of the corresponding transition. Factors outside the cells that influence the signal molecules, the diffusion speed of the medium, the availability of nutrients or even processes inside the cell can also be modeled. In conclusion, the approach of using Petri nets for cell-to-cell communication processes offers immense possibilities for the future. It is a sophisticated modeling and simulation technique to represent signal transduction, cellular rhythms and cell-to-cell communication.
ACKNOWLEDGEMENTS
The work is partially supported by the EU project “CardioWorkBench” (http://www.cardioworkbench.eu/) and the BMBF Project (CHN 08/001). Ming Chen would like to thank the DAAD fellowship and the NSFC for related project financial support.
REFERENCES
• Alla, H. and David, R. (1998). Continuous and hybrid Petri nets. Journal of Circuits, Systems, and Computers 8, 159-188.
• Antunes, L. C. M., Schaefer, A. L., Ferreira, R. B. R., Qin, N., Stevens, A. M., Ruby, E. G. and Greenberg, E. P. (2007). The transcriptome analysis of the Vibrio fischeri LuxR-LuxI regulon. J. Bacteriol. 189, 8387-8391.
• Chen, M., Hariharaputran, S., Hofestädt, R., Kormeier, B. and Spangardt, S. (2009). Petri net models for the semi-automatic construction of large scale biological networks. Springer Science and Business. Natural Computing, Article 9151. DOI: 10.1007/s11047-009-9151-y.
• Enger, Ø., Husevåg, B. and Goksøyr, J. (1989). Presence of the fish pathogen Vibrio salmonicida in fish farm sediments. Appl. Environ. Microbiol. 55, 2815-2818.
• Gilbert, D., Fuß, H., Xu, G., Orton, R., Robinson, S., Vyshemirsky, V., Kurth, M. J., Downes, C. S. and Dubitzky, W. (2006). Computational methodologies for modelling, analysis and simulation of signalling networks. Brief. Bioinform. 7, 339-353.
• Goryachev, A. B., Toh, D. and Lee, T. (2006). Systems analysis of a quorum sensing network: Design constraints imposed by the functional requirements, network topology and kinetic constants. Biosystems 83, 178-187.
• Gronewold, A. and Sonnenschein, A. (1998). Event-based modeling of ecological systems with asynchronous cellular automata. Ecol. Modell. 108, 37-52.
• Hariharaputran, S., Töpel, T., Brockschmidt, B. and Hofestädt, R. (2007). VINEDdb: a data warehouse for integration and interactive exploration of life science data. Journal of Integrative Bioinformatics 4, 63.
• Hjerde, E., Lorentzen, M. S., Holden, M. T. G., Seeger, K., Paulsen, S., Bason, N., Churcher, C., Harris, D., Norbertczak, H., Quail, M. A., Sanders, S., Thurston, S., Parkhill, J., Willassen, N. P. and Thomson, N. R. (2008). The genome sequence of the fish pathogen Aliivibrio salmonicida strain LFI1238 shows extensive evidence of gene decay. BMC Genomics 9, 616.
• Hofestädt, R. (1994). A Petri net application to model metabolic processes. SAMS 16, 113-122.
• Hofestädt, R. and Thelen, S. (1998). Quantitative modeling of biochemical networks. In Silico Biol. 1, 0006.
• Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P. and Kummer, U. (2006). Copasi – a COmplex PAthway SImulator. Bioinformatics 22, 3067-3074.
• Kormeier, B., Hippe, K., Töpel, T. and Hofestädt, R. (2009). Cardiovinedb: A data warehouse approach for integration of life science data in cardiovascular diseases. Im Fokus des Lebens. Beiträge der 39. Jahrestagung der Gesellschaft für Informatik e.V. (GI) 40, 704-708.
• Matsuno, H., Murakami, R., Yamane, R., Yamasaki, N., Fujita, S., Yoshimori, H. and Miyano, S. (2003). Boundary formation by notch signaling in Drosophila multicellular systems: experimental observations and a gene network modeling by Genomic Object Net. Pac. Symp. Biocomput. 8, 152-163.
• Mendes, P. (1993). Gepasi: a software package for modeling the dynamics, steady states and control of biochemical and other systems. Comput. Appl. Biosci. 9, 563-571.
• Meng, T. C., Somani, S. and Dhar, P. (2004). Modeling and Simulation of biological systems with stochasticity. In Silico Biol. 4, 0024.
• Müller, J., Kuttler, C., Hense, B. A., Rothballer, M. and Hartmann, A. (2006). Cell-to-cell communication by quorum sensing and dimension-reduction. J. Math. Biol. 53, 672-702.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2004). Integrating biopathway databases for large-scale modeling and simulation. In: Second Asia-Pacific Bioinformatics Conference (APBC 2004), Chen, Y.-P. P. (ed.), Volume 29 of Conferences in Research and Practice in Information Technology, Australia Computer Society, vol. 29, pp. 43-52.
• Nagasaki, M., Saito, A., Jeong, E., Li, C., Kojima, K., Ikeda, E. and Miyano, S. (2004). Cell Illustrator 4.0: A computational platform for systems biology. In Silico Biol. 10, 0002.
• Petri, C. A. and Reisig, W. (2008). Petri net. Scholarpedia 3, 6477.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representation in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Reisig, W. A. (1982). A Primer in Petri Net Design. Springer, Berlin.
• Schauder, S. and Bassler, B. L. (2001). The languages of bacteria. Genes Dev. 15, 1468-1480.
• Schrøder, M. B., Espelid, S. and Jørgensen, T. Ø. (1992). Two serotypes of Vibrio salmonicida isolated from diseased cod (Gadus morhua L.); virulence, immunological studies and vaccination experiments. Fish Shellfish Immunol. 2, 211-221.
• Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C. and Hutchison, C. A. III (1999). E-CELL: software environment for whole cell simulation. Bioinformatics 15, 72-84.
• Totland, G. K., Nylund, A. and Holm, K. O. (1988). An ultrastructural study of morphological changes in Atlantic salmon, Salmo salar L., during the development of cold water vibriosis. J. Fish Dis. 11, 1-13.
• Valk, R. (1978). Self-modifying nets: a natural extension of Petri nets. Lecture Notes in Computer Science 62, 464-476.
• Visick, K. L. (2005). Layers of signaling in a bacterium-host association. J. Bacteriol. 187, 3603-3606.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-204
On Determining Firing Delay Time of Transitions for Petri Net Based Signaling Pathways by Introducing Stochastic Decision Rules
Yoshimasa Miwa^(a,1), Chen Li^(b,1), Qi-Wei Ge^(c), Hiroshi Matsuno^(a,∗) and Satoru Miyano^(b)
a Graduate School of Science and Engineering, Yamaguchi University, Yamaguchi, Japan
b Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan
c Faculty of Education, Yamaguchi University, Yamaguchi, Japan
ABSTRACT: Parameter determination is important in modeling and simulating biological pathways, including signaling pathways. Parameters are determined according to biological facts obtained from biological experiments and scientific publications. However, such reliable data describing detailed reactions are not reported in most cases. This prompted us to develop a general methodology for determining the parameters of a model in the case that no information on the underlying biological facts is provided. In this study, we use the Petri net approach for modeling signaling pathways, and propose a method to determine firing delay times of transitions for Petri net models of signaling pathways by introducing stochastic decision rules. Petri net technology provides a powerful approach to modeling and simulating various concurrent systems, and has recently been widely accepted as a description method for biological pathways. Our method makes it possible to determine the range of firing delay times that realizes smooth token flows in the Petri net model of a signaling pathway. The usefulness of this method has been confirmed by the results of an application to the interleukin-1 induced signaling pathway.
KEYWORDS: Petri net, interleukin-1 (IL-1) signaling pathway, firing delay time, stochastic decision rule, conflict resolution
1 These authors contributed equally to this work.
∗ Corresponding author: Hiroshi Matsuno, Graduate School of Science and Engineering, Yamaguchi University, Yoshida 1677-1, Yamaguchi, 753-8512, Japan. E-mail: [email protected].
INTRODUCTION
A Petri net is a formal description for modeling concurrent systems [1]. Petri nets have recently been widely accepted as a description method for biological pathways such as gene regulation networks, metabolic pathways and signaling pathways by researchers in computer science as well as those in biochemistry [2]. Various types of Petri nets (e.g. stochastic Petri nets [3,4], hybrid Petri nets [5,6], colored Petri nets [7]) have been applied to study biological pathways in both quantitative and qualitative approaches, because Petri nets offer an intuitive graphical representation and capabilities for mathematical analysis. Signaling pathways regulate elaborate cell communication mechanisms by controlling various changes in cell behavior such as cell growth, survival, proliferation, and apoptosis. Through such cellular communication mechanisms, cell activities can be subtly governed and maintained in a good condition along with other biochemical interactions and processes. Recently, Li et al. [8] proposed a qualitative modeling method for signaling pathways that pays attention to the molecular interactions and mechanisms using discrete Petri nets. Further, they proposed a timed Petri net based method of determining the firing delay time (or delay time for short) of transitions, i.e., the time each transition takes to fire [9]. Their method to determine delay times in a timed Petri net is based on the assumption that, for any reaction, the total amount of consumed substrates is equal to the total amount of products of the reaction. Although this assumption ensures the concentration equilibrium of the reaction, their method can only produce an “exact delay time” for any transition in the Petri net model, i.e., no time range is allowed for any transition in determining the delay time. This means that their method eliminates the variety of possible reaction speeds, which commonly exists in an ordinary signaling pathway. Furthermore, their method is designed based on the strategy that, in the conflict situation at a certain place, “the same firing frequencies” should be assigned to the transitions going out from that place. This strategy should be improved, since it does not reflect real reactions in a cell. This paper proposes a new method to determine delay times in a signaling pathway, which resolves the above two problems, “exact delay time” and “the same firing frequency,” while keeping smooth signal flows of a signaling pathway.
A new concept, “retention-free,” is introduced to the Petri net. In a retention-free Petri net, the total amount of tokens flowing out of a place per time unit should not be less than the amount flowing into the place. This new concept allows us to resolve the former problem by providing a delay time within some range to any of the transitions. On the other hand, the latter problem is resolved by introducing stochastic factors into the transitions coming from a place where a conflict happens. These stochastic factors reflect the reaction rates of the corresponding biological reactions, making it possible to realize more realistic signal flows of a signaling pathway. The organization of this paper is as follows. After giving the necessary definitions of timed Petri nets for the discussion of this paper, a Petri net modeling method for a signaling pathway is introduced with the example of the IL-1 signaling pathway according to the method proposed by Li et al. [8]. We then present formulas in two cases, conflict-free transitions and conflict transitions, which express the conditions for smooth token flows in a timed Petri net, i.e., smooth signal transductions in a signaling pathway. Based on these formulas, we propose an algorithm to determine the delay time of transitions in a timed Petri net model of a signaling pathway. The usefulness of this algorithm was confirmed by the results of applying it to the timed Petri net model of the IL-1 signaling pathway.
DEFINITIONS OF PETRI NETS
Petri net technology provides a powerful approach to modeling and simulating various concurrent systems [1], and has been widely employed as a description method by biologists and computational biologists owing to the following advantages [10]:
1. a “firm mathematical foundation” enabling formal and clear description of biological pathways as well as their structural analysis, and
2. a “visual representation of networks” which provides intuitive understanding of biological pathways without any mathematical descriptions that are basically difficult for ordinary biologists.
Fig. 1. Basic elements of Petri nets.
Fig. 2. Examples illustrating source, sink and synchronous transitions.
We here briefly give the necessary definitions of Petri nets and their extension (i.e., timed Petri nets) used in this paper. For detailed definitions the reader is referred to [1].
BASIC DEFINITIONS OF PETRI NET
A Petri net is comprised of three types of elements, places, transitions and arcs, whose symbols are illustrated in Fig. 1.
Definition 1: A Petri net is denoted as a 5-tuple $PN = (T, P, E, \alpha, \beta)$ that is a bipartite graph, where $E = E^{+} \cup E^{-}$ and
$T$: a set of transitions $\{t_1, t_2, \ldots, t_{|T|}\}$
$P$: a set of places $\{p_1, p_2, \ldots, p_{|P|}\}$
$E^{+}$: a set of arcs from transitions to places, $e = (t, p)$
$E^{-}$: a set of arcs from places to transitions, $e = (p, t)$
$\alpha$: $\alpha(e)$ is the weight of arc $e = (p, t)$
$\beta$: $\beta(e)$ is the weight of arc $e = (t, p)$
Definition 2: Let $PN$ be a Petri net.
1. ${}^{\bullet}t$ (or $t^{\bullet}$) is the set of input (or output) places of $t$, and ${}^{\bullet}p$ (or $p^{\bullet}$) is the set of input (or output) transitions of $p$.
2. A transition without an input arc (Fig. 2) is called a source transition. Let $T_{sour} = \{t^{sour}_1, \ldots, t^{sour}_a\}$ ($a \geq 1$) be the set of such source transitions. A source transition is always firable.
3. A transition without an output arc (Fig. 2) is called a sink transition. The set of such sink transitions is denoted by $T_{sink} = \{t^{sink}_1, \ldots, t^{sink}_b\}$ ($b \geq 1$).
4. A transition connected with two or more input arcs (Fig. 2) is called a synchronous transition, and the set of such transitions is defined by $T_{sync} = \{t^{sync}_1, \ldots, t^{sync}_c\}$ ($c \geq 1$).
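Definitions 1 and 2 map naturally onto a small data structure. The following Python sketch is only one possible encoding and is not taken from the paper: the class name `PetriNet`, its method names and the example net are hypothetical. Arc sets are stored implicitly as the keys of the two weight dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    transitions: set
    places: set
    alpha: dict = field(default_factory=dict)  # (p, t) -> weight, arcs in E-
    beta: dict = field(default_factory=dict)   # (t, p) -> weight, arcs in E+

    def input_places(self, t):
        """The preset of t: places with an arc (p, t)."""
        return {p for (p, tt) in self.alpha if tt == t}

    def output_places(self, t):
        """The postset of t: places with an arc (t, p)."""
        return {p for (tt, p) in self.beta if tt == t}

    def input_transitions(self, p):
        """The preset of p: transitions with an arc (t, p)."""
        return {t for (t, pp) in self.beta if pp == p}

    def output_transitions(self, p):
        """The postset of p: transitions with an arc (p, t)."""
        return {t for (pp, t) in self.alpha if pp == p}

    def source_transitions(self):
        return {t for t in self.transitions if not self.input_places(t)}

    def sink_transitions(self):
        return {t for t in self.transitions if not self.output_places(t)}

    def synchronous_transitions(self):
        return {t for t in self.transitions if len(self.input_places(t)) >= 2}


# Hypothetical example: t_src feeds p1; p1 and p2 are joined by t_sync into p3;
# t_sink consumes p3.
net = PetriNet(
    transitions={"t_src", "t_sync", "t_sink"},
    places={"p1", "p2", "p3"},
    alpha={("p1", "t_sync"): 1, ("p2", "t_sync"): 1, ("p3", "t_sink"): 1},
    beta={("t_src", "p1"): 1, ("t_sync", "p3"): 1},
)
```

Keeping the weights in dictionaries keyed by arcs means that the preset and postset operators of Definition 2 reduce to simple filters over the keys.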
Fig. 3. Examples showing firing rules of Petri nets.
Fig. 4. Lower two output transitions are in conflict. The place whose output transitions are in conflict is called a conflict place.
A place can hold a non-negative integer number of tokens. An assignment of tokens to places, expressed in the form of a vector, is called a marking $M$, which varies during the execution of a Petri net.
Firing rule of Petri net PN: A transition $t$ is firable if each input place $p_I$ of $t$ has at least $\alpha(e)$ tokens, where $\alpha(e)$ denotes the weight of the arc $e = (p_I, t)$. Firing of a transition $t$ removes $\alpha(e)$ tokens from each input place $p_I$ of $t$ and deposits $\beta(e)$ ($e = (t, p_O)$) tokens to each output place $p_O$ of $t$. Figure 3 shows the movement of tokens by the firing of a transition.
Conflict-free Petri net
A conflict (see Fig. 4) corresponds to the condition of a place that has at least two output transitions and has an insufficient number of tokens for firing. Deterministic resolutions are considered to decide the firings of the transitions. Usually, the concepts of priority order and probability are employed when dealing with timed Petri nets, stochastic Petri nets and high-level Petri nets [11,12]. A Petri net is called a conflict-free Petri net if no such conflict exists; conflict-free Petri nets are a subclass of Petri nets.
Timed Petri net
A Petri net can be extended by assigning a firing delay time to each transition to facilitate a system-level understanding by means of simulation. Such an extended Petri net is called a timed Petri net.
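Before the timed extension is formalized, the untimed firing rule and the notion of a conflict place described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names `is_firable`, `fire` and `in_conflict` and the example net are hypothetical, and `in_conflict` encodes one reading of the conflict condition, namely that a place has at least two output transitions and too few tokens to serve all of them.

```python
def is_firable(t, marking, alpha):
    """t is firable if every input place holds at least alpha[(p, t)] tokens."""
    return all(marking.get(p, 0) >= w
               for (p, tt), w in alpha.items() if tt == t)


def fire(t, marking, alpha, beta):
    """Return the marking reached by firing t; the input marking is not modified."""
    if not is_firable(t, marking, alpha):
        raise ValueError(f"transition {t!r} is not firable")
    new = dict(marking)
    for (p, tt), w in alpha.items():
        if tt == t:
            new[p] = new.get(p, 0) - w  # remove tokens from input places
    for (tt, p), w in beta.items():
        if tt == t:
            new[p] = new.get(p, 0) + w  # deposit tokens into output places
    return new


def in_conflict(p, marking, alpha):
    """Assumed reading: >= 2 output transitions and too few tokens for all."""
    demands = [w for (pp, _t), w in alpha.items() if pp == p]
    return len(demands) >= 2 and marking.get(p, 0) < sum(demands)


# Hypothetical net: ligand L and receptor R bind to form complex C,
# which is then consumed by two competing transitions t1 and t2.
alpha = {("L", "bind"): 1, ("R", "bind"): 1, ("C", "t1"): 1, ("C", "t2"): 1}
beta = {("bind", "C"): 1}
m0 = {"L": 1, "R": 1, "C": 0}
m1 = fire("bind", m0, alpha, beta)
```

In the example, place C holds a single token after `bind` fires, so t1 and t2 are both firable but cannot both fire; this is the situation that the priority or probability rules mentioned above have to resolve.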
Definition 3: Let $PN$ be a Petri net. A timed Petri net $TPN$ is defined by $TPN = (PN, D)$, where $D$ is a set of firing delay times of each transition in $T$. The firing rule of a timed Petri net $TPN$ is defined as follows:
(i) If the firing of a transition $t_i$ is decided, tokens required for the firing are reserved. We call these tokens reserved tokens.
(ii) When the delay time $d_i$ of $t_i$ has passed, $t_i$ fires to remove the reserved tokens from the input places of $t_i$ and put non-reserved tokens into the output places of $t_i$.
In a timed Petri net, the number of firings of a transition $t_i$ per time unit is called the firing frequency $f_i$. $f_i$ represents the maximum firing frequency of $t_i$. The delay time $d_i$ of $t_i$ is given as the reciprocal of $f_i$.
Retention-free Petri net
Signaling pathways are composed of consecutive signaling events that are mediated by intracellular signaling proteins (usually enzymes) that relay the signal into the cell by activating the next enzyme from inactive state to active state on receipt of an up-stream signal. With this feature, it can be considered that the signaling pathway will be in an abnormal state if the accumulation of substances occurs per time unit. In Petri nets, such accumulation of a substance is represented by the token retention of the corresponding place. The token number of the place grows infinitely with the firing of its input transitions and the lack of firing by the output transition. In this paper, a Petri net in which signal flows can be steadily and smoothly propagated without token retention at any place is called a retention-free Petri net. Retention-free Petri nets are a subclass of timed Petri nets in which, for each place $p$ per time unit, the total amount of in-flowing tokens ($\sum_{i=1}^{m} KI^{p}_{i}$) is not larger than the possible maximum number of out-flowing tokens ($\sum_{j=1}^{n} KO^{p}_{j}$), which is represented by the following inequality:
$$\sum_{i=1}^{m} KI^{p}_{i} \leq \sum_{j=1}^{n} KO^{p}_{j} \qquad (1)$$
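Inequality (1) can be checked mechanically once the per-transition token flows are known. The Python sketch below rests on an assumption that is not stated in the text above: the in-flow contributed by an input transition is taken as its arc weight times its firing frequency, and likewise for the out-flow, with the frequency being the reciprocal of the delay time. The function names and the example net are hypothetical.

```python
def flows(p, alpha, beta, delay):
    """Return (in_flow, out_flow) of place p per time unit.

    Assumption: each transition t fires at its maximum frequency 1 / delay[t],
    so it moves (arc weight) * (1 / delay[t]) tokens per time unit.
    """
    in_flow = sum(w / delay[t] for (t, pp), w in beta.items() if pp == p)
    out_flow = sum(w / delay[t] for (pp, t), w in alpha.items() if pp == p)
    return in_flow, out_flow


def retention_violations(places, alpha, beta, delay):
    """List the places for which the in-flow exceeds the possible out-flow."""
    bad = []
    for p in places:
        in_flow, out_flow = flows(p, alpha, beta, delay)
        if in_flow > out_flow:
            bad.append(p)
    return bad


# Hypothetical chain t1 -> p -> t2 with unit arc weights.
alpha = {("p", "t2"): 1}
beta = {("t1", "p"): 1}
fast_output = {"t1": 2.0, "t2": 1.0}  # t2 drains p faster than t1 fills it
slow_output = {"t1": 2.0, "t2": 4.0}  # t2 is too slow, tokens pile up in p
```

With `fast_output` the place p receives 0.5 tokens and can release 1 token per time unit, so the inequality holds; with `slow_output` it can release only 0.25, so p is reported as a retention place.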
MODELING OF SIGNALING PATHWAYS
We give here a modeling method with a series of modeling rules for signaling pathways based on Petri net representation. As an example, the interleukin-1 (IL-1) signaling pathway is used to demonstrate our modeling method. Petri net based modeling of signaling pathways provides us with an intuitive understanding of the intrinsic structure and features of signaling pathways, and further enables computational experiments on the constructed Petri net model, as demonstrated in the following sections.
Modeling rules
The structural characteristics of signaling pathways can be naturally and explicitly expressed by Petri nets according to the following rules.
1. Places denote static elements including chemical compounds, conditions, states, substances and cellular organelles participating in the biological pathways. Tokens indicate the presence of these elements. The number of tokens is given to represent the amount of chemical substances.
2. Transitions denote active elements including chemical reactions, events, actions, conversions and catalyzed reactions.
3. Directed arcs connecting the places and the transitions represent the relations between corresponding static elements and active elements. Arc weights $\alpha$ and $\beta$ describe the quantities of substances required before and after a reaction, respectively. When modeling a chemical reaction, arc weights represent quantities given by the stoichiometric equations of the reaction itself. Note that the weight of an arc is omitted if the weight is 1.
Signaling pathways are information cascades of enzyme reactions from transmembrane receptors to the nuclear DNA, which ultimately regulate intracellular responses such as programmed cellular proliferation, gene expression, differentiation, secretion and apoptosis. For signaling pathways, besides the catalytic reactions, the information among the molecular interactions such as complex formation, gathering action, translocation and channel switching needs to be modeled according to the different types of interactions, as long as the biological facts are known. Figure 5 shows various molecular interactions of signaling pathways and their correspondence to the Petri net models. We give explanations of the molecular interactions that appear in the IL-1 signaling pathway, which is used below to demonstrate our method.
I. A binding reaction induces the formation of homo- or heterodimers and generates a complex compound. This block shows the ligand-receptor binding interaction and the corresponding Petri net model, which indicates that the transition cannot fire in the absence of ligand although receptors exist. In the binding reaction, the number of input places of the transition is two or more, while the number of output places is one. Obviously, we can also expand the concept of association to the model represented in block I (b), generally representing the simultaneous binding of substrates $S_1, \ldots, S_n$ ($n \geq 1$) forming a complex $C$ in biological systems.
II. Phosphorylation is a reaction that adds a phosphate ($\mathrm{PO}_3^{2-}$) group to a protein or a small molecule, and dephosphorylation is the backward reaction that removes phosphate groups from a compound by hydrolysis.
III. Autophosphorylation is a transphosphorylation reaction between receptor subunits, frequently following the binding of a ligand to a receptor with intrinsic protein kinase activity.
IV. Each down-regulated pathway transmits the signals to regulate different down-regulated reactions according to the modification positions on the ligand-receptor complex. The complex often has many chemical modifications, e.g., phosphorylation and acetylation. A few methods using Petri nets have been proposed to model this process by using one place (i.e., the ligand-receptor complex) possessing more than one output transition to represent the multiple modification reactions. These methods are easily understood, but raise the issue that, if one transition of such a place fires and removes the token(s) from the shared input place, it will disable all the other transitions of this place simultaneously, although the token(s) will return to the same input place via a self-loop. To deal with this, we model each distinct active site (modification position) on the complex as an individual place $C_i$ ($1 < i < n$), as shown in block IV.
V. Adaptor protein (e.g., Grb2 and SHC1)-mediated association reactions are different from the binding reaction (see I). The main participant, the adaptor protein, is an accessory protein to the main proteins. These proteins lack intrinsic enzymatic activity themselves but instead mediate specific protein-protein interactions driving the formation of protein complexes.
VI. Chemical reactions are the most common reactions in signaling pathways, for which the conversion of substances to products is ordinarily modeled as connecting input places to output places, both belonging to the same transition.
Fig. 5. Petri net models of various reaction types in signaling pathways.
VII. Homodimerization is a polymerization reaction of two identical substances to shape a dimer, similar to the association reaction. A substance is modeled as an input place connected with a 2-weighted arc. It is easy to expand the conception to model the formation of a multimer holding n-weight, such as homotrimers or -tetramers.
VIII. Translocation refers to the movement of molecules, substances or ions across cell membranes or via the bloodstream in biology. Figure 5 shows the nuclear translocation within a cell. A transition is modeled to indicate the movement action of substances before and after.
IX. Intracellular signal pathways are largely carried out by second messenger molecules. $\mathrm{Ca}^{2+}$ acts as a second messenger molecule inside the cell. Usually the concentration of free $\mathrm{Ca}^{2+}$ within the cell is very low; it is stored inside organelles, mostly the endoplasmic reticulum. In order to become active, $\mathrm{Ca}^{2+}$ has to be released from the organelles into the cytosol. Two transitions $t_o$ and $t_c$ are introduced to denote the channel activities "open" and "close", respectively. $t_o$ is enabled when its input place holds token(s) after the association of organelles and substances, whereas $t_c$ is enabled as long as some stop mechanism shuts off the channel.
X. This is the opposite of I. Dissociation is a general process in which complexes and molecules separate or split into smaller molecules or ions. The number of input places of the transition is one while the number of output places is two or more.
XI. Since an enzyme itself plays the role of a catalyst in biological pathways and is not consumed in biochemical reactions, the reaction is modeled as a transition, where the enzyme is modeled as an enzyme place that has a self-loop with the same arc weight. That is, once an enzyme place is occupied by a token, the token returns to the place after the transition has fired, which keeps the transition firable.
XII. A source transition represents an activity that provides substances that will take part in the reactions. A sink transition denotes small and natural degradation of a substance.
XIII. Internalization is a phenomenon that decreases the number of receptors on the surface of a cell membrane, due to being exposed to corresponding biological agents such as ligands for a long time. As a result of the decreasing number of receptors, responsiveness to the ligands is decreased. The internalization is modeled by an output transition connected with a place denoting the receptor via a normal arc.
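The firing semantics behind several of the blocks above (binding, homodimerization with a 2-weighted arc, and the enzyme self-loop) can be illustrated with a minimal sketch. The class and function names, and the place names such as `ligand` or `monomer`, are assumptions introduced here for illustration only; they are not part of the authors' tooling, and delay times are ignored.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """A transition with weighted input arcs (alpha) and output arcs (beta)."""
    name: str
    inputs: dict   # place name -> arc weight alpha
    outputs: dict  # place name -> arc weight beta


def enabled(marking, t):
    """A transition is enabled if every input place holds at least alpha tokens."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking, t):
    """Return the marking obtained by firing t once (no delay time modeled)."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


# Block I: ligand-receptor binding (two input places, one output place).
binding = Transition("bind", {"ligand": 1, "receptor": 1}, {"complex": 1})
# Block VII: homodimerization, one input place with a 2-weighted arc.
dimerize = Transition("dimerize", {"monomer": 2}, {"dimer": 1})
# Block XI: enzyme place with a self-loop of equal arc weight.
catalysis = Transition(
    "catalysis", {"substrate": 1, "enzyme": 1}, {"product": 1, "enzyme": 1}
)

print(enabled({"receptor": 3}, binding))             # receptors alone do not suffice
print(fire({"ligand": 1, "receptor": 3}, binding))
print(fire({"substrate": 2, "enzyme": 1}, catalysis))
```

In this sketch the enzyme token is consumed and immediately restored by the self-loop, so the catalysis transition stays enabled as long as substrate remains.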
IL-1 induced signaling pathway

In this paper, we use the example of the IL-1 induced signaling pathway (see Fig. 6) to demonstrate our proposed method. IL-1 is a proinflammatory cytokine, and plays an important role in regulating the mechanism of inflammation. The IL-1 signaling pathway is composed of the NF-κB (nuclear transcription factor-κB) pathway and the MAPK (mitogen activated protein kinase) pathway [13]. We construct the IL-1 signaling pathway by using our modeling method. Figure 7 illustrates the Petri net model of the IL-1 signaling pathway.

DETERMINING FIRING DELAY TIME OF TRANSITIONS

The Petri net model shown in Fig. 7 is constructed without firing delay times of transitions. This model describes the structural information of connection relationships. The next task is to validate the model by means of simulation, i.e., to investigate whether the constructed model is consistent with biological facts obtained from biological experiments and scientific publications. Unfortunately, reliable data describing the detailed reactions are not reported in most cases. This leads us to develop a general methodology to determine the delay times of the transitions representing reaction rates.
Fig. 6. Biological diagram of the IL-1 induced signaling pathway. The detailed biological interpretation and corresponding Petri net models of the IL-1 signaling pathway are demonstrated by means of interactive-animation, and are freely available at [14]. Full legend of the symbols used in this diagram is given at the website of [15].
Fig. 7. Petri net model of IL-1 induced signaling pathway of Fig. 6.
Fig. 8. Schematization of a conflict-free Petri net model where the number of input transitions of each place is one.
Here we propose a new method of dealing with the delay time of transitions for a subclass of Petri nets, retention-free Petri nets, under two conditions: the conflict-free condition and the normal one including conflicts. We introduce a stochastic approach to determine the firings of transitions in conflict. Note that inhibitory arcs and cyclic structures (e.g., feedback loops, self-loops) are not taken into account in this contribution. That is, the Petri net dealt with here is an acyclic one without inhibitory arcs.

As defined above, in a timed Petri net, the delay time $d_i$ of $t_i$ is the reciprocal of the maximum firing frequency $f_i$, and it is obvious that the token amount flowed-in per time unit is no more than that flowed-out with $f_i$. We thus decide $d_i$ by calculating $f_i$.

Strategy for determining delay time in the case of conflict-free

In a conflict-free Petri net, each place may have one or more input arcs but at most one output arc. Without loss of generality, we consider two cases: (1) each transition has exactly one input and one output arc and they construct a path from a source transition to a sink transition, as shown in Fig. 8; (2) there are l paths as in case (1) and these paths merge into a place p that is connected to a sink transition t, as shown in Fig. 9.

Delay times of transitions of case (1)

To keep retention-free behavior, each transition $t_i$ on the path of Fig. 8 must satisfy $K_{i-1}^{p_{i-1}} \le f_i \cdot \alpha_i$, where $K_{i-1}^{p_{i-1}}$ is the token amount flowed into place $p_{i-1}$ per time unit and $f_i$ is the maximum firing frequency of $t_i$. It is obvious that $K_{i-1}^{p_{i-1}} = f'_{i-1} \cdot \beta_{i-1}$ is determined by the actual firing frequency $f'_{i-1}$ of $t_{i-1}$ (not the maximum firing frequency $f_{i-1}$), which is further determined by the firing frequencies of the previous transitions. Since the source transition $t_1$ can fire $f_1$ times per time unit, $K_1^{p_1} = f_1 \cdot \beta_1$, which enables $t_2$ to fire $f'_2 = K_1^{p_1}/\alpha_2$ times per time unit. In this way, $K_2^{p_2} = f'_2 \cdot \beta_2 = (K_1^{p_1}/\alpha_2)\beta_2$, ..., $K_{i-1}^{p_{i-1}} = f'_{i-1} \cdot \beta_{i-1} = (K_{i-2}^{p_{i-2}}/\alpha_{i-1})\beta_{i-1}$. As a result, the following condition related to the maximum firing frequency of $t_i$ can be obtained:

$$ f_1 \cdot \frac{\beta_1 \cdots \beta_{i-1}}{\alpha_2 \cdots \alpha_{i-1}} \le f_i \cdot \alpha_i \qquad (2) $$

where $f_1$ is the maximum firing frequency of the source transition $t_1$, $f_i$ is the maximum firing frequency of $t_i$, which is an output transition of place $p_{i-1}$, and $\beta_j$ ($j = 1, \ldots, i-1$) and $\alpha_j$ ($j = 2, \ldots, i$) are the weights of the arcs connected from the input transitions and to the output transitions, as shown in Fig. 8.
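As a worked illustration of inequality (2), the following sketch computes the smallest admissible maximum firing frequency of a transition on a single path, and the corresponding largest admissible delay time. The function name and the numerical arc weights are hypothetical values chosen for illustration; exact rational arithmetic is used only to keep the example easy to check by hand.

```python
from fractions import Fraction
from math import prod


def chain_bound(f1, betas, alphas):
    """Bound implied by inequality (2) for transition t_i on a single path.

    f1     -- maximum firing frequency of the source transition t_1
    betas  -- [beta_1, ..., beta_{i-1}], weights of arcs from transitions to places
    alphas -- [alpha_2, ..., alpha_i], weights of arcs from places to transitions

    Returns (f_min, d_max): the smallest maximum firing frequency f_i that
    avoids token retention at p_{i-1}, and the delay time d_i = 1 / f_i.
    """
    if len(betas) != len(alphas):
        raise ValueError("expected one beta and one alpha per place on the path")
    # f_1 * (beta_1 ... beta_{i-1}) / (alpha_2 ... alpha_{i-1}) <= f_i * alpha_i
    f_min = (Fraction(f1) * prod(Fraction(b) for b in betas)
             / prod(Fraction(a) for a in alphas))
    return f_min, 1 / f_min


# Hypothetical path t1 -> p1 -> t2 -> p2 -> t3 with
# beta_1 = 3, beta_2 = 2, alpha_2 = 2, alpha_3 = 4 and f_1 = 2.
f3_min, d3_max = chain_bound(2, betas=[3, 2], alphas=[2, 4])
print(f3_min, d3_max)  # 3/2 2/3
```

Any delay time $d_3$ not exceeding the returned value keeps the outflow capacity of $t_3$ at least as large as the inflow into $p_2$ in this example.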
Fig. 9. Schematization of conflict-free Petri net model where a place (p) possessing multiple input transitions (tk1 , tk2 , . . . , tkl ) is included.
Delay times of transitions of case (2)

For the transitions on each path of Fig. 9, the delay times can be determined according to inequality (2). The token amount flowed into place p can be computed as discussed above, and the maximum firing frequency f of t can be determined according to the following inequality:

$$ f_1 \cdot \frac{\beta_1 \cdots \beta_{k_1}}{\alpha_{l+1} \cdots \alpha_{k_1}} + f_2 \cdot \frac{\beta_2 \cdots \beta_{k_2}}{\alpha_{l+2} \cdots \alpha_{k_2}} + \cdots + f_l \cdot \frac{\beta_l \cdots \beta_{k_l}}{\alpha_{2l} \cdots \alpha_{k_l}} \le f \cdot \alpha \qquad (3) $$
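A small numerical sketch of inequality (3) follows. Each path contributes its own inflow into the merging place p, and the sum bounds the firing frequency of the sink transition from below. The function names and all numerical values are assumptions made for this illustration.

```python
from fractions import Fraction
from math import prod


def path_inflow(f_source, betas, alphas):
    """Tokens per time unit that one path delivers to the merging place p.

    betas  -- weights of all transition-to-place arcs along the path
    alphas -- weights of all place-to-transition arcs along the path
              (one fewer than betas, since the last place is p itself)
    """
    return (Fraction(f_source) * prod(Fraction(b) for b in betas)
            / prod(Fraction(a) for a in alphas))


def merge_bound(paths, alpha):
    """Smallest maximum firing frequency f of the sink transition t in (3).

    paths -- list of (f_source, betas, alphas) tuples, one per merging path
    alpha -- weight of the arc from p to t
    """
    total = sum(path_inflow(*path) for path in paths)
    return total / Fraction(alpha)


# Two hypothetical paths merging into p, with the arc p -> t weighted 2.
paths = [
    (2, [1, 3], [2]),  # inflow 2 * (1 * 3) / 2 = 3
    (1, [2, 2], [1]),  # inflow 1 * (2 * 2) / 1 = 4
]
print(merge_bound(paths, alpha=2))  # 7/2
```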
The left-hand side of inequality (3) gives the total token amount flowed into p from its multiple input transitions per time unit.

Strategy for determining delay time in the case of conflict

Here, we propose a new firing rule of a timed Petri net by introducing a stochastic approach to determine firings for a series of transitions in conflict. Suppose a place p possesses output transitions $t_1, t_2, \ldots, t_k$; then the firing rule is as follows.

New firing rule of timed Petri nets TPN*:
1. Each unreserved token deposited to input place p is assigned to be reserved by the transition $t_i$ that satisfies the following mathematical expression:

$$ \min \left\{ \frac{n_i/\alpha_i}{\sum_{j=1}^{k} n_j/\alpha_j} - s_i \right\} \qquad (4) $$

2. When the number of reserved tokens of $t_i$ is not less than the token number required for the firing, the firing of $t_i$ is decided.
3. After the delay time $d_i$ of $t_i$ has passed, $t_i$ fires to remove the reserved tokens from the input place of $t_i$ and deposit unreserved tokens into the output places of $t_i$.

In the above expression (4), $\alpha_i$ is the arc weight of $e(p, t_i)$; $s_i$ is the firing probability of transition $t_i$, which represents the proportion of the firing frequency of each transition in the total firing frequency of the transitions in conflict. As shown in Fig. 10, $s_i$ is assigned to the corresponding transition $t_i$, which is
Fig. 10. An illustration to show the transitions in conflict and their parameters.
Fig. 11. An example illustrating conflict state.
given as a constant in advance according to the biological facts provided in the published literature. $n_i$ is used to record the number of tokens that $t_i$ has reserved so far; $n_i/\alpha_i$ represents the total firing number of transition $t_i$ from the beginning. Expression (4) is designed to reserve the token to the transition $t_i$ that has the largest difference between the calculated firing probability $\frac{n_i}{\alpha_i} \big/ \sum_{j=1}^{k} \frac{n_j}{\alpha_j}$ and the given firing probability $s_i$ among all the transitions in conflict.

Delay times of transitions in conflict

We consider the determination of delay times for the transitions in conflict as shown in Fig. 11. The maximum firing frequency $f_{O_j}$ must satisfy the following inequality according to the above firing rule:

$$ \frac{s_j \cdot \alpha_j}{\sum_{k=1}^{n} s_k \cdot \alpha_k} \cdot \sum_{i=1}^{m} f_{I_i} \cdot \beta_i \le f_{O_j} \cdot \alpha_j \qquad (5) $$

where $\alpha_j$ and $\beta_i$ are the weights of $e(p_i, t_{O_j})$ and $e(t_{I_i}, p_i)$, respectively; $s_j$ is the firing probability of $t_{O_j}$; $f_{I_i}$ is the firing frequency of $t_{I_i}$ and $f_{O_j}$ is the maximum firing frequency of $t_{O_j}$, as shown in Fig. 11. The left-hand side of expression (5) derives the token amount reserved to the transition $t_{O_j}$ from the total token amount $\sum_{i=1}^{m} f_{I_i} \cdot \beta_i$ coming from each input transition connected to the place $p_i$, while the right-hand side represents the possible token amount from $p_i$ to its output transition $t_{O_j}$ per time unit. The factor $\frac{s_j \cdot \alpha_j}{\sum_{k=1}^{n} s_k \cdot \alpha_k}$ represents the ratio of the token amount deposited to $t_{O_j}$ to the total token amount from $p_i$ to each transition per time unit.
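The token reservation of expression (4) and the bound of inequality (5) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, ties are resolved in favor of the lowest index, and before any token has been reserved the calculated firing probabilities are taken to be zero. None of these choices is specified in the text.

```python
from fractions import Fraction


def choose_transition(n, alpha, s):
    """Index of the conflicting transition that reserves the next token.

    n[i]     -- tokens reserved so far by t_i
    alpha[i] -- weight of the arc e(p, t_i)
    s[i]     -- given firing probability of t_i
    Implements the minimum in expression (4); ties go to the lowest index
    (an assumption of this sketch).
    """
    firings = [Fraction(ni, ai) for ni, ai in zip(n, alpha)]
    total = sum(firings)

    def deviation(i):
        share = firings[i] / total if total else Fraction(0)
        return share - s[i]

    return min(range(len(n)), key=deviation)


def reserve_tokens(num_tokens, alpha, s):
    """Distribute num_tokens one by one and return the reserved counts n."""
    n = [0] * len(alpha)
    for _ in range(num_tokens):
        n[choose_transition(n, alpha, s)] += 1
    return n


def conflict_bound(j, s, alpha, inputs):
    """Smallest maximum firing frequency f_Oj allowed by inequality (5).

    inputs -- list of (f_Ii, beta_i) pairs for the input transitions of the place
    """
    inflow = sum(Fraction(f) * b for f, b in inputs)
    share = s[j] * alpha[j] / sum(sk * ak for sk, ak in zip(s, alpha))
    return share * inflow / alpha[j]


s = [Fraction(3, 4), Fraction(1, 4)]
print(reserve_tokens(100, alpha=[1, 1], s=s))               # [75, 25]
print(conflict_bound(0, s, alpha=[1, 2], inputs=[(2, 1), (1, 2)]))  # 12/5
```

With unit arc weights the reserved counts follow the given probabilities exactly in this example, which matches the intent that firings follow the given stochastic values over a sufficiently long run.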
Fig. 12. Schematization of the structure of synchronous transition $t_o^{sync}$.

A synchronous transition $t_i^{sync}$ connected from two or more input places becomes firable if all the input places have a sufficient amount of tokens satisfying the firing condition. Due to the feature of the retention-free Petri net, it is necessary to synchronize the timing at which each input place of $t_i^{sync}$ becomes firable. This leads us to consider the condition that determines the delay time for the synchronous transitions in order to avoid token retention at the multiple input places merged to the same synchronous transition. The condition is that, if there is a synchronous transition $t_o^{sync}$ as shown in Fig. 12, the maximum firing frequency satisfies the following equation:

$$
\begin{aligned}
&\left( f_{1.1} \cdot \frac{\beta_{1.1} \cdots \beta_{1.k_1}}{\alpha_{1.l_1+1} \cdots \alpha_{1.k_1}} + f_{1.2} \cdot \frac{\beta_{1.2} \cdots \beta_{1.k_2}}{\alpha_{1.l_1+2} \cdots \alpha_{1.k_2}} + \cdots + f_{1.l_1} \cdot \frac{\beta_{1.l_1} \cdots \beta_{1.k_{l_1}}}{\alpha_{1.2l_1} \cdots \alpha_{1.k_{l_1}}} \right) \cdot \frac{1}{\alpha_{o.1}} \\
&\qquad \vdots \\
&= \left( f_{m.1} \cdot \frac{\beta_{m.1} \cdots \beta_{m.k_1}}{\alpha_{m.l_n+1} \cdots \alpha_{m.k_1}} + f_{m.2} \cdot \frac{\beta_{m.2} \cdots \beta_{m.k_2}}{\alpha_{m.l_n+2} \cdots \alpha_{m.k_2}} + \cdots + f_{m.l_n} \cdot \frac{\beta_{m.l_n} \cdots \beta_{m.k_{l_n}}}{\alpha_{m.2l_n} \cdots \alpha_{m.k_{l_n}}} \right) \cdot \frac{1}{\alpha_{o.m}}
\end{aligned}
\qquad (6)
$$
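The synchronization condition of Eq. (6) amounts to requiring that every input place of the synchronous transition can supply the same number of firings per time unit. A minimal check, assuming the per-place token inflows have already been computed (for instance with the left-hand sides of inequalities (3) or (5)), might look like this; the function name and values are hypothetical.

```python
from fractions import Fraction


def sync_frequency(inflows, alphas_o):
    """Common firing frequency permitted by Eq. (6), or None if it is violated.

    inflows[k]  -- tokens per time unit flowing into the k-th input place
    alphas_o[k] -- weight of the arc from the k-th input place to the
                   synchronous transition
    """
    if not inflows or len(inflows) != len(alphas_o):
        raise ValueError("need one inflow and one arc weight per input place")
    rates = [Fraction(k) / Fraction(a) for k, a in zip(inflows, alphas_o)]
    # Eq. (6): all inflow / alpha_o ratios must coincide.
    return rates[0] if all(r == rates[0] for r in rates) else None


print(sync_frequency([4, 6], [2, 3]))  # 2: both places allow 2 firings per time unit
print(sync_frequency([4, 6], [2, 2]))  # None: tokens would be retained at the second place
```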
Equation (6) is an expression formulated by the firing frequency of each input place of the synchronous transition obtained by using expressions (2), (3) and (5).

Algorithm of automatically determining delay time of each transition

We develop an algorithm to automatically determine the delay times of transitions according to expressions (2), (3) and (5). First we explain the notions used in our algorithm.

LT is a list, $t_1^{sync} \cdot t_2^{sync} \cdot \ldots \cdot t_1^{sink} \cdot t_2^{sink} \ldots$, containing synchronous and sink transitions in such an order that if $i < j$ then there is no directed path from $t_j^{sync}$ to $t_i^{sync}$. $T_{sync}$ and $T_{sour}$ are the sets of synchronous and source transitions, respectively. EQ1($t^{sync}$) is a function that outputs the conditional expression of a synchronous transition $t^{sync}$ based on Eq. (6). EQ2(p) produces expression (5) for a conflict place p, and EQ3(p) produces expressions (2) and (3) for a conflict-free place p. push(stack, p) is a stack operation pushing a place p onto the stack stack. pop(stack) is a stack operation pulling a place out from stack. $T_e$ is the set of transitions whose delay times have been derived from EQ2(p) and EQ3(p).
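The ordering required of the list LT is a topological order of the synchronous transitions in the acyclic net, followed by the sink transitions. One possible way to obtain such a list, shown here only as a sketch and not as the construction used by the authors, relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The function name `build_lt` and the small example net are hypothetical.

```python
from graphlib import TopologicalSorter


def build_lt(arcs, sync, sink):
    """Order synchronous transitions topologically, then append sink transitions.

    arcs -- iterable of (source_node, target_node) pairs covering both
            place-to-transition and transition-to-place arcs
    sync -- set of synchronous transition names
    sink -- set of sink transition names
    Raises graphlib.CycleError if the net is not acyclic.
    """
    sorter = TopologicalSorter()
    for src, dst in arcs:
        sorter.add(dst, src)  # dst can only be visited after src
    order = list(sorter.static_order())
    return ([n for n in order if n in sync]
            + [n for n in order if n in sink])


# Hypothetical net: ta synchronizes p1 and p2; tb synchronizes p3 and p4.
arcs = [
    ("t1", "p1"), ("t2", "p2"), ("p1", "ta"), ("p2", "ta"),
    ("ta", "p3"), ("t3", "p4"), ("p3", "tb"), ("p4", "tb"),
    ("tb", "p5"), ("p5", "tz"),
]
print(build_lt(arcs, sync={"ta", "tb"}, sink={"tz"}))  # ['ta', 'tb', 'tz']
```

Because `ta` lies on a directed path to `tb`, it must precede `tb` in the list, which is exactly the property stated for LT.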
In the above algorithm, the main routine Eq-Produce(PN) is designed to find all the synchronous transitions for delay time determination. In Step 1°, synchronous transitions are sequentially saved to the list LT from the source transitions to the sink transitions. The sets of synchronous and sink transitions are initialized in 2° and 3°, respectively. In 5°, the sub-routine dfs-push(t, stack, PN) is executed, in which the first argument t is a transition pulled from the head of list LT in 4°. In 6°, the function EQ1($t^{sync}$) outputs the conditional expression regarding $t^{sync}$ by using Eq. (6). The procedures in 4°–8° are executed until LT is empty. That is, once all the synchronous transitions stored in LT have been searched, the algorithm stops. The sub-routine dfs-push(t, stack, PN) is a function recursively carrying out a Depth First Search from the selected transition t (given as the first argument in the main function) towards the source transitions. In 2°, a judgment procedure, whether the input place p of t is a conflict place or not, is
Table 1 Conditional expressions of firing delay time for each transition of model (Fig. 7) determined by using proposed algorithm <> Tsync , Inequality Tsink (2), (3), (5) t3 f1 f3 f2 f3 t5 f1 f5 f4 f5 f6 f8 f7 f8 t9 f1 f9 f6 f9 t11 f1 f11 f10 f11 t13 f1 f13 f12 f13 t17 f1 f14 f1 f15 f1 f17 f16 f17 t20 f1 f20 t24 f21 f24 f22 f24 f23 f24 t25 f1 f18 f1 f19 f1 f25 f21 f25 t28 f1 f28 t30 f1 f26 f1 f27 f1 f30 f29 f30 t34 f1 f31 f1 f32 f1 f33 f1 f34 t35 f1 f35 t36 f1 f36 t40 f37 f38 f38 f39 f38 f40 t47 f41 f46 f41 f47 t50 s42 ∗ f41 s43 ∗ f41 s42 ∗ f41 s43 ∗ f41 f41 f48 f41 f50 f49 f50 t52 f41 f51 f41 f52
Equation (6) f1 = f2
Tsync , Tsink t62
f1 = f4 f6 = f7 f1 = f6
t63 t72
f1 = f10 f1 = f12 f1 = f16
t75 t82
f21 = f22 = f23 f1 = f21 t86 f1 = f29 t88 t91 t96
f42 f43 f44 f45
f41 = f49
t97 t98 t99 t100 t103 t107 t108 t109 t116
Inequality (2), (3), (5) f58 f59 f58 f60 f58 f61 f58 f62 f58 f63 s65 ∗ f64 f65 s66 ∗ f64 f66 s67 ∗ f64 f67 s72 ∗ f64 f72 s70 ∗ f69 f70 s71 ∗ f69 f71 f69 f72 f69 f73 f69 f75 f74 f75 s67 * f64 f68 f74 f76 s67 * f64 + f74 f82 s78 * f77 f78 s79 ∗ f77 f79 f77 f80 f77 f81 f77 f82 f77 f83 f77 f84 f77 f86 f85 f86 f85 f87 f85 f88 f89 f90 f89 f91 f92 f93 f92 f94 f92 f95 f92 f96 f92 f97 f92 f98 f92 f99 f92 f100 f101 f102 f101 f103 f104 f105 f104 f106 f104 f107 f104 f108 f104 f109 f110 f111 f110 f112 f110 *2 f113
Equation (6)
s72 ∗ f64 = f69
f69 = f74 s67 * f64 + f74 = f77
f77 = f85
f112 *2 = f116 = f117
Table 1, continued Tsync , Tsink t54 t57
Inequality (2), (3), (5) f53 f54 f55 f56 f55 f57
Equation (6)
Tsync , Tsink
t118
Inequality (2), (3), (5) f110 *2 f116 f114 f116 f115 f116 f114 f117 f114 f118
Equation (6)
Tsync = {t3 , t5 , t8 , t9 , t11 , t13 , t17 , t24 , t25 , t30 , t50 , t72 , t75 , t82 , t86 , t116 }, Tsink = { t20 , t28 , t34 , t35 , t36 , t40 , t47 , t52 , t54 , t57 , t62 , t63 , t88 , t91 , t96 , t97 , t98 , t99 , t100 , t103 , t107 , t108 , t109 , t118 }, Tsour = {t1 , t2 , t4 , t6 , t7 , t10 , t12 , t16 , t21 , t22 , t23 , t29 , t37 , t41 , t49 , t53 , t55 , t58 , t64 , t69 , t74 , t77 , t85 , t89 , t92 , t101 , t104 , t110 , t114 , t115 }.
executed. When p is in a conflict state, it will be labeled and pushed to stack. Steps 4°–7° are the procedures for the input transitions t of p. If the condition $t \in T_{sour} \cup T_{sync} \cup T_e$ holds, dfs-push(t, stack, PN) will stop and the other sub-routine dfs-pop(stack, PN) will be invoked; otherwise, dfs-push(t, stack, PN) will be invoked recursively. The sub-routine dfs-pop(stack, PN) is a function used to produce conditional expressions with the use of the places saved in stack based on expressions (2), (3) and (5). In 3°, if the input transitions of p are marked, p will be pulled out from stack. In the case that p is a conflict place, EQ2(p) is used to calculate the conditional expression (in 4°); in the case that p is a normal place, EQ3(p) is used (in 5°).

Application of proposed algorithm to IL-1 signaling pathway

As a case study, we applied our algorithm to an IL-1 signaling pathway model. Here, we show the results of applying our algorithm to automatically produce the delay times of all the transitions of the IL-1 signaling pathway model shown in Fig. 7. In the IL-1 signaling pathway, there exist several self-loops. We thus break down the IL-1 signaling pathway at the places of self-loops, and derive all the delay times of the transitions by applying the proposed algorithm. The calculated delay times of the transitions are shown in Table 1, given in the form of conditional expressions of firing frequency. The simulation can be executed without token retention by using delay times that obey the conditional expressions listed in Table 1. If the simulation is conducted over a sufficiently long time period, it can be observed that the firings of the transitions follow the given stochastic values.

CONCLUSIONS

We have proposed a method of determining the delay times of transitions satisfying derived conditional expressions for a class of Petri nets that is acyclic and has no inhibitory arcs. To resolve nondeterministic firings, we have introduced a stochastic approach to determine the firings of transitions in conflict. In this contribution, we have first presented basic definitions of Petri nets and introduced a Petri net based modeling method for signaling pathways that pays attention to the molecular interactions and mechanisms. Then we have presented conditional expressions for two cases, conflict-free transitions and conflict transitions, which express the conditions for smooth token flows in a timed Petri net, i.e., smooth signal transductions in a signaling pathway. Based on these expressions, we have proposed an algorithm to determine the delay times of transitions in a timed Petri net model of a signaling pathway.
The availability of the algorithm has been confirmed by the results of its application to the IL-1 signaling pathway model. Using our method, the range of delay times of transitions can be decided by introducing the concept of probability to the conflict transitions. Our method makes it possible to automatically determine the delay times of all the transitions representing biological reaction rates in signaling pathways according to the delay times of reliably determined transitions. Meanwhile, when the delay times of several transitions have been given in advance based on confirmed biological facts, the delay times of the other transitions can also be determined mechanically. The Petri net dealt with in this paper is constrained to an acyclic one without inhibitor arcs. However, it is obvious that there exist cyclic structures in signaling pathways, such as feedback loops and self-loops. Therefore, as future work, we will aim to develop our current method further to deal with Petri net models including various kinds of net structures.

ACKNOWLEDGEMENTS

This work was partially supported by Grant-in-Aid for Scientific Research on Priority Areas "Systems Genomics" (17017008) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

REFERENCES

[1] Peterson, J. L. (1981). Petri net theory and the modeling of systems. Prentice Hall, New Jersey.
[2] Pinney, J. W., Westhead, D. R. and McConkey, G. A. (2003). Petri net representations in systems biology. Biochem. Soc. Trans. 31, 1513-1515.
[3] Narahari, Y., Suryanarayanan, K. and Reddy, N. V. S. (1989). Discrete event simulation of distributed systems using stochastic Petri nets. Energy, Electronics, Computers, Communications, 622-625.
[4] Peccoud, J. (1998). Stochastic Petri nets for genetic networks. Med. Sci. (Paris) 14, 991-993.
[5] Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biol. 3, 0032.
[6] Matsuno, H., Fujita, S., Doi, A., Nagasaki, M. and Miyano, S. (2003). Towards biopathway modeling and simulation. In: Applications and Theory of Petri Nets 2003, Goos, G., Hartmanis, J. and van Leeuwen, J. (eds.), Lecture Notes in Computer Science 2679, 3-22.
[7] Genrich, H., Küffner, R. and Voss, K. (2001). Executable Petri net models for the analysis of metabolic pathways. International Journal on Software Tools for Technology Transfer 3, 394-404.
[8] Li, C., Suzuki, S., Ge, Q. W., Nakata, M., Matsuno, H. and Miyano, S. (2006). Structural modeling and analysis of signaling pathways based on Petri nets. J. Bioinform. Comput. Biol. 4, 1119-1140.
[9] Li, C., Ge, Q. W., Nakata, M., Matsuno, H. and Miyano, S. (2007). Modelling and simulation of signal transductions in an apoptosis pathway by using timed Petri nets. J. Biosci. 32, 113-127.
[10] Matsuno, H., Li, C. and Miyano, S. (2006). Petri net based descriptions for systematic understanding of biological pathways. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences E89-A, 3166-3174.
[11] van der Aalst, W. M. P., van Hee, K. M. and Reijers, H. A. (2000). Analysis of discrete-time stochastic Petri nets. Statistica Neerlandica 54, 237-255.
[12] David, R. and Alla, H. (2005). Discrete, continuous, and hybrid Petri nets. Springer-Verlag, Berlin Heidelberg.
[13] Berenbaum, F. (2000). Proinflammatory cytokines, prostaglandins, and the chondrocyte: mechanisms of intracellular activation. Joint Bone Spine 67, 561-564.
[14] http://genome.ib.sci.yamaguchi-u.ac.jp/~pnp/frame petri net pathway il-1.html
[15] http://genome.ib.sci.yamaguchi-u.ac.jp/~pnp/bp symbols.html
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-222
Impact of Delays and Noise on Dopamine Signal Transduction
Jialiang Wu^a, Zhen Qi^b and Eberhard O. Voit^{b,∗}
^a School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia, USA
^b Department of Biomedical Engineering, Georgia Institute of Technology and Emory University Medical School, Atlanta, Georgia, USA
ABSTRACT: Dopamine is a critical neurotransmitter for the normal functioning of the central nervous system. Abnormal dopamine signal transmission in the brain has been implicated in diseases such as Parkinson's disease (PD) and schizophrenia, as well as in various types of drug addiction. It is therefore important to understand the dopamine signaling dynamics in the presynaptic neuron of the striatum and the synaptic cleft, where dopamine synthesis, degradation, compartmentalization, release, reuptake, and numerous regulatory processes occur. The biochemical and biological processes governing these dynamics consist of interacting discrete and continuous components, operate at different time scales, and must function effectively in spite of intrinsic stochasticity and external perturbations. Not fitting into the realm of purely deterministic phenomena, the hybrid nature of the system requires special means of mathematical modeling, simulation and analysis. We show here how hybrid functional Petri nets (HFPNs) and the software Cell Illustrator facilitate computational analyses of systems that simultaneously contain deterministic, stochastic, and delay components. We evaluate the robustness of dopamine signaling in the presence of delays and noise and discuss implications for normal and abnormal states of the system.

KEYWORDS: Amphetamine, Biochemical Systems Theory (BST), delay, dopamine signaling, HFPN, hybrid modeling, Parkinson's disease, Petri nets, schizophrenia, stochasticity
∗ Corresponding author. Fax: +1 404 894 4243; E-mail: [email protected].

INTRODUCTION

Dopamine is a neurotransmitter of enormous physiological, pathological, and pharmacological importance. It is a crucial contributor to several diseases, such as Parkinson's disease (PD) and schizophrenia. Dopamine is furthermore associated with addiction to a variety of drugs, because it has a direct effect on the body's reward system.

PD is the most common neurodegenerative movement disorder, affecting more than 1% of the world population aged 65 or older [1,2]. Because loss of dopaminergic neurons is responsible for the majority of the motor symptoms of Parkinson's disease, treatment options have mostly targeted the restoration of dopamine function by replacement of dopamine precursors, administration of dopamine agonists, or inhibition of its degradative enzymes.

Schizophrenia is a mental disorder with a worldwide lifetime prevalence of about 0.7%. As with PD, the large number of schizophrenia cases translates into enormous economic and societal losses [3]. While the root causes of schizophrenia are still obscure, the so-called dopamine hypothesis suggests
J. Wu et al. / Impact of Delays and Noise on Dopamine Signal Transduction
that dopamine imbalance is the underlying mechanism for symptoms of the disease. Accordingly, the main medication has been the administration of antipsychotics that reduce dopaminergic activity through blockade of the dopamine D2 receptor. Finally, the dopamine signaling system is compromised in many types of drug addiction, for instance to cocaine or methamphetamine, either through competition between the drug and dopamine in the presynaptic neuron or by competition for receptors exposed to the synaptic cleft. The abnormal activity of dopamine signaling in PD, schizophrenia, and drug addiction demonstrates the important role of dopamine dynamics in the presynaptic neuron of the striatum and the synaptic cleft, where dopamine synthesis, degradation, compartmentalization, release, reuptake, and numerous regulatory processes occur (Fig. 1). On the presynaptic side, the biosynthetic dopamine pathway begins with the precursor tyrosine, which is converted into L-DOPA and subsequently into the key neurotransmitter dopamine. Dopamine is packed into intracellular vesicles by the vesicular monoamine transporter. The packed dopamine is released into the synaptic cleft, where it can bind to dopamine receptors on the postsynaptic membrane. Some of the dopamine in the cleft also diffuses out of the cleft, or is transported back into the cytosol of the presynaptic neuron by the dopamine transporter. In addition to this cycle, dopamine can be degraded by the enzymes catechol O-methyltransferase and monoamine oxidase. Under normal, unstimulated conditions, relatively small amounts of dopamine cycle between the presynaptic cytosol, vesicle, and synaptic cleft. However, if there is a stimulus, an action potential is produced at the dopaminergic terminal and induces calcium influx into the cytosol. The calcium influx spikes dopamine release into the cleft, where the neurotransmitter binds to specific receptors on the postsynaptic membrane. 
Inside the postsynapse, various downstream signaling cascades are triggered, and the signal is successfully transduced. The functionality of dopaminergic neurons is altered when these are exposed to certain drugs like amphetamine and methamphetamine. According to current observations, amphetamine and methamphetamine cause dopamine to leak from the vesicles into the cytosol, resulting in significant increases in the cytosolic dopamine level. This excess consequently produces an efflux of dopamine into the synaptic cleft through dopamine transporters instead of the reverse flux that is typical for intact neurons [4– 7]. At the same time, amphetamine and methamphetamine regulate the generation and degradation of dopamine [8,9]. Through these mechanisms, the drugs substantially alter the neurotransmission characteristics of the dopaminergic synapse. The dopamine recycling process involves neurotransmitter packaging, release, binding, disassociation, and reuptake. Some of the biological steps associated with these processes take an appreciable amount of time to perform, thereby in effect causing delays. Since these delays must be expected to affect the dynamics of dopamine transmission, and since such effects might be different between intact and diseased systems, it is important to study the consequences of delays on dopamine transmission systematically. Moreover, like most every biological process, dopamine signaling is subject to germane stochasticity and external perturbations. For example, the rate of an enzymatic or transport process is typically considered constant, leading to a deterministic kinetic description. However, in reality the dynamics of the reaction is a random process, which is strongly exacerbated if only small numbers of reactants are involved (e.g., [10]). 
Accounting for stochasticity in a dynamic system requires specialized software, and if the system also contains delays, it is difficult to find a suitable modeling framework, along with supporting software. We show here how Hybrid Functional Petri Nets (HFPNs) and the software Cell Illustrator facilitate the computational analysis of systems that contain deterministic, stochastic, and delay components and demonstrate their utility with an analysis of dopamine signaling. The model describing these aspects has been submitted to a publicly accessible model database (see [11]).
J. Wu et al. / Impact of Delays and Noise on Dopamine Signal Transduction
Fig. 1. The chemical synapse of a dopaminergic neuron and its role in signal transduction. The top part of the diagram schematically shows key features of the dopamine pathway in the presynapse and, below, in the synaptic cleft of the dopaminergic neuron. Triangles represent the neurotransmitter dopamine, while circles indicate calcium (Ca2+) ions. An external stimulus triggers an action potential at the dopaminergic terminal and induces calcium influx into the cytosol. The calcium influx spikes dopamine release into the cleft, where the neurotransmitter binds to its receptor on the postsynaptic membrane. Inside the postsynapse, various downstream signaling cascades are triggered, and the signal is successfully transduced.
METHODS
The dopamine signaling system as described above consists of interacting discrete and continuous components and processes that operate at different time scales. The hybrid nature of this system requires special means of analysis, because it does not fit neatly into the realm of deterministic models, where sets of ordinary differential equations (ODEs) are used to represent the continuous changes of the participating system components over time. Furthermore, the system is too complicated to permit comprehensive stochastic models and simulations that are based on the Chemical Master Equation, even if these are implemented as stochastic kinetic systems in advanced versions of the Gillespie algorithm [12]. Instead, it is necessary to develop a hybrid modeling methodology that allows us to merge seamlessly the deterministic and stochastic aspects of the system into a unifying framework for integrative systems analyses. A good candidate for this task is a Hybrid Functional Petri Net (HFPN, [13]). In previous work, we showed that HFPNs can be combined effectively with methods of Biochemical Systems Theory (BST, [14]) for the analysis of largely continuous systems, which, however, are affected significantly by delays and stochastic noise [15,16]. We show here how this hybrid methodology can be used to explore the performance of the dopamine signaling system under typical perturbations as well as under disturbances
from noise and delays. The starting point for the analysis is an entirely deterministic model [17] that captured the dopamine dynamics quite well, but did not account for stochastic perturbations.
Biochemical Systems Theory
Biochemical Systems Theory (BST) is a firmly established mathematical modeling framework for the analysis of biological systems. BST is based on ordinary differential equations in which all dynamical processes are represented with products of power-law functions. Each of these functions consists of a non-negative rate constant, as a multiplicative coefficient, and of all contributing substrates, enzymes, and modifiers as variables. Each variable is raised to a real-valued kinetic order that quantifies the effect of the variable on a given reaction. A positive kinetic order signifies activation, while a negative value signifies inhibition and a value of zero corresponds to no contribution at all. BST permits several variants, among which the format of a Generalized Mass Action (GMA) system is most intuitive. In this format, each process is separately represented by a power-law term, while the alternative S-system format first groups all influxes and all effluxes into one term each [18]. The GMA format directly reflects the stoichiometric connectivity of the system and also indicates in its kinetic orders the strengths of interactions among the system variables. Its generic format is thus
$$\dot{X}_i = \sum_{p=1}^{P_i} \left( a_{ip} \prod_{j=1}^{n} X_j^{f_{ijp}} \right) - \sum_{q=1}^{Q_i} \left( b_{iq} \prod_{j=1}^{n} X_j^{g_{ijq}} \right), \quad i = 1, \ldots, n \qquad (1)$$
where variable $X_i$ is affected by $P_i$ production and $Q_i$ degradation processes; $a_{ip}$ and $b_{iq}$ are rate constants, while $f_{ijp}$ and $g_{ijq}$ are kinetic orders. While models within BST consist entirely of ODEs, Mocek et al. showed that processes with a constant delay can be approximated with arbitrary accuracy within the BST format [19]. This approximation is accomplished through the introduction of auxiliary variables and equations, which, however, do not require additional biological parameters. Wu and Voit further extended Mocek’s method to allow for multiple delays of different types, including discrete, distributed, time-dependent, and random delays [15].
Implementation of a GMA model as a Hybrid Functional Petri Net
A Petri net is a mathematical modeling tool for the representation of systems with concurrent processes. One appealing feature is the graphical depiction of all system components and processes, which facilitates intuitive, targeted manipulations and simulations. Originally designed for discrete systems, Petri nets have recently been extended to account for hybrid systems combining both discrete and continuous events. These Hybrid Functional Petri Nets (HFPN) [13] can be simulated conveniently with the software package Cell Illustrator [20]. As indicated in Fig. 2, it is straightforward to implement a GMA model in the HFPN framework: each time-dependent variable $X_i$ is represented in the HFPN as a continuous place with the name of the molecular species, whereas every time-independent variable is coded either as a discrete or continuous place, depending on its value type. Every production term $a_{ip} \prod_{j=1}^{n} X_j^{f_{ijp}}$
Fig. 2. Generic GMA system and its HFPN representation. Each continuous variable Xi is represented by a double-lined circle, while the continuous transitions are given as open rectangles.
associated with variable $X_i$ is regarded as the speed of an input transition for $X_i$, and every degradation term $b_{iq} \prod_{j=1}^{n} X_j^{g_{ijq}}$
is regarded as the speed of an output transition. According to Petri Net philosophy, direct connectivity exclusively reflects mass flow between places. A continuous place is graphically represented by a double-lined circle, and a continuous transition by an open rectangle. By contrast, a discrete place is depicted by a single-lined circle and a discrete transition by a solid rectangle.
Representation of delays and noise in the model of dopamine dynamics
In previous work [15,16] we developed a hybrid approach combining BST and HFPN that facilitates simulations of biological systems containing different effects, including feedback regulations, switches, randomness, and various delays. This modeling strategy can be applied directly to the dopamine signaling system of interest here. The main delay in the dopamine system is due to the fact that the transport of vesicular dopamine to the synaptic cleft is slower than the biochemical reactions that govern dopamine biosynthesis and degradation [17]. Specifically, an appropriate signal received by the presynapse causes vesicle movement and an upsurge in dopamine release from the vesicle (modeled as the DA-v pool) into the extracellular, synaptic pool (modeled as DA-e), where it serves as a signaling molecule that binds to specific receptors on the postsynaptic membrane (see Fig. 1). Figure 3 shows the HFPN implementation of the delay due to dopamine translocation, as well as the representation of stochastic variations in the dopamine flux. Because the exact extent of the delay and the magnitude of stochastic noise are not known, we will explore several scenarios that appear to be most relevant. The HFPN implementation of the dopamine signaling model with noise and delays has been submitted to a publicly accessible database of models (see [11]).
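To make the reading of Eq. (1) as a set of transition speeds concrete, the following Python sketch evaluates the right-hand side of a GMA system term by term. It is only an illustration under stated assumptions: the two-variable chain, its rate constants, and its kinetic orders are hypothetical and are not taken from the dopamine model, and the function names are ours rather than part of Cell Illustrator.

```python
from math import prod

def gma_term(rate, orders, x):
    """One power-law term: rate * prod_j x[j] ** orders[j].

    In the HFPN reading, this value is the speed of one continuous transition.
    """
    return rate * prod(xj ** fj for xj, fj in zip(x, orders))

def gma_rhs(x, production, degradation):
    """Right-hand side of Eq. (1) for all variables.

    production[i] and degradation[i] are lists of (rate, orders) pairs:
    the input and output transitions attached to the place of variable i.
    """
    dx = []
    for prod_terms, degr_terms in zip(production, degradation):
        inflow = sum(gma_term(a, f, x) for a, f in prod_terms)
        outflow = sum(gma_term(b, g, x) for b, g in degr_terms)
        dx.append(inflow - outflow)
    return dx

# Hypothetical chain  -> X1 -> X2 ->  in which X2 inhibits the input to X1.
x = [4.0, 1.0]
production = [
    [(2.0, [0.0, -0.5])],  # input to X1; negative kinetic order = inhibition by X2
    [(1.0, [0.5, 0.0])],   # conversion of X1 into X2
]
degradation = [
    [(1.0, [0.5, 0.0])],   # the same conversion flux leaves X1
    [(3.0, [0.0, 1.0])],   # removal of X2
]
dx = gma_rhs(x, production, degradation)
```

Because every term is evaluated separately, the same list of (rate, orders) pairs could be used to label the input and output transitions of each continuous place, which is the correspondence summarized in Fig. 2.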
Fig. 3. Implementation of noisy and delayed processes in a hybrid GMA-HFPN model. Top panel: The diagram indicates a mechanism with which we simulate random perturbations. The procedure generates a sequence of random numbers, representing noise N of frequency f and amplitude Amp. SMass(1, Amp) is a Cell Illustrator function producing Gaussian distributed random numbers with mean 1 and standard deviation Amp. Center panel: The diagram indicates how a non-delayed flux with rate constant r is affected by noise N. Bottom panel: On the left, noise is applied to dopamine release. On the right, delay is assumed to have happened during dopamine translocation, and N_delay and Xl_delay are the delayed values of N and Xl, respectively. Due to space limitations, the specific implementation of delays is not shown here; however, it follows directly the principles discussed in [15,16]. It is noted that the creation of noise can also be accomplished with generic elements in the more flexible HFPNe variant of hybrid Petri nets.
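The noise mechanism described in the caption of Fig. 3 can be mimicked outside Cell Illustrator. The Python sketch below is a stand-in under our own assumptions, not the HFPN implementation: it draws one Gaussian multiplier with mean 1 and standard deviation Amp per noise interval 1/f, holds it constant over that interval, and multiplies a flux of the assumed form r * X by it. The function names are hypothetical, and Python's `random.gauss` merely plays the role that the caption assigns to SMass(1, Amp).

```python
import math
import random

def noise_sequence(freq_hz, amp, duration_s, seed=0):
    """One Gaussian multiplier (mean 1, SD amp) per noise interval 1/freq_hz.

    No clipping is applied, so large amplitudes can yield negative values;
    how the original model treats such draws is not stated in the text.
    """
    rng = random.Random(seed)
    n_draws = math.ceil(duration_s * freq_hz)
    return [rng.gauss(1.0, amp) for _ in range(n_draws)]

def noise_at(t, seq, freq_hz):
    """Piecewise-constant lookup of the noise value active at time t (s)."""
    k = min(int(t * freq_hz), len(seq) - 1)
    return seq[k]

def noisy_flux(rate, x, t, seq, freq_hz):
    """Assumed flux form r * X, multiplied by the current noise value N."""
    return rate * x * noise_at(t, seq, freq_hz)

# Example: 40 Hz noise with amplitude 0.5 over 2 s gives 80 draws.
seq = noise_sequence(freq_hz=40, amp=0.5, duration_s=2.0, seed=1)
```

Fixing the seed makes individual runs reproducible, which is convenient when statistics are collected over repeated simulations with different seeds.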
Simulation of dopamine flux in response to calcium signals
As indicated in Fig. 1, the presynaptic neuron receives signals from other neurons in the form of action potentials, which in turn lead to a rapid calcium influx that ultimately causes a release of dopamine from the presynaptic vesicle pool (DA-v) into the extracellular pool (DA-e). The mechanistic details of this Hodgkin-Huxley type activation are immaterial here, and only the overall effect needs to be modeled. Typically, the dopamine response follows a regular train of signals, which is often described as a spiking pattern. To represent this repeated triggering effect in our model, we multiply dopamine efflux (from DA-v to DA-e) with the following function:
$$\mathrm{signal}(t) = \begin{cases} \mathrm{bolus} \cdot \sin\left( \dfrac{\pi (t - t_0)}{w_1} \right) + 1, & t \in [t_0, t_0 + w_1] \\ 1, & \text{else} \end{cases} \qquad (2)$$
The function has a baseline level of 1, which represents the resting state of the signaling system. A true signal appears during the time window [t0, t0 + w1], where it rises and falls according to the positive portion of the sine function and up to a maximum height of (1 + bolus). For a train of n signals, we assume the distance between two subsequent signals to be w2. According to [21], realistic values for w1 and w2 are on the order of tens or hundreds of milliseconds (ms), and a suitable bolus value is between 20 and 40.
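For readers who wish to reproduce the stimulus, Eq. (2) and a train of n signals can be written in a few lines of Python. This is a minimal sketch: the function names are ours, and we assume that w2 denotes the gap between the end of one pulse and the start of the next, which the text leaves open.

```python
import math

def signal(t, t0, w1, bolus):
    """Eq. (2): half-sine pulse of height bolus on top of the baseline 1."""
    if t0 <= t <= t0 + w1:
        return bolus * math.sin(math.pi * (t - t0) / w1) + 1.0
    return 1.0

def signal_train(t, n, w1, w2, bolus, t_start=0.0):
    """Train of n pulses of width w1.

    Assumption: w2 is the gap between the end of one pulse and the start
    of the next, so the pulse period is w1 + w2 and pulses never overlap.
    """
    period = w1 + w2
    for k in range(n):
        t0 = t_start + k * period
        if t0 <= t <= t0 + w1:
            return signal(t, t0, w1, bolus)
    return 1.0

# Peak of a single pulse with w1 = 0.01 s and bolus = 20 is 1 + bolus = 21.
peak = signal(0.005, 0.0, 0.01, 20)
```

If w2 is instead meant as the distance between pulse onsets, only the line defining `period` changes.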
Fig. 4. Effects of different delays on dopamine signaling. Lower panel: Train of 10 signals with w1 = 0.01s, w2 = 0.01s, and bolus = 20. Upper panel: The response to the signal train is a single, extended peak of extracellular dopamine (DA-e). The peak is ragged for no or small delays, but essentially smooth for longer delays.
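The specific delay construction is not reproduced in this article, so the following Python sketch should be read only as a generic illustration of what a delayed flux means on a sampled time grid. It uses a simple shift with constant history, which is neither the auxiliary-variable approximation of [19] nor the HFPN construction of [15,16]; the flux form rate * X * N and all names are assumptions made for the example.

```python
def delayed(series, dt, tau):
    """Shift a uniformly sampled series by tau seconds (grid step dt).

    Samples before t = tau are filled with the first value, i.e. the
    history is assumed to be constant at the initial level.
    """
    shift = round(tau / dt)
    if shift == 0:
        return list(series)
    head = [series[0]] * min(shift, len(series))
    return head + list(series[:max(len(series) - shift, 0)])

def delayed_noisy_flux(rate, x_series, n_series, dt, tau):
    """Assumed flux rate * X(t - tau) * N(t - tau), sample by sample."""
    x_del = delayed(x_series, dt, tau)
    n_del = delayed(n_series, dt, tau)
    return [rate * x * n for x, n in zip(x_del, n_del)]

# A delay of 0.02 s on a 0.01 s grid shifts every sample by two steps.
shifted = delayed([1, 2, 3, 4, 5], dt=0.01, tau=0.02)
```

In a full simulation the delayed values would feed back into the differential equations at every step, so the shift would be applied to a running history buffer rather than to a finished series.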
RESULTS
Effects of different delays
Most steps in the dopamine signaling system are biochemical reactions, and therefore fast. In comparison, the movement of vesicles, their attachment to the presynaptic membrane, and the subsequent release of dopamine into the synaptic cleft are slower. As detailed in the Methods section, we model this relatively slow process with a time delay [22]. Zhang and coworkers [23] suggested that the most relevant range for this delay is between tens and a few hundred milliseconds. We investigated the effects of different delays within this range in two typical signaling scenarios, as shown in Figs 4 and 5. When the distance between subsequent signals and the signal width are equal or similar (e.g., w1 = 0.01s and w2 = 0.01s in Fig. 4), the extracellular dopamine accumulates in the form of a single extended peak, consisting of a fast rise and a relatively slow return to the baseline. If the distance between two consecutive signals is much greater than the signal width (e.g., w1 = 0.01s and w2 = 0.2s in Fig. 5), the dopamine responses consist of individual, separated peaks. As demonstrated in Figs 4 and 5, the dopamine system with realistic delays has the following features:
1. The larger the delay is, the more the peaks of response are reduced in height.
2. The larger the delay is, the less the valleys between peaks are pronounced.
These two features imply that large delays may lead to impairments in the efficiency of the signaling system and that it is possible that signals are lost in the process. Although the delayed responses may look similar to the responses in the non-delayed system, it is possible that the delayed responses no longer exceed the necessary threshold that is typical for signal transduction in all-or-nothing responses
Table 1. The number of effective responses changes with different delays
Delay          165% baseline   185% baseline   210% baseline
Non-delayed    10              9               9
0.01s delay    10              9               9
0.05s delay    3               9               8
0.10s delay    2               9               0
Fig. 5. Effects of delays when signals are sparse. Lower panel: Train of 10 signals with w1 = 0.01s, w2 = 0.2s and bolus = 20. Upper panel: DA-e responds in the form of 10 separated peaks. Different delays (corresponding to those in Fig. 4) result in some smoothing in the responses, but the peaks remain separated.
of neurons. For instance, if the threshold in Fig. 5 is 165% of the baseline value, the non-delayed system has ten effective responses (as seen in separable peaks above the threshold line), among which one lasts for 0.07s and the other nine between 0.15s and 0.2s. By contrast, when the dopamine release is delayed by 0.1s, the system only fires twice: once for 0.17s and once for 1.7s. The abnormally long second peak is the result of merging peaks and the fact that the last eight responses do not fall below the threshold. As a second scenario, suppose that the threshold is 210% of the baseline. In this case, the non-delayed system fires nine signals, lasting from 0.02s to 0.04s, whereas there is no effective response at all when the delay is 0.1s. Some representative results are summarized in Table 1. While these results show that delays may affect signaling accuracy, one should also note that the system does retain its signaling capacity if the delay is relatively short (such as 0.01s in our simulations) and if the threshold is positioned differently, for instance, at 185% of the baseline value.
Effects of stochastic noise
Signal transduction in the dopamine system depends on the attachment of vesicles to the presynaptic membrane and their subsequent release of dopamine into the synaptic cleft [24]. Of course, the intracellular environment is heterogeneous, and changes in metabolic state, temperature, and pH, as well as
Fig. 6. Statistics of effective responses in dopamine signaling systems subjected to noise of various amplitudes and frequencies. The bars in each group are sorted from left to right by noise frequency [Hz]. The statistics for each bar were calculated from 20 simulation samples.
various crowding effects must be expected to create stochasticity that cannot be ignored in the context of vesicle dynamics and dopamine release. Because this stochasticity in the cytosol cannot be characterized in mechanistic detail, it seems reasonable to let the simulations account for randomness in the dopamine flux from the vesicle pool to the synaptic cleft. This section addresses the effects of this stochasticity with an exploration of noise of different frequencies and amplitudes. Most processes in dopamine signal transduction are fast, with events such as biochemical reactions and ion flux transduction lasting from several milliseconds to tens of milliseconds. The randomness of such fast events results in high-frequency noise on the order of one hundred Hertz. At the same time, there are also much slower events such as vesicle exocytosis, which may last for one hundred or more milliseconds and whose stochasticity corresponds to noise of several Hertz in frequency. Therefore, the frequency of realistic random perturbations ranges from several to hundreds of events per second (Hz), and it seems that an appropriate amplitude may be up to +50% of the baseline. As was shown in the Methods section, we model the noise by imposing a sequence of discrete Gaussian random numbers on the rate of dopamine flux. To evaluate the impact of this noise on signaling, we assume again, for illustration purposes, a threshold of 185% of the baseline DA-e concentration (see Fig. 5) and record a response as effective if the actual DA-e level surpasses this threshold. As an example, we use the dopamine model again under a train of 10 signals characterized by w1 = 0.01s, w2 = 0.2s and bolus = 20. When the signaling system is not subjected to noise, it always yields nine effective responses. The first response does not reach the threshold and is therefore not deemed effective.
This “omission” may serve in the system as a filter that effectively ignores spurious firing. If the system is corrupted by noise, signal responses may be amplified or diminished, or they may merge
Fig. 7. Signaling dynamics of dopamine systems subjected to noise. Noise frequencies are 100 Hz, 40 Hz and 10 Hz in rows 1, 2, and 3, respectively. Left column: w1 = 0.01s; w2 = 0.01s; right column: w1 = 0.01s; w2 = 0.2s. The lowest sub-panel for each example shows a train of 10 signals; the center sub-panel visualizes the discrete Gaussian noise applied to the system, while the upper sub-panel shows the system response (DA-e level) to the signal train. Smoother lines in upper sub-panels show results for the unperturbed system, while more ragged lines show results for systems exposed to noise. Unit for x-axes: seconds (s); units for y-axes, from bottom to top: signal strength; percentage of noise to baseline; DA-e ratio to baseline.
with one another, resulting in a suboptimal number of effective responses. As indicated by the results in Figs 6 and 7, noise may have the following effects:
1. Not surprisingly, for a fixed noise frequency, noise with larger amplitude imposes more serious distortions on signal transduction than small amplitude noise. Specifically, it results in a significantly reduced number of effective responses.
2. For fixed amplitude, decreased noise frequency of the type tested here leads to reductions in effective responses.
3. In the absence of a signal, noise alone does not significantly change the system dynamics for wide ranges of frequencies and amplitudes (Fig. 7).
These results indicate that the signaling system is very robust to noise of high frequency, such as 100 Hz, while it is much more vulnerable to perturbations with frequencies lower than 10 Hz.
Combined effects of delays and noise
The discussions in the previous sections have demonstrated that both noise and delay, when separately in effect, have the tendency to distort signal transduction. These findings raise the question of whether the combination of delay and noise would make the situation even worse. The answer is very difficult to obtain by intuition alone. Thus, we systematically investigated combined effects of various delays and noise using representative signals with w1 = 0.01s, w2 = 0.2s. As before, a response was counted as effective when the DA-e level surpassed the threshold of 185% of the baseline. Figure 8 shows that the two effects may counteract each other and that, surprisingly, the signaling system is more effective in its responses when noise is accompanied by short delays. For instance, the influence of 100 Hz noise is best reduced in a system with 0.05s delay, while 40 Hz and 20 Hz noise is well counteracted by a 0.1s delay.
However, if the delay is very long, such as 0.2s, noise and delay exacerbate each other’s effects and lead to misfiring that appears to be quite unreliable.
DISCUSSION
Every signal transduction process starts with an initial stimulus that triggers one or more signaling cascades. The final component of the signaling cascade has a direct effect on gene expression or on the activation of relevant metabolic pathways. In the case of dopamine signaling, the initial stimulus is an action potential that is converted into a rapid influx of calcium into the presynaptic neuron. This influx triggers the release of dopamine into the synaptic cleft, binding to receptors on the postsynaptic membrane, and signal processing by the postsynaptic DARPP-32 protein, which ultimately leads to genomic, metabolic or neurophysiological responses. Predicting the functioning of the dopamine signaling system is difficult, because many molecular components are involved and because dopamine itself is subject to biosynthesis, degradation, diffusion out of the synaptic cleft, and other processes that change over time and are adaptive in nature. For instance, dopamine may affect the proper functioning of dopamine receptors on the postsynaptic cell membrane. These receptors are normally stable, but exhibit greatly diminished receptor activity in response to sharp or prolonged increases in dopamine concentration. In cases of amphetamine and cocaine abuse, this type of down-regulation of dopamine receptors has been associated with a shortened attention span, further drug craving, and loss of interest in social activities even if they are otherwise considered pleasant.
Fig. 8. Combined effects of different delays and noise. For short, but not for long, delays, the system is able to counteract high-frequency noise. Bars in each group are sorted from left to right by frequency [Hz].
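The counts of effective responses reported in Table 1 and in Figs 6 and 8 rest on a simple rule: a response is effective while the DA-e level exceeds a threshold, and peaks that merge above the threshold count as one long response. A minimal Python sketch of this rule is given below; the function name, the sampled traces, and the grid step are hypothetical and serve only to illustrate the counting.

```python
def effective_responses(levels, dt, threshold):
    """Durations (s) of contiguous runs in which levels exceed threshold.

    The number of effective responses is the length of the returned list;
    merged peaks that never drop below the threshold form a single run.
    """
    durations = []
    run = 0
    for value in levels:
        if value > threshold:
            run += 1
        elif run:
            durations.append(run * dt)
            run = 0
    if run:  # trace ends while still above the threshold
        durations.append(run * dt)
    return durations

# Hypothetical DA-e traces, expressed as ratios to the baseline.
separated = [1.0, 2.0, 2.0, 1.0, 1.9, 1.0, 2.5, 2.5, 2.5]
merged = [1.0, 2.0, 1.9, 2.0, 1.0]
n_separated = len(effective_responses(separated, dt=0.01, threshold=1.85))
n_merged = len(effective_responses(merged, dt=0.01, threshold=1.85))
```

With a threshold of 185% of the baseline, the first trace contains three separate responses, whereas the second contains a single, longer one, because the valley between its two peaks stays above the threshold.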
Biological and clinical experiments have shown that different behavioral stimuli can induce various patterns of dopamine release (e.g., [25]). These patterns of neurotransmission can be simulated with mathematical models that demonstrate the induction of different biochemical, cellular, and physiological effects (Qi et al., submitted). Computational models of this type have so far not accounted for stochastic noise, which must be expected to affect the dopamine signaling system on a regular basis. The consideration of noise is not trivial, because the system is highly nonlinear, contains a fair number of molecular components and in addition spans different time scales, which require the inclusion of time delays. The resulting hybrid nature of the system is notoriously difficult to implement and analyze. We have shown here that the combination of Biochemical Systems Theory (BST) with Hybrid Functional Petri Nets (HFPN) affords us a powerful method for exploring ill-defined hybrid systems. BST requires only a minimum of assumptions for the representation of a biological system and offers strong guidance for default settings and for the selection of parameter values [14]. These features are crucial for designing models of a phenomenon like dopamine signaling that occurs deep within the mammalian or human brain, where precise measurements are very difficult to obtain. While BST does not per se allow delays and noise, BST models can be implemented with relative ease as HFPNs, which subsequently permit the seamless integration of deterministic methods of systems analysis with delays, switches, and stochastic effects [15,16]. Our HFPN simulations show that noise and delays can affect the signaling function of the dopamine system in a significant manner.
For instance, in situations of low-frequency noise and large delays on the order of hundreds of milliseconds, the dopamine responses to signal trains may degrade into one abnormally long response, thus impairing the normal functioning of the dopamine signaling system. While the simulations show that noise and delays can corrupt a true signal, our results also show that the signaling system is surprisingly robust. Most processes involved in the dopamine dynamics are fast events such as biochemical reactions and ion fluxes, which occur on the order of a few or tens of milliseconds, while the important transport and release of dopamine into the cleft is somewhat slower. Much of the noise associated with small numbers of molecules contributing to the governing reactions
can be expected to be on the order of maybe tens to one hundred Hertz. As demonstrated in the Results section, the dopamine signaling system can successfully tolerate noise of such frequencies even if the noise amplitude is large (the simulations allowed for noise corresponding to 50% of the baseline). As a natural system in an ever-changing environment, the dopamine system is always exposed to noise of various frequencies. There is not much that an organism can do to avoid these perturbations. Interestingly, the results of our combined noise-delay analysis show that unavoidable delays in the system, which are due to the relatively slow physical processes of vesicle dynamics and dopamine release, are not always detrimental and may even be advantageous, but only if they are short. Specifically, we found that small delays, on the order of 10 milliseconds, effectively remove the negative effects of fast noise, whereas much longer delays exaggerate the problems caused by noise, up to a point where the signal is no longer reliably transduced. Thus, the cell must ensure that delays are not overly long. Indeed, it has been observed that vesicles filled with dopamine are primarily located close to the presynaptic membrane [26], thereby minimizing unavoidable delays, and that vesicles elsewhere in the presynapse primarily serve as back-up dopamine pools that move to the membrane when needed. Like all mathematical models, the model proposed here is rather simplistic, and it remains to be seen whether the investigated delays and noise frequencies constitute the most relevant combinations. With the advent of in vivo imaging and measuring technologies [23,27], the near future will reveal more biological details concerning noise and delays, and these will allow us to elucidate with greater accuracy the types and features of perturbations that the dopamine signaling system is facing on a daily basis.
In spite of these uncertainties, the article demonstrates that the effects of combined noise and delays are not easy to predict and may even lead to counterintuitive outcomes. Secondly, the article shows that the embedding of a canonical formalism like BST in a hybrid framework like HFPN can substantially and beneficially expand the repertoire of analytical tools for systems biology.
ACKNOWLEDGEMENTS
This work was supported by a grant from the National Institutes of Health (P01-ES016731, G.W. Miller, PI) and an endowment from the Georgia Research Alliance (E.O.V). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.
REFERENCES
[1] Olanow, C. W. and Tatton, W. G. (1999). Etiology and pathogenesis of Parkinson’s disease. Annu. Rev. Neurosci. 22, 123-144.
[2] von Campenhausen, S., Bornschein, B., Wick, R., Bötzel, K., Sampaio, C., Poewe, W., Oertel, W., Siebert, U., Berger, K. and Dodel, R. (2005). Prevalence and incidence of Parkinson’s disease in Europe. Eur. Neuropsychopharmacol. 15, 473-490.
[3] MacDonald, A. W. and Schulz, S. C. (2009). What we know: findings that every theory of schizophrenia should explain. Schizophr. Bull. 35, 493-508.
[4] McCann, U. D., Wong, D. F., Yokoi, F., Villemagne, V., Dannals, R. F. and Ricaurte, G. A. (1998). Reduced striatal dopamine transporter density in abstinent methamphetamine and methcathinone users: evidence from positron emission tomography studies with [11C]WIN-35,428. J. Neurosci. 18, 8417-22.
[5] Hanson, G. R., Sandoval, V., Riddle, E. and Fleckenstein, A. E. (2004). Psychostimulants and vesicle trafficking: a novel mechanism and therapeutic implications. Ann. N. Y. Acad. Sci. 1025, 146-150.
[6] Sulzer, D., Chen, T.-K., Lau, Y. Y., Kristensen, H., Rayport, S. and Ewing, A. (1995). Amphetamine redistributes dopamine from synaptic vesicles to the cytosol and promotes reverse transport. J. Neurosci. 15, 4102-4108.
[7] Schmitz, Y., Lee, C. J., Schmauss, C., Gonon, F. and Sulzer, D. (2001). Amphetamine distorts stimulation-dependent dopamine overflow: effects on D2 autoreceptors, transporters, and synaptic vesicle stores. J. Neurosci. 21, 5916-5924.
[8] Sulzer, D., Sonders, M. S., Poulsen, N. W. and Galli, A. (2005). Mechanisms of neurotransmitter release by amphetamines: a review. Prog. Neurobiol. 75, 406-433.
[9] Costa, E., Groppetti, A. and Naimzada, M. K. (1972). Effects of amphetamine on the turnover rate of brain catecholamines and motor activity. Br. J. Pharmacol. 44, 742-751.
[10] Goutsias, J. (2007). Classical versus stochastic kinetics modeling of biochemical reaction systems. Biophys. J. 92, 2350-2365.
[11] Wu, J., Qi, Z. and Voit, E. (2009). Dopamine signaling with noise and delays. https://cionline.hgc.jp/cifileserver/ launchCIOPlayer?model=http://www.csml.org/download/model/csml 30/dopamine signaling with noise and delays.csml. gz.
[12] Gillespie, D. T. (2007). Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 58, 35-55.
[13] Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: I. A platform for modeling and simulating biopathways. Appl. Bioinformatics 2, 181-184.
[14] Voit, E. O. (2000). Computational analysis of biochemical systems: a practical guide for biochemists and molecular biologists. Vol. xii, Cambridge University Press, Cambridge, UK.
[15] Wu, J. and Voit, E. (2009). Hybrid modeling in biochemical systems theory by means of functional Petri nets. J. Bioinform. Comput. Biol. 7, 107-134.
[16] Wu, J. and Voit, E. (2009). Integrative biological systems modeling: challenges and opportunities. Front. Comput. Sci. China 3, 92-100.
[17] Qi, Z., Miller, G. W. and Voit, E. O. (2008). Computational systems analysis of dopamine metabolism. PLoS ONE 3, e2444.
[18] Shiraishi, F. and Savageau, M. A. (1992). The tricarboxylic acid cycle in Dictyostelium discoideum. I. Formulation of alternative kinetic representations. J. Biol. Chem. 267, 22912-22918.
[19] Mocek, W. T., Rudnicki, R. and Voit, E. O. (2005). Approximation of delays in biochemical systems. Math. Biosci. 198, 190-216.
[20] Miyano, S. (2008). Cell Illustrator website. http://www.cellillustrator.com/.
[21] Sun, J.-Y., Wu, X.-S. and Wu, L.-G. (2002). Single and multiple vesicle fusion induce different rates of endocytosis at a central synapse. Nature 417, 555-559.
[22] Ryan, T. A., Smith, S. J. and Reuter, H. (1996). The timing of synaptic vesicle endocytosis. Proc. Natl. Acad. Sci. USA 93, 5567-5571.
[23] Zhang, Q., Li, Y. and Tsien, R. W. (2009). The dynamic control of kiss-and-run and vesicular reuse probed with single nanoparticles. Science 323, 1448-1453.
[24] Fernández-Alfonso, T. and Ryan, T. A. (2006). The efficiency of the synaptic vesicle cycle at central nervous system synapses. Trends Cell Biol. 16, 413-420.
[25] Grace, A. A. (1991). Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: a hypothesis for the etiology of schizophrenia. Neuroscience 41, 1-24.
[26] Südhof, T. C. (2004). The synaptic vesicle cycle. Annu. Rev. Neurosci. 27, 509-547.
[27] Gaffield, M. A., Rizzoli, S. O. and Betz, W. J. (2006). Mobility of synaptic vesicles in different pools in resting and stimulated frog motor nerve terminals. Neuron 51, 317-325.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-236
Role of mRNA Gestation and Senescence in Noise Reduction during the Cell Cycle
Attila Csikász-Nagy∗ and Ivan Mura
The Microsoft Research – University of Trento Centre for Computational and Systems Biology, Povo, Italy
ABSTRACT: Recent innovations in experimental techniques for single-molecule detection have advanced the quantification of molecular noise in several systems and provide suitable data for defining stochastic computational models of biological processes. Some of the latest stochastic models of cell cycle regulation analyzed the effect of noise on cell cycle variability. In their study, Kar et al. (Proc. Natl. Acad. Sci. USA 106, 6471–6476, 2009) found that the observed variances of the cell cycle time and cell division size distributions cannot be matched with the measured long half-lives of mRNAs. Here, we investigate through modeling and simulation how the noise created by the transcription and degradation of a key cell cycle controller mRNA affects the statistics of cell cycle time and cell size at division. Our model consists of an encoding of the model of Kar et al. into a stochastic Petri net, with the extensions necessary to represent multiple synthesis (gestation) and degradation (senescence) steps in the regulation of mRNAs. We found that a few steps of gestation and senescence of mRNA are enough to give a good match for both the measured half-lives and the variability of cell cycle statistics. This result suggests that the complex process of transcription can be more accurately approximated by multi-step linear processes.
KEYWORDS: Cell cycle, noise, stochastic Petri nets, gene expression, mRNA gestation, mRNA senescence, systems biology, yeast
∗ Corresponding author. E-mail: [email protected].
INTRODUCTION
The noise in gene expression has been investigated extensively in the last decade [1–4]. From these results we learned how temporal binding of the transcriptional machinery can induce bursts of mRNA production [5], and how intrinsic and extrinsic noise can be separated [4]; there are also ideas about the role of noise in transcriptional regulation [6–8]. Furthermore, the ways in which system-level interactions can reduce noise have been discussed [9–11], giving some hints on how various fluctuations in biological systems can be trimmed down by clever network wiring. Negative feedback loops, dimerization and feed-forward loops were all suggested to be able to attenuate noise [12,13]; moreover, Pedraza and Paulsson [10] showed that multi-step mRNA production (gestation) or removal (senescence) can significantly decrease protein level fluctuations.
Advances in experimental techniques on single molecules [14] and large-scale measurements of the transcriptome [15] and proteome [16] of budding yeast allow researchers to establish detailed computational models of this organism. The cell cycle regulation of Saccharomyces cerevisiae is one of the most deeply investigated topics in computational systems biology [17]. After the successes of deterministic models [18], several groups recently started to investigate the role of noise in cell cycle regulation through stochastic modeling [19–21]. Some of these models use the measured molecule numbers of critical cell cycle regulators [15,16,22] and reproduce the fluctuations in their levels through the Gillespie stochastic simulation algorithm [23]. The model of Kar et al. [20] is built on the deterministic model of Tyson and Novak [24] and unpacks the original system into elementary reactions. The stochastic model captures the experimental findings that the coefficient of variation (CV = standard deviation/mean) of the cell cycle period is approximately 14%, while the CV of the cell size at cell division is about half of this value [25,26]. The authors could get a good fit to these low fluctuations by assuming that the half-lives of the key cell cycle regulator mRNAs are approximately 12 seconds, which is much shorter than any earlier measured value (4–70 minutes in [27]). Commenting on this discrepancy, Kar et al. [20] noted that the originally fitted mRNA levels [15] might come from experiments that underestimated the real average number of mRNA molecules, as suggested in [28], but even with higher mRNA levels they could not fit the data with realistically long mRNA half-lives.
Most previous models of protein synthesis considered transcription, translation and degradation of mRNAs as first-order chemical reactions [6], neglecting the observed multiple steps of transcriptional and translational complex formation and of senescence by deadenylation of mRNAs [29] or polyubiquitination of proteins [30]. Kar et al. [20] followed these same lines and assumed that the production of cell cycle regulatory proteins is made up of first-order elementary reactions of transcription, translation and degradation of mRNAs and proteins.
They found that with realistic mRNA half-lives the period of the cell cycle and the cell size at division showed fluctuations that were too large compared with experimental observations. In this work, we started to investigate how this large noise can be reduced by taking into account the multi-step gestation and senescence of mRNAs, which have been shown to play a role in such variability reduction [10]. We did not incorporate mRNA bursting into the model, since it increases noise [3], and we neglected the effects of extrinsic noise that originates from the uneven partitioning of cell mass and molecular content at cell division [20,31], focusing instead on the intrinsic noise coming from the fluctuations in molecule numbers [4].
MATERIALS AND METHODS
Our in silico approach is based on a Petri net representation of the model of stochastic molecular dynamics presented in [20], which we modified and extended to represent the multi-step gestation and senescence processes of mRNAs. Our choice of modeling formalism stems from the intuitive mapping between the reaction-oriented description of biochemical systems and the modeling elements of Petri nets, a correspondence that we already exploited in a previous work [21], which also dealt with a cell cycle regulation network. We encoded the model of Kar et al. [20] as a stochastic Petri net [32]. The model in [20], which we do not report here in full for the sake of conciseness, describes the regulatory network of the cell cycle in yeast through a set of coupled reactions involving 19 distinct biochemical entities (genes, mRNAs and proteins). Figure 1 provides a diagram of a small portion of the overall model, which we use to explain the rationale of the encoding with stochastic Petri nets. The diagram shows some of the key reactions involving the cell cycle core regulator CyclinB-Cdk1 complex and one of its antagonists, the anaphase promoting complex related protein Cdh1 [18]. The model includes synthesis and degradation of the mRNAs and translation and degradation of the proteins of these two key regulators. The kinetics of all reactions shown in Fig. 1, as well as that of the reactions appearing in the model in [20] that are not reported here, follow the mass-action law.
Methods of stochastic modeling
Basic Petri nets are depicted as diagrams that include only the following four modeling elements:
– Places, which represent the variables of the model (the molecular species of the biological system).
– Tokens, which are contained within places and provide the numerical value associated with the variable (the number of molecules).
– Transitions, which represent events affecting the variables. In Petri net terminology, the occurrence of the event associated with a transition is called transition firing, which corresponds to an individual reaction step.
– Arcs, which link transitions to places and places to transitions (but neither places to places nor transitions to transitions) and define the changes that occur in the variables as a result of transition firings. Incoming arcs to a place add tokens to the place, whereas outgoing arcs remove tokens, representing how reactions change the molecule numbers of the reacting species.
The reader interested in a more complete description of the Petri net modeling formalism, and in a more detailed explanation of how a set of reactions can be modeled with it, is referred to [21]. To provide some hints on the overall modeling process, we describe the encoding of the reactions in Fig. 1A into the Petri net model shown in Fig. 1B, which has been built using the Möbius tool [33]. The model includes 5 places representing the various biochemical species and 11 transitions, each associated with a specific reaction event. The structural description of Möbius models, which is the one graphically rendered by the tool, has to be completed with the details of the kinetics associated with the firing times of transitions. Möbius supports models in which the firing times of reactions are random variables that follow a user-selectable distribution, which makes it particularly useful for representing stochastic molecular dynamics.
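The four modeling elements can be made concrete with a small data structure. The following Python sketch is an illustration only and not the Möbius implementation: the class names Transition and PetriNet, the underscored place and transition identifiers, and the three-transition fragment are assumptions chosen to mirror the Cdh1 mRNA part of Fig. 1. A transition is enabled only when every input place holds enough tokens, and firing it moves tokens along its arcs.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """An event; arcs are stored as {place name: number of tokens}."""
    name: str
    inputs: dict = field(default_factory=dict)   # arcs place -> transition
    outputs: dict = field(default_factory=dict)  # arcs transition -> place


@dataclass
class PetriNet:
    marking: dict                                # tokens currently in each place
    transitions: dict = field(default_factory=dict)

    def add(self, transition):
        self.transitions[transition.name] = transition

    def enabled(self, name):
        """A transition may fire only if no input place would go negative."""
        t = self.transitions[name]
        return all(self.marking[p] >= n for p, n in t.inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        t = self.transitions[name]
        for place, n in t.inputs.items():
            self.marking[place] -= n
        for place, n in t.outputs.items():
            self.marking[place] += n


def cdh1_fragment():
    """Hypothetical three-transition fragment (all names are illustrative)."""
    net = PetriNet(marking={"mRNA_Cdh1": 0, "Cdh1": 0})
    net.add(Transition("mRNA_Cdh1_syn", outputs={"mRNA_Cdh1": 1}))
    net.add(Transition("mRNA_Cdh1_deg", inputs={"mRNA_Cdh1": 1}))
    # Translation consumes and returns the mRNA token, so the mRNA count
    # is unchanged while one protein token is produced.
    net.add(Transition("Cdh1_syn", inputs={"mRNA_Cdh1": 1},
                       outputs={"mRNA_Cdh1": 1, "Cdh1": 1}))
    return net


if __name__ == "__main__":
    net = cdh1_fragment()
    print(net.enabled("mRNA_Cdh1_deg"))  # no mRNA token yet
    net.fire("mRNA_Cdh1_syn")
    net.fire("Cdh1_syn")
    print(net.marking)
```

Storing arc weights in dictionaries keeps the enabling test and the firing rule to a few lines each; a stochastic Petri net would additionally attach a firing-time distribution to every transition.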
To reproduce the kinetics used in the model of Kar et al. [20], we selected for each transition a firing time distributed according to a negative exponential distribution whose rate is the propensity function defined as per Gillespie’s formulation of stochastic chemical kinetics [23]. The values of the rate constants and the exact form of each propensity function were taken from [20]. The complete stochastic Petri net model includes the 19 species and 41 reactions of the model in [20], plus the places and transitions necessary to model the dynamics of cellular mass growth and division [21].
Fig. 1. Different representations of Cdh1 – Cyclin B-Cdk1 interactions. (A) Schematic diagram of CycB and Cdh1 reactions for the model of Kar et al. [20]. Bidirectional arrows indicate reversible reactions, dashed arrows stand for the mRNA level dependent rate of protein synthesis, and the five grey dots represent the products of molecular degradation. Note that we do not show here the possible production of the phosphorylated form of Cdh1 from the CycB-Cdk1:Cdh1 complex. (B) Möbius model of the reactions shown in panel A. Places are depicted as filled circles and transitions as rounded bars; tokens are not graphically shown (a common choice unless there are very few of them). Because negative numbers of tokens do not make sense in a biological context, the arcs also provide rules for the enabling of transitions. For instance, transition mRNA Cdh1 deg (degradation of one Cdh1 mRNA molecule) cannot fire if the number of tokens in place mRNA Cdh1 is 0. Notice that the number of tokens in place mRNA Cdh1 is not affected by the firing of transition Cdh1 syn (synthesis of one Cdh1 protein), as one token is removed and one is simultaneously added at each firing. The incoming and outgoing arcs of transition Cdh1 syn nevertheless model the fact that the translation process occurs at a rate that depends on the number of available mRNA molecules.
Fig. 2. Simulating fluctuations in Cdh1 mRNA and unphosphorylated protein levels. (A) Screenshot of the Möbius tool [33], showing the implementation of five-step gestation and five-step senescence of Cdh1 mRNA and single-step translation and degradation of Cdh1 proteins, with the enable syn. input gate (triangle) to stop the translation process when the total number of mRNA molecules is 0. (B) Simulation curves of Cdh1 mRNA levels in two variants of the model (KAR: solid line; GES+SEN: dotted line). (C) Simulated time courses of total unphosphorylated Cdh1 protein levels (the sum of the forms depicted in Fig. 1) for models KAR and GES+SEN.
Methods of in silico experiments
Starting from the Petri net defined as described above, we produced a set of modified models, whose evaluation allows us to explore the effects of the mRNA gestation and senescence processes on cell cycle variability. First of all, we modified the synthesis and degradation rates of Cdh1 mRNA to obtain a realistic half-life of 10 minutes together with a realistic average mRNA level of around 8 molecules per cell [22,28]. The new rates are ksmy = 0.5545 molec min⁻¹ for the mRNA synthesis and kdmy = 0.0693 min⁻¹ for the mRNA degradation. This led us to modify the Cdh1 mRNA translation rate as well. Since the average number of mRNA molecules is now four times larger than in the original model [20], we had to set the translation rate to one fourth of its original value (ksy = 0.40175 molec min⁻¹) to leave the average Cdh1 protein numbers unaltered. These three rate changes are the only differences compared to the original values used in [20]. The model was then extended in two different ways. A first modified model considers an N-step linear gestation process for the transcription of Cdh1 mRNAs (model “GES”). Another model deploys an M-step linear senescence of Cdh1 mRNAs (model “SEN”). Finally, we also checked the behavior of the system with both extensions (model “GES+SEN”). As a basal model we used the system with
N = M = 1 (model “KAR”), which is structurally equivalent to the one originally studied by Kar et al. [20].
To encode N > 0 steps of Cdh1 mRNA gestation in the Petri net model, we assign to transition Cdh1 mRNA syn in Fig. 2A (modeling the Cdh1 mRNA synthesis) a firing time that follows the Erlang(N, 1/ksmy) distribution. The Erlang distribution is a convenient modeling shorthand for representing cascades of identical exponential stages [34]. Thus, if a random event occurs after N consecutive steps, each one following the same negative exponential distribution of its occurrence time, the overall occurrence time of the event (the sum of the N identically distributed exponential times) will be distributed as an Erlang random variable. In our model there are N stages, each with an expected duration of 1/(N · ksmy). Therefore, the overall average firing time of transition Cdh1 mRNA syn (the sum of the N average times of the constituent steps) is 1/ksmy, equal to that of the KAR model. A clear difference between the two models is that the CV of an Erlang variable of N stages is N^(−1/2), and thus decreases as N grows, whereas the CV of a negative exponential random variable is equal to 1, which is always larger than or equal to that of an Erlang variable with the same average value.
When M > 0, the multiple steps of senescence are explicitly represented in the Petri net model through the exponentially distributed transitions sen1, sen2, . . ., senM, each step of senescence occurring with the same rate constant M · kdmy. The model in Fig. 2A shows M = 5 steps of mRNA senescence. Notice that, again, the average degradation time of each mRNA molecule is not changed with respect to the one used in the KAR model. The Möbius modeling tool supports Erlang-distributed transitions, but only when the reaction they model is zero-order. This explains why we can model gestation with a single zero-order Erlang-distributed reaction, but have to unpack the identical stages of mRNA senescence in model SEN. The translation of Cdh1 protein molecules is modeled by transition Cdh1 syn, whose firing rate is given by the product of the translation rate constant ksy and the total number of molecules of Cdh1 mRNA, that is, the total number of tokens in places mRNA Cdh1, mRNA Cdh1 old1, mRNA Cdh1 old2, . . ., mRNA Cdh1 old4.
The models were studied by simulation, to determine the average values and the CVs of a set of metrics. For each estimated measure, we computed statistics with a 95% confidence level. The confidence intervals for the CV statistic were computed using the method described in [35]. Our measures are for a synchronous population of cells (calculated from long time averages and variations of 2,000 cell cycles). The Möbius code of the models is available upon request from the authors.
Table 1. Effects of the addition of five-step gestation (GES), five-step senescence (SEN) or both (GES+SEN) into the updated model of Kar et al. (KAR) [20] on the variability of cell cycle events and regulatory molecule levels
Model | Cell cycle period: Average | Cell cycle period: CV (%) | Cell size at division: Average | Cell size at division: CV (%) | Cdh1 mRNA molecules: Average | Cdh1 mRNA molecules: CV (%) | Cdh1 protein molecules (c): Average | Cdh1 protein molecules (c): CV (%)
KAR | 115.31 ± 0.32 | 23.31 ± 0.62 | 28.30 ± 0.41 | 15.72 ± 0.40 | 8.00 (a) | 35.35 (b) | 2620 ± 97 | 25.64 ± 1.24
GES | 114.91 ± 0.20 | 20.02 ± 0.96 | 28.41 ± 0.26 | 13.74 ± 0.74 | 7.98 ± 0.04 | 27.73 ± 0.65 | 2691 ± 83 | 21.09 ± 1.91
SEN | 115.78 ± 0.41 | 18.69 ± 0.55 | 29.02 ± 0.43 | 12.29 ± 0.38 | 7.99 ± 0.05 | 35.11 ± 0.88 | 2687 ± 67 | 20.54 ± 1.83
GES+SEN | 115.34 ± 0.37 | 12.66 ± 0.47 | 29.17 ± 0.61 | 8.14 ± 0.27 | 8.00 ± 0.03 | 22.74 ± 0.72 | 2665 ± 77 | 11.95 ± 0.67
(a) In the KAR model, the number of mRNAs follows a Poisson distribution of parameter ksmy/kdmy; we report here the exact theoretical value, which is given by ksmy/kdmy.
(b) Theoretical CV for the number of mRNAs, which for a Poisson distribution is given by (ksmy/kdmy)^(−1/2).
(c) Sum of the unphosphorylated forms.
RESULTS AND DISCUSSION
Our simulations show that even though the average amount of Cdh1 mRNA is stable during the cell cycle, its fluctuations around the equilibrium value have profound consequences on the variability of the Cdh1 protein levels, which in turn affect the variability of the cell cycle time and of the cell size at division. These effects can be clearly seen in the simulation plots of Figs 2B and 2C. The molecular noise in Cdh1 mRNA and protein levels looks qualitatively similar in the KAR and GES+SEN models, but the amplitude of the mRNA noise and the irregularity of the protein peaks are greatly reduced in the GES+SEN model. In Table 1 we report the detailed statistics obtained by solving all four model variants. The outcomes of the simulations show that the average number of Cdh1 mRNA molecules is the same (around 8) across all models (sixth column in Table 1).
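The rate constants chosen in the Methods and the theoretical KAR entries of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below assumes first-order mRNA decay (so that the half-life equals ln 2 / kdmy) and a steady-state mean of ksmy / kdmy; the variable names are ours and purely illustrative. It also evaluates the N^(−1/2) rule for the CV of an Erlang-distributed firing time.

```python
import math

half_life_min = 10.0   # Cdh1 mRNA half-life used in the Methods (minutes)
mean_mrna = 8.0        # target average number of mRNA molecules per cell

# First-order decay: half-life = ln(2) / k_dmy
k_dmy = math.log(2) / half_life_min
# Zero-order synthesis balanced by first-order decay: mean = k_smy / k_dmy
k_smy = mean_mrna * k_dmy

# A Poisson distribution with mean m has CV = m ** -0.5
poisson_cv_percent = 100 * (k_smy / k_dmy) ** -0.5


def erlang_cv(n_stages):
    """CV of the sum of n_stages identical exponential stages."""
    return n_stages ** -0.5


print(f"k_dmy = {k_dmy:.4f} min^-1")
print(f"k_smy = {k_smy:.4f} molec min^-1")
print(f"Poisson CV = {poisson_cv_percent:.2f} %")
print(f"Erlang CV: N = 1 -> {erlang_cv(1):.3f}, N = 5 -> {erlang_cv(5):.3f}")
```

The first two values reproduce the rates 0.0693 and 0.5545 quoted in the Methods, and the Poisson CV agrees with the 35.35 entry of Table 1 up to rounding of the last digit.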
On the contrary, the CV of the number of Cdh1 mRNA molecules (seventh column in Table 1) is affected by the multi-stage transcription process (model GES) and by the combined multi-stage transcription and degradation (model GES+SEN), confirming the visually noticeable differences in Fig. 2B. It is also interesting to notice that in model SEN the variability of the Cdh1 mRNA level stays practically unchanged with respect to that in model KAR. However, when combined with the multi-stage transcription process, senescence contributes significantly to reducing the variability of the mRNA levels (compare the mRNA CV results for the GES and GES+SEN models). To evaluate the effect that the noise of Cdh1 mRNA levels has on the variability of Cdh1 protein levels, we also computed the average and CV of the number of unphosphorylated Cdh1 molecules (Table 1, last two columns). Active, unphosphorylated Cdh1 molecules exist in a free form and in two different CycB-Cdk1-bound forms (species Y, XY and YX in [20]). The average number of Cdh1 molecules stays quite constant across models, while model variants GES, SEN and GES+SEN reduce its CV compared to model KAR. The most effective noise containment occurs in the GES+SEN model, which reduces the CV by more than 50% compared to model KAR, whereas the GES and SEN models both show a reduction of around 20%. It is worth noting that even though in model SEN the CV of the mRNA level is practically the same as in model KAR, there is a significant difference in the noise in protein levels between the two models. The reason is to be found in the different holding times of the mRNA levels, with model SEN having a slower pattern of variation than model KAR. The effect of the variability of the active Cdh1 protein level on the cell cycle time and cell size at division statistics is also shown in Table 1. The average values of these two measures are practically constant across all models, and match the values reported in [20]. In contrast, the CV values obtained for the GES and SEN models show reduced variability with respect to that of model KAR, although they are still far from realistic values. Not surprisingly, model GES+SEN, which provides the smallest variability of Cdh1 protein levels, is also the one that results in the smallest CVs of cell cycle time and cell mass at division. The simulation outcomes of model GES+SEN indicate CV values for cell cycle time and cell size at division (12.66% and 8.14%) that are very close to those reported in [20] (13.0% and 8.2%), but in our case we obtained these results with realistic mRNA half-lives. This further supports the idea that multi-step gestation and senescence could represent a more accurate way of modeling the complex processes of mRNA transcription and degradation.
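As footnote a of Table 1 implies for model KAR, the Cdh1 mRNA count evolves independently of the other species, so the mRNA columns can be explored with a stand-alone stochastic simulation. The following Python sketch is not the authors' Möbius model: it is a minimal Gillespie-type simulation written under the assumption that gestation is a cycle of N exponential stages of rate N · ksmy and that each molecule passes through M senescence stages of rate M · kdmy. The function name, the run length and the seed are arbitrary choices.

```python
import math
import random


def simulate_mrna(n_gest, m_sen, k_smy=0.5545, k_dmy=0.0693,
                  t_end=50_000.0, seed=1):
    """Return the time-averaged (mean, CV) of the total mRNA count."""
    rng = random.Random(seed)
    rate_gest = n_gest * k_smy        # rate of one gestation stage
    rate_sen = m_sen * k_dmy          # per-molecule rate of one senescence stage
    gest_done = 0                     # gestation stages completed so far
    stages = [0] * m_sen              # stages[i]: molecules in senescence stage i
    stages[0] = round(k_smy / k_dmy)  # start near the steady-state mean
    t = 0.0
    sum_x = sum_x2 = 0.0              # time-weighted first and second moments

    while True:
        total = sum(stages)
        rate_total = rate_gest + rate_sen * total
        dt = rng.expovariate(rate_total)
        if t + dt >= t_end:
            # account for the last, truncated holding time and stop
            sum_x += total * (t_end - t)
            sum_x2 += total * total * (t_end - t)
            break
        sum_x += total * dt
        sum_x2 += total * total * dt
        t += dt
        if total == 0 or rng.random() * rate_total < rate_gest:
            # one gestation stage completes; the N-th one releases an mRNA
            gest_done += 1
            if gest_done == n_gest:
                gest_done = 0
                stages[0] += 1
        else:
            # a uniformly chosen molecule completes one senescence stage
            pick = rng.randrange(total)
            i = 0
            while pick >= stages[i]:
                pick -= stages[i]
                i += 1
            stages[i] -= 1
            if i + 1 < m_sen:
                stages[i + 1] += 1    # otherwise the molecule is degraded

    mean = sum_x / t_end
    variance = sum_x2 / t_end - mean * mean
    return mean, math.sqrt(variance) / mean


if __name__ == "__main__":
    for label, n, m in [("KAR", 1, 1), ("GES", 5, 1),
                        ("SEN", 1, 5), ("GES+SEN", 5, 5)]:
        mean, cv = simulate_mrna(n, m)
        print(f"{label:8s} mean = {mean:5.2f}  CV = {100 * cv:5.1f} %")
```

With these settings the time-averaged mean should stay close to 8 for every variant, while the CV should be close to the Poisson value for N = M = 1 and lower when both processes have five stages; the exact numbers vary with the seed and the run length.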
In this paper we focused only on the multi-step production and removal of Cdh1 mRNA, but as cyclins are even more important regulators of the cell cycle, their production and degradation should also be approximated by such multi-step processes. As we showed above, this modeling can be effectively supported by the use of reaction times that follow properly selected Erlang distributions.
ACKNOWLEDGEMENTS
The authors thank Sandip Kar and John J. Tyson for helpful discussions and acknowledge support from the Italian research fund FIRB (RBPR0523C3).
REFERENCES
[1] Elowitz, M. B., Levine, A. J., Siggia, E. D. and Swain, P. S. (2002). Stochastic gene expression in a single cell. Science 297, 1183-1186.
[2] McAdams, H. H. and Arkin, A. (1997). Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA 94, 814-819.
[3] Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D. and van Oudenaarden, A. (2002). Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69-73.
[4] Swain, P. S., Elowitz, M. B. and Siggia, E. D. (2002). Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc. Natl. Acad. Sci. USA 99, 12795-12800.
[5] Cai, L., Friedman, N. and Xie, X. S. (2006). Stochastic protein expression in individual cells at the single molecule level. Nature 440, 358-362.
[6] Kaern, M., Elston, T. C., Blake, W. J. and Collins, J. J. (2005). Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451-464.
[7] Kepler, T. B. and Elston, T. C. (2001). Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations. Biophys. J. 81, 3116-3136.
[8] Raj, A. and van Oudenaarden, A. (2008). Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216-226.
[9] Acar, M., Becskei, A. and van Oudenaarden, A. (2005). Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228-232.
[10] Pedraza, J. M. and Paulsson, J. (2008). Effects of molecular memory and bursting on fluctuations in gene expression. Science 319, 339-343.
[11] Thattai, M. and van Oudenaarden, A. (2002). Attenuation of noise in ultrasensitive signaling cascades. Biophys. J. 82, 2943-2950.
[12] Shahrezaei, V., Ollivier, J. F. and Swain, P. S. (2008). Colored extrinsic fluctuations and stochastic gene expression. Mol. Syst. Biol. 4, 196.
[13] Swain, P. S. (2004). Efficient attenuation of stochasticity in gene expression through post-transcriptional control. J. Mol. Biol. 344, 965-976.
[14] Raj, A. and van Oudenaarden, A. (2009). Single-molecule approaches to stochastic gene expression. Annu. Rev. Biophys. 38, 255-270.
[15] Holstege, F. C. P., Jennings, E. G., Wyrick, J. J., Lee, T. I., Hengartner, C. J., Green, M. R., Golub, T. R., Lander, E. S. and Young, R. A. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717-728.
[16] Ghaemmaghami, S., Huh, W.-K., Bower, K., Howson, R. W., Belle, A., Dephoure, N., O’Shea, E. K. and Weissman, J. S. (2003). Global analysis of protein expression in yeast. Nature 425, 737-741.
[17] Csikász-Nagy, A. (2009). Computational systems biology of the cell cycle. Brief. Bioinform. 10, 424-434.
[18] Chen, K. C., Calzone, L., Csikasz-Nagy, A., Cross, F. R., Novak, B. and Tyson, J. J. (2004). Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell 15, 3841-3862.
[19] Braunewell, S. and Bornholdt, S. (2007). Superstability of the yeast cell-cycle dynamics: ensuring causality in the presence of biochemical stochasticity. J. Theor. Biol. 245, 638-643.
[20] Kar, S., Baumann, W. T., Paul, M. R. and Tyson, J. J. (2009). Exploring the roles of noise in the eukaryotic cell cycle. Proc. Natl. Acad. Sci. USA 106, 6471-6476.
[21] Mura, I. and Csikász-Nagy, A. (2008). Stochastic Petri Net extension of a yeast cell cycle model. J. Theor. Biol. 254, 850-860.
[22] Cross, F. R., Archambault, V., Miller, M. and Klovstad, M. (2002). Testing a mathematical model of the yeast cell cycle. Mol. Biol. Cell 13, 52-70.
[23] Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340-2361.
[24] Tyson, J. J. and Novak, B. (2001). Regulation of the eukaryotic cell cycle: molecular antagonism, hysteresis, and irreversible transitions. J. Theor. Biol. 210, 249-263.
[25] Koch, A. L. and Schaechter, M. (1962). A model for statistics of the cell division process. J. Gen. Microbiol. 29, 435-454.
[26] Tyson, J. J. (1985). The coordination of cell growth and division – intentional or incidental? BioEssays 2, 72-77.
[27] Koch, H. and Friesen, J. D. (1979). Individual messenger RNA half lives in Saccharomyces cerevisiae. Mol. Gen. Genet. 170, 129-135.
[28] Zenklusen, D., Larson, D. R. and Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol. 15, 1263-1271.
[29] Decker, C. J. and Parker, R. (1993). A turnover pathway for both stable and unstable mRNAs in yeast: evidence for a requirement for deadenylation. Genes Dev. 7, 1632-1643.
[30] Ciechanover, A. (1994). The ubiquitin-proteasome proteolytic pathway. Cell 79, 13-21.
[31] Sveiczer, A., Tyson, J. J. and Novak, B. (2001). A stochastic, molecular model of the fission yeast cell cycle: role of the nucleocytoplasmic ratio in cycle time regulation. Biophys. Chem. 92, 1-15.
[32] Chaouiya, C. (2007). Petri net modelling of biological networks. Brief. Bioinform. 8, 210-219.
[33] Peccoud, J., Courtney, T. and Sanders, W. H. (2007). Möbius: an integrated discrete-event modeling environment. Bioinformatics 23, 3412-3414.
[34] Doob, J. L. (1953). Stochastic Processes, John Wiley and Sons, New York.
[35] Miller, E. G. (1991). Asymptotic test statistics for coefficient of variation. Commun. Stat. Theory Methods 20, 3351-3363.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-244
Exhaustive Analysis of the Modular Structure of the Spliceosomal Assembly Network: A Petri Net Approach
Ralf H. Bortfeldt a,1,∗, Stefan Schuster a and Ina Koch b,c
a Chair of Bioinformatics, Friedrich-Schiller University Jena, Jena, Germany
b Institute for Computer Science, WG Molecular Bioinformatics, Johann Wolfgang Goethe University Frankfurt, Frankfurt, Germany
c Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
ABSTRACT: Spliceosomes are macro-complexes involving hundreds of proteins with many functional interactions. Spliceosome assembly is one of the key processes that enable splicing of mRNA and modulate alternative splicing. A detailed list of factors involved in spliceosomal reactions has been compiled over the past decade, but their functional interplay is often unknown and most of the present biological models cover only parts of the complete assembly process. It is a challenging task to build a computational model that integrates dispersed knowledge and combines a multitude of reaction schemes proposed earlier. Because kinetic parameters are not available for most reactions involved in spliceosome assembly, we propose discrete modeling using Petri nets, which provides insights into the system’s behavior via the computation of structural and dynamic properties. In this paper, we compile and examine reactions from experimental reports that contribute to a functional spliceosome. All these reactions form a network, which describes the inventory and conditions necessary to perform the splicing process. The analysis is mainly based on system invariants. Transition invariants (T-invariants) can be interpreted as signaling routes through the network. Due to the huge number of T-invariants that arise with increasing network size and complexity, maximal common transition sets (MCTS) and T-clusters were used for further analysis. Additionally, we introduce a false color map representation, which allows a quick survey of network modules and the visual detection of single reactions or reaction sequences that participate in more than one signaling route. We designed a structured model of spliceosome assembly, which meets the demands on a platform that i) can display the involved factors and concurrent processes, ii) offers the possibility to run computational methods for knowledge extraction, and iii) can be extended successively as new insights into spliceosome function are reported in experimental studies. The network consists of 161 transitions (reactions) and 140 places (reactants). All reactions are part of at least one of the 71 T-invariants. These T-invariants define pathways that are in good agreement with the current knowledge and known hypotheses on reaction sequences during spliceosome assembly, hence contributing to a functional spliceosome. We demonstrate that present knowledge, in particular of the initial part of the assembly process, describes parallelism and interaction of signaling routes, which indicate functional redundancy and reflect the dependency of spliceosome assembly initiation on different cellular conditions. The complexity of the network is further increased by two switches, which introduce alternative routes during A-complex formation in early spliceosome assembly and upon transition from the B-complex to the C-complex. By compiling known reactions into a complete network, the combinatorial nature of invariant computation leads to pathways that have previously not been described as connected routes, although their constituents were known. T-clusters divide the network into modules, which we interpret as building blocks in spliceosome maturation. We conclude that Petri net representations of large biological networks and system invariants are well suited as a means for validating the integration of experimental knowledge into a consistent model. Based on this network model, the design of further experiments is facilitated.
KEYWORDS: Spliceosome, pathway analysis, Petri net theory, T-invariants, T-clusters, MCTS, regulated splicing, alternative splicing, signal transduction networks
∗ Corresponding author: Ralf H. Bortfeldt, Friedrich-Schiller University Jena, Ernst Abbe Platz 2, 07743 Jena, Germany. E-mail: [email protected].
1 Humboldt-University of Berlin, Breeding Biology and Molecular Genetics, Berlin, Germany.
INTRODUCTION
Splicing is a process of mRNA maturation in which parts (introns) of the pre-mRNA are removed and which significantly increases the coding capacity of a gene [Graveley, 2001]. Splicing activities have been observed in many metazoans but also in less complex organisms such as the immunodeficiency virus or yeast cells. It has been shown that, along with the evolution of metazoans towards higher complexity, the number of genes increased comparatively slowly. This has been attributed to the effect of alternative splicing (AS) [Brett et al., 2000; Modrek, 2001; Johnson et al., 2003], i.e., the variable recognition of splicing signals that leads to different mature mRNA sequences from the same gene. Understanding AS requires an understanding of the fundamental mechanism and function of the spliceosomal protein complex. The mechanism of splicing has been studied for almost three decades, and many details have been elucidated through experimental work with yeast strains. Many of the protein factors involved in the splicing of yeast genes have homologous counterparts in the human spliceosome [Brow, 2002] and are organized in functional subcomplexes named U1, U2, U4, U5, and U6 snRNPs [Stevens et al., 2001; Will and Lührmann, 2001]. The spliceosome itself is among the most complex machineries that exist in eukaryotic cells, involving more than 150 proteins [Zhou et al., 2002; Jurica and Moore, 2003; Chen et al., 2007]. It constitutes an important regulatory unit that on the one hand offers many potential targets for control by external stimuli, and on the other hand itself exerts regulatory effects through the realization of specific AS patterns.
The outcome of different splicing events indicates that the maturation of RNAs is a complicated process that can be influenced by events as simple as mutations in splice sites [Venables, 2004], as subtle as a shift of splicing signals by a few nucleotides [Hiller et al., 2004; Bortfeldt et al., 2008], or as complex as signal cascades induced by cell-external factors [Stamm, 2002]. The frequent occurrence of subtle splice events (e.g., TASSDB [Hiller et al., 2007]) makes it necessary to reconsider and explore more deeply the function and control of spliceosome assembly. However, a compilation of data about the composition of spliceosomes, the involved proteins and their interactions that participate in alternative splicing events has been missing up to now. Splicing decisions are controlled by two major determinants: the pre-mRNA sequence with its inherent signals, and the protein complement of the spliceosome, where signal transduction is frequently maintained via arginine-serine rich domains (RS domains) of the participating proteins [Shen and Green, 2004]. Hence, the dependence of gene expression on developmental stage or tissue type is modulated to a major extent by a network of protein-protein and protein-RNA interactions that influence the assembly and localization of active spliceosomes. One of the major difficulties in the functional characterization of the spliceosome arises from the dynamic interactions between its subcomplexes and the huge number of proteins organized within them. Although extensive knowledge has been gathered about the factors involved in spliceosomal activities, their functional interplay and regulatory impact are not comprehensively understood and are even discussed controversially.
For example, it is not clear whether the assembly process occurs mainly co- or post-transcriptionally and whether a stepwise assembly [Görnemann et al., 2005; Behzadnia et al., 2006] rules out the possibility of a pre-assembled holospliceosome [Stevens et al., 2002; Malca et al., 2003]. In
Fig. 1. Two types of protein interaction networks: (a) An undirected network of binary interactions, e.g., modeled in [Zhang et al., 2007]. (b) A directed hierarchical molecular interaction network represented as Petri net.
this context, the term “transcription factory model” has been coined, denoting the concerted action of transcription and splicing machinery at the places of active gene transcription [Bentley, 2002]. The connection and coupling of different protein machines in the cell emphasizes the complex environment in which spliceosomal assembly is embedded. The vast amount of experimental data makes it necessary to structure the information for deriving new hypotheses about the underlying signal transduction processes. This task requires the development of theoretical models, which are able to integrate much of the existing data while being suited for rigorous validation and stepwise extension. Modeling goals can be summarized as i) visualization, ii) comprehensible data annotation, iii) data abstraction, and iv) model simulation, allowing a mathematical description of the model. In particular, alternative splicing often involves different sets of splicing factors in addition to the spliceosomal core components. Hence, variations of the network should provide a sound basis for testing new hypotheses on the regulation of splicing and AS events. Several structural and kinetic factors have been proposed to influence the splicing patterns of a gene, such as i) the precise balance and concentrations of regulatory proteins, ii) the nature of interactions, such as inhibiting or cooperative effects, iii) the number of interactions, which defines the connectivity of a network, and iv) the speed of transcriptional elongation or recruitment of splicing factors [Park et al., 2004; House and Lynch, 2007]. Additionally, a temporal component can be taken into account, as the assembly pathway bears a number of timed dependencies. These key points form an essential basis for modeling spliceosomal processes, but most of them cannot presently be applied comprehensively.
Incompleteness of experimental data, for example, the lack of knowledge about interaction kinetics, concentrations or the exact temporal order of reactions, is accompanied by heterogeneity of proposed mechanisms for different stages of spliceosome assembly. Protein interaction databases such as STRING [von Mering et al., 2007], cross-referencing to MINT, HPRD, BIOGRID, DIP and REACTOME, APID [Prieto and Rivas, 2006] or INTACT [Kerrien et al., 2007], already provide an extensive organisation and integration of experimental and predicted protein interaction data. However, most of them represent static interactions (Fig. 1a) without providing information about the temporal order and direction of interactions within hierarchical networks. This underscores the need to find ways to incorporate additional knowledge from the literature, which in turn helps in the analysis of hierarchical processes by providing insights into the progression of signals and protein interactions in networks such as spliceosome assembly. Figure 1a depicts an unordered, undirected protein-protein interaction network as reviewed in [Zhang et al., 2007], whereas Fig. 1b depicts a directed, ordered interaction network as presented in this work. Due to lack
of kinetic properties, the development of a discrete structural model was considered a good initial choice for the computational analysis of the spliceosome. We chose Petri net (PN) theory because it combines knowledge at different abstraction levels, an intuitive visualization and a mathematical formalism, using graph-theoretical elements. PNs have been applied to model, analyse, and simulate biochemical networks of different types [Reddy et al., 1996; 1997; Simão et al., 2005; Koch and Heiner, 2008]. Metabolic networks [Hofestädt, 1994; Hofestädt and Thelen, 1998; Koch et al., 2005; Zevedei-Oancea and Schuster, 2003] as well as signal transduction [Heiner et al., 2004; Sackmann et al., 2006] and gene regulatory networks [Matsuno et al., 2000; 2006; Grunwald et al., 2008] have been successfully modeled using PNs. There are parallel approaches to structural modeling such as those based on elementary flux modes [Schuster and Hilgetag, 1994; Schuster et al., 1999] and on extreme pathways [Schilling et al., 1999]. They have primarily been applied to metabolic pathways (for reviews see [Papin et al., 2004; Schuster et al., 2007]) but also to signaling networks [Xiong et al., 2004]. An interesting question is whether the model of stoichiometrically quantifiable mass flow inherent to metabolic networks can be transferred to models of information flow as occurring in signaling networks. This has been addressed by studies modeling the mating pheromone response pathway [Sackmann et al., 2006], the human iron homeostasis pathway [Sackmann et al., 2007] and the apoptosis pathway [Heiner et al., 2004]. Recently, a model of the U1 snRNP subcomplex assembly demonstrated the applicability of PN theory [Kielbassa et al., 2008]. The present work provides the first theoretical model of the entire spliceosomal assembly pathway. Experimental evidence from the literature had to be translated into single reaction steps according to PN language.
By connecting these reactions a model network was formed, consisting of protein and RNA species, which relay signals rather than mass. The structural network exhibits dependencies and concurrencies of reactions, but no kinetics, which is still unknown for most stages of spliceosome assembly. Spliceosome assembly has been shown to form a complex interacting network of subcomplex formation [Makarov et al., 2002] best described as an allosteric cascade [Brow, 2002] that frequently involves cooperativity [Berglund et al., 1998]. Already a decade ago, it was proposed that the composition of spliceosomal snRNPs could be different on different splicing substrates in a context-dependent manner (tissue type, developmental stage). This has the consequence that many different combinations of factors could potentially give rise to different types of active spliceosomes, thus, increasing the potential for splicing regulation [Fortes et al., 1999]. In light of the many proteins that up to now have been identified around the spliceosome, it is necessary to go beyond cataloging the factors. Putting protein and mRNA factors together into the context of a larger network poses a difficult task and requires the integration of a standardized vocabulary to unambiguously describe all possible reactions. While this task will remain a challenge for ontology and text mining specialists, we started to summarize reactions in a PN model. We present a set of analysis strategies accustomed to Petri net representation of signaling networks. The paper is organized as follows. In the next chapter biological foundations of spliceosomal assembly are presented. The following chapter gives methods and definitions used, including MCTS and T-cluster. We continue with a description of PN modules, which are suitable to describe biochemical reactions of signaling processes. Next, the biological reactions referring to different stages of spliceosome assembly are described and formalized. 
In the results chapter, T-invariants and MCTS are described and interpreted. Finally, a dendrogram and color map representation are introduced to further explore the model network. Biological background Spliceosome assembly is a hierarchical process that progresses through several main stages designated (P →) E → A → B → C (→ P). Each stage E, A, B, C describes a complex that is built from a pool (P)
of proteins present in the nucleus, which are recycled for repeated rounds of spliceosome assembly. The mature spliceosome catalyzes the splicing reaction in eukaryotic messenger RNAs.
E-complex (commitment complex)
The recognition of the 5’ splice site (5’ss) is initiated by the early interaction of the U1-specific protein U1C with the 5’ss sequence [Du and Rosbash, 2002]. Additionally, interactions via phosphorylated RS domains of the protein U1-70K with splicing enhancing factors, which bind to specific enhancer motifs located nearby, can direct the U1 snRNP to a specific 5’ss [Cao and Garcia-Blanco, 1998]. It was demonstrated that 5’ss with low complementarity to the canonical eukaryotic 5’ss are still selected by U1 snRNP due to interactions between U1 snRNP proteins and splice enhancing proteins, such as ASF/SF2 or TIA1 [Cao and Garcia-Blanco, 1998; Förch et al., 2002]. This is in agreement with the earlier observation that donor site selection by U1 snRNP can be maintained by the SR protein SC35 in case of an impaired donor recognition motif in the U1 snRNA sequence [Tarn and Steitz, 1994]. In contrast, a weak 5’ss might also be selected by variants of the U1 snRNA [Kyriakopoulou et al., 2006]. This led to the interesting hypothesis that 5’ss selection within the E-complex might be possible even without the binding of U1 snRNP. Experiments indicate that the presence of higher concentrations of SR proteins enables such a spliceosomal pathway [Crispino et al., 1996]. At the 3’ end of the intron, E-complex assembly involves the U2 auxiliary factor, U2AF, which is a heterodimer of a 65 kDa and a 35 kDa subunit. Together with SF1 (in mammals designated branch point binding protein, mBBP), these factors recognize via their RNA recognition motifs (RRM) the polypyrimidine tract, 3’ss and branch point, respectively.
This step was shown to occur in a coordinated way involving cooperativity between SF1 and U2AF65 [Berglund et al., 1998], which is mediated by interactions between the RS domains of these proteins [Shen and Green, 2004]. Additional interactions reported during E-complex assembly include the bridging of U1 snRNP at the 5’ss with factors bound to the branch point and 3’ss, for example, by action of the SR protein FBP11 [Abovich and Rosbash, 1997]. It was also shown that at this stage, the U2 snRNP can be in close proximity to the U1 snRNP, suggesting the dependency of E-complex formation on the presence of factors required for the next assembly stage [Das et al., 2000; Dönmez et al., 2007]. A summary of models of E-complex assembly is depicted in Fig. 2.
A-complex (pre-spliceosome)
The A-complex contains the stably integrated U2 snRNP, which is assembled out of two heteromeric subcomplexes SF3a and SF3b via an intermediate 17S complex. Prior to their contacts to the U2 snRNA, SF3a and SF3b are formed by several interactions between U2 core components (e.g., SF3b155 with SF3b14 or SF3a120 with SF3a60) [Dybkov et al., 2006]. In this stage, U2 snRNP interacts with the branch point site via base pairing between U2 snRNA and the pre-mRNA under consumption of ATP [Dönmez et al., 2007]. Additionally, the transition from the E-complex to the A-complex is supported by the SF3b protein SF3b155, which binds at both sides of the branch point and interacts simultaneously with the U2AF65 subunit and the SF3b14 protein [Gozani et al., 1998; Spadaccini et al., 2006]. The ATP requirement during A-complex formation has been attributed to two other U2 snRNP proteins, SF3b125 and hPrp5, which are members of the DExD/H family and may function either as helicases or RNPases [Xu et al., 2004]. Such auxiliary enzymes could be recruited from proximal Cajal bodies or speckles [Will et al., 2002].
These helicase-like proteins have been shown to function as generic ATPases that bind and hydrolyze NTP to unwind double-stranded RNAs (dsRNAs) [Staley and Guthrie, 1998]. In contrast, the ADP-bound form can modulate the annealing of complementary RNA strands. In the context of the spliceosome,
Fig. 2. Processes of E-complex assembly as drafted before transfer into the Petri net model. Numbers at the edges refer to the column “ID” in the table of reactions (Supplementary Table S1). The figure summarizes four different ways of E-complex assembly, described in the literature and centering around donor and acceptor splice site recognition. Unbranched or unlabeled arrows represent an intermediate step for better visualization, without an explicitly given reaction. Dashed arrows denote interactions within subcomplexes. Abbreviations: PTT = polypyrimidine tract, 5’ss = donor splice site, 3’ss = acceptor splice site, BPS = branchpoint sequence, 5term = 5’ terminal exon. (Colours are visible in the online version of the article at www.iospress.nl.)
DExD/H box helicases govern structural rearrangements to shape the active complex. However, many of these ATPases function in a generic way, also hydrolyzing other RNA species. Thus, they have to be specifically activated to catalyze the correct structural rearrangements, not least to ensure fidelity of the splicing reaction. It was proposed that additional splicing factors interact with DExD/H box helicases to direct their activity to specific substrates during the assembly process. SF3b125 was detected only in low amounts in the SF3b subcomplex, but not associated with the 17S U2 snRNP, and hPrp5 was present in the 17S U2 snRNP, but not in the SF3b subcomplex [Will et al., 2002]. The protein hPrp5 exhibits an ATP-independent function, which stabilizes the binding of U2 snRNP to the BP [Perriman et al., 2003], and was proposed to function as a bridge between the U1 snRNP and the U2 snRNP at the time of A-complex formation [Xu et al., 2004]. Accompanying the U2 snRNP rearrangements, catalyzed by hPrp5 under ATP hydrolysis, the contacts of the protein SF3a60 to the U2 snRNA are significantly reduced upon association of U2 with the BP in the A-complex [Dybkov et al., 2006]. Another DExD/H box helicase, which is required for U2 snRNP / BP interaction, is UAP56, whose recruitment also depends on the protein U2AF65 [Fleckner et al., 1997]. The reactions of the individual assembly stages are described and formalized in Supplementary Table S1.
B-complex (active spliceosome)
In this stage, U4, U5, and U6 snRNPs enter the assembly pathway as tri-snRNP complex. This subcomplex is formed in a separate way, where U4 and U5 snRNP assemble, similar to U1 snRNP, from a family of seven RNA binding proteins, termed Sm proteins [Will and Lührmann, 2001; Liu et al., 2006], whereas the U6 snRNA is bound by a different set of proteins termed Sm-like proteins [Beggs, 2005].
Subsequently, the U4 snRNP and the U6 snRNP form a duplex via base pairing of their snRNAs, resulting in a structural conformation, which is stabilized by several additional proteins, for example hSnu13, Prp3, Prp4, CypH and Prp31 [Liu et al., 2006]. The U5 snRNP contains several proteins important for structural rearrangements prior to the first catalytic step of splicing, most importantly the DExD/H box helicases Brr2 and Prp28 and the two proteins Snu114 and Prp8 [Laggerbauer et al., 1998; Liu et al., 2006]. An interaction between Prp6 and Prp31 was proposed to serve as a bridging step between the U4/U6 snRNP complex and the U5 snRNP, preparing the formation of the U4/U6 U5 tri-snRNP complex [Schaffert et al., 2004]. The additional proteins Snu66, Sad1 and 27K help to stabilize the intermediate tri-snRNP complex. The latter is recruited to the spliceosome which, however, is still catalytically inactive. An intermediate state, designated pre-catalytic B0 complex, was shown to accommodate a highly flexible tri-snRNP structure, possibly for integrating other components, for example, the protein Prp19 [Boehringer et al., 2004]. Prior to the conformational rearrangement required for spliceosome activation, U1 snRNP dissociates from the 5’ss, enabling U6 snRNP to contact the donor splice site. This step involves the ATPase Prp28, which was found to counteract the stabilizing effect of the U1 component U1C with the U1 snRNA [Chen et al., 2001]. Prp5 could leave the spliceosome at this stage as it was demonstrated to function mainly before or during A-complex assembly [Will et al., 2002]. The U5 snRNP components Brr2, Prp8 and Snu114 are critical for unwinding the U4/U6 snRNA stem-loop. This step resembles a G-protein activating mechanism, where Snu114 enters a GTP-dependent state and subsequently activates the helicase Brr2 [Turner et al., 2004; Small et al., 2006].
This conformational change also involves Prp19, a protein of the Nineteenth complex (NTC) [Makarova et al., 2004]. Brr2 is further involved in interactions with Prp16 and U1-70K [van Nues and Beggs, 2001; Liu et al., 2006]. After unwinding of the U4/U6 snRNA duplex, U2 and U6 snRNP establish several interactions with the pre-mRNA via their snRNAs, whereby a binding motif within the U6 snRNA directly contacts the 5’ss. Subsequent release of
U1 and U4 snRNP results in the B* complex, which forms the catalytic core of the spliceosome [Staley and Guthrie, 1998, Liu et al., 2006]. The B-complex, composed of U4/U6 U5 tri-snRNP in close contact with the U2 snRNP, performs the first catalytic step of splicing by nucleophilic attack of the branch point adenosine to the phosphate ester bond of the 5’ss, and is subsequently converted into the C-complex. C-complex The C-complex contains U2, U5 and U6 snRNA at a stage subsequent to the first catalytical step since splicing intermediates can be found in this complex. The conformation is centered around the protein Prp8, which is thought to serve as a “surgery table”, connecting the already free 5’ss and fixating the 3’ss such that the second transesterification can open the downstream intron-exon junction [Turner et al., 2004]. With the ligation of the free exon ends, the intron and its bound snRNPs are released as lariat complex. Subsequently, bonds between the U2, U6 and U5 snRNA are broken, involving the helicase Prp43, and the U5 snRNP is dissociating into its components, being available for a new cycle of spliceosome assembly [Makarov et al., 2002]. Since Prp43 can be found in the 17S U2 complex [Will et al., 2002], which forms during early A-complex assembly, it is conceivable that this protein is present in several stages of the spliceosome assembly pathway. Additional factors support the recycling process, for example, Prp24, which reanneals the U4 and U6 snRNAs and allows regeneration of the U4/U6 snRNP duplex [Gottschalk et al., 2001]. Two important helicases, Prp16 and Prp22, impose kinetic proofreading activity and can subject suboptimal splicing substrates into a proposed discard pathway [Burgess and Guthrie, 1993; Villa and Guthrie, 2005; Mayas et al., 2006]. It is important to note that the catalyzing function of the spliceosome can experimentally be reduced to its RNA parts, thus, making it functioning as a ribozyme [Valadkhan et al., 2007]. 
However, the protein scaffold is necessary to form the structural environment (RNA conformations) that enables the splicing reaction. Moreover, the participating proteins establish important links to other cellular processes, for example, transcription or nuclear export. These known experimental results were taken as the basis for extracting reactions applicable to the design of the PN model. This often required consulting several literature reports in order to model reactions that are described only vaguely or contradictorily. Depending on available data, reactions and their participating factors were summarized or abstracted. We introduce a mnemonic labeling for reactions (e.g., “ bdg” for binding, “ matur” for maturation, “ ass” for assignment). All reactions used for the model are summarized in Supplementary Table S1.
METHODS AND DEFINITIONS
Definition of Petri nets
We modeled the spliceosomal assembly network as a P/T net (see Box 1). Places correspond to biological objects (e.g., RNA regions, protein factors, protein complexes etc.), whereas transitions correspond to processes, which act upon objects (e.g., protein interaction, phosphorylation reactions, protein-mRNA binding etc.). The direction of arcs defines pre-places (pre-transitions) and post-places (post-transitions). Tokens represent movable objects. They are used to model the equivalent of signal or mass flow units as a number of molecules (e.g., mole) and are symbolized by black dots on places. The maximal number of tokens that a place can hold is defined by its capacity. The distribution of tokens over all places is called a marking. Each marking defines a certain state of the system. Transitions without pre-places are called input transitions and represent sources. Transitions without post-places
Box 1: PN-Definition
A Petri net is a directed, labeled, bipartite graph consisting of places (circles), transitions (rectangles) and weighted arcs (arrows). It is defined as a six-tuple $Y = (P, T, F, K, W, M_0)$ [Baumgarten, 1996]:
i. The tuple $(P, T, F)$ is a net graph $N$ with
   a. $P = \{p_1, p_2, \ldots, p_n\}$; set of places
   b. $T = \{t_1, t_2, \ldots, t_n\}$; set of transitions
   c. $F \subseteq (P \times T) \cup (T \times P)$, $F \neq \emptyset$; set of arcs (flux relations of $N$)
   d. $P \cap T = \emptyset$
   e. $P \cup T \neq \emptyset$
ii. $K : P \to \mathbb{N} \cup \{\infty\}$; capacity of places
iii. $W : F \to \mathbb{N}$; arc weights
iv. Dynamics of the PN is realized by movable objects, the tokens, which define a marking $M : P \to \mathbb{N}_0 \cup \{\infty\}$ of $Y$ if $\forall p \in P : M(p) \le K(p)$, with:
   $M_0 : P \to \mathbb{N}_0$; initial marking
   $M_0 \xrightarrow{w} M'$; $M'$ is the consecutive marking given a firing sequence $w$
   $M_0 \xrightarrow{w}$ defines a firing sequence $w$, which is activated under $M_0$ and for which holds: $w = \emptyset \wedge M' = M_0$, or $\exists M' \in M(Y) : M_0 \xrightarrow{w} M'$

Box 2: PN firing rules for P/T-nets
Activity of reactions in a P/T-net (defined in Box 1) is simulated by firing of transitions, which is symbolized by a token change. A transition $t$ can only fire if it is enabled, i.e., if it satisfies the following two conditions:
1. $M(p_i) \ge W(p_i, t)$ for all pre-places $p_i$ of $t$: all its pre-places are occupied by at least as many tokens as the weights of the incoming arcs prescribe.
2. $K(p_i) \ge W(t, p_i)$ for all post-places $p_i$ of $t$: all its post-places must have at least the capacity that the weights of the outgoing arcs dictate.
Transition firing behavior: a) Transition $t_1$ is enabled because pre-places $p_1$ and $p_2$ are occupied by as many tokens as their arc weights prescribe. A missing arc weight means a default arc weight of one. b) After firing of $t_1$, two tokens of $p_1$ and one token of $p_2$ are consumed and three tokens of $p_3$ are produced. Note that the number of consumed tokens need not be equal to the number of produced tokens.
are called output transitions and represent sinks. These transitions model the interface to the system’s environment. The dynamic behavior of the network is realized through the firing of transitions, which model the activity of biochemical reactions. A transition can fire (is enabled) if all pre-places are covered by at least as many tokens as indicated by the corresponding arc weights. During firing of a transition according to the corresponding arc weights and firing rules (Box 2), the number of tokens is decreased on the pre-places and increased on the post-places at the same time. Consequently, the marking of the net changes, resulting in a new state of the PN. Note that in signal transduction networks without mass flow and reaction stoichiometry, it is reasonable to interpret arc weights as “information units” with a default value of one rather than as reaction quantities. Starting from an initial marking $M_0$, we define a firing sequence $M_0 \xrightarrow{w}$, $w = t_1 \ldots t_k \in T^{*}$, as a sequence of transitions of the PN, which corresponds to a specific signal propagation through the biological network. For each firing sequence $w$, a frequency vector $\bar{w} = (\#(t_1, w), \ldots, \#(t_n, w))$ (also called Parikh vector) can be assigned, which indicates how often each transition fires. The change of the net marking can be determined by:

$$M_0 \xrightarrow{w} M' = M_0 + C \cdot \bar{w} \qquad (1)$$

where $C$ is the incidence matrix of the PN, in which rows and columns correspond to $P$ and $T$,
respectively, and the matrix elements describe the change of token number on a place when a transition fires. For metabolic networks, the incidence matrix corresponds to the stoichiometric matrix. The Parikh vector $\bar{w}$ can be determined from the solution of the homogeneous equation system $C \cdot \bar{w} = 0$, which is valid if the signal or information flow within the network is assumed to be conserved. This forms the basis for computing system invariants from the incidence matrix. They are given by two different kinds of solution vectors, $x$ and $y$, depending on the orientation of $C$. The vector $x$ is called a non-negative place invariant (P-invariant) if it solves the homogeneous equation system

$$C^{T} \cdot x = 0, \quad x_1, \ldots, x_n \in \mathbb{N}_0 \qquad (2)$$
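As an illustration of the firing rule (Box 2), the state equation (Eq. 1) and the P-invariant condition (Eq. 2), the following minimal Python sketch encodes the toy net from the firing example in Box 2. It is not part of the spliceosome model; the dictionary layout, the function names and the reading of the capacity condition as "enough free capacity" are assumptions made for this example only.

```python
import math

# Toy net from the firing example in Box 2 (names p1, p2, p3, t1 as used there).
places = ["p1", "p2", "p3"]
transitions = ["t1"]
pre = {("p1", "t1"): 2, ("p2", "t1"): 1}   # W(p, t): weights of arcs place -> transition
post = {("t1", "p3"): 3}                    # W(t, p): weights of arcs transition -> place
capacity = {p: math.inf for p in places}    # K(p): unlimited, as assumed for the model


def is_enabled(marking, t):
    """Check the two enabling conditions of Box 2 for transition t."""
    for p in places:
        if marking[p] < pre.get((p, t), 0):
            return False  # condition 1: not enough tokens on a pre-place
        # Condition 2, read here (an assumption) as "enough free capacity".
        if capacity[p] - marking[p] < post.get((t, p), 0):
            return False
    return True


def fire(marking, t):
    """Return the marking reached by firing t once."""
    if not is_enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    return {p: marking[p] - pre.get((p, t), 0) + post.get((t, p), 0) for p in places}


def incidence_matrix():
    """C[i][j] is the token change on place i when transition j fires."""
    return [[post.get((t, p), 0) - pre.get((p, t), 0) for t in transitions]
            for p in places]


def state_equation(m0, parikh):
    """Evaluate M' = M0 + C * w_bar (Eq. 1) for a marking given as a list."""
    c = incidence_matrix()
    return [m0[i] + sum(c[i][j] * parikh[j] for j in range(len(transitions)))
            for i in range(len(places))]


m0 = {"p1": 2, "p2": 1, "p3": 0}
m1 = fire(m0, "t1")                                        # token game
predicted = state_equation([m0[p] for p in places], [1])   # same step via Eq. 1

# x = (1, 1, 1) solves C^T x = 0 for this net, so it is a P-invariant (Eq. 2).
c = incidence_matrix()
x = [1, 1, 1]
is_p_invariant = all(
    sum(c[i][j] * x[i] for i in range(len(places))) == 0
    for j in range(len(transitions))
)
```

In this toy net one firing of t1 consumes three tokens and produces three, which is why the all-ones vector is a P-invariant; in a net with other arc weights a different weighting of places would be needed, or no P-invariant may exist.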
The elements of a P-invariant can be interpreted as a conservation relation for tokens. For an initial marking $M_0$ the following holds:

$$[M_0\rangle_N := \{ M' \mid M_0 \xrightarrow{w} M',\ w \in T^{*} \}, \qquad \forall M' \in [M_0\rangle_N : M' \cdot x = M_0 \cdot x \qquad (3)$$

where $[M_0\rangle_N$ defines the set of reachable markings and $M'$ is a consecutive marking (see Box 1) that can be reached from $M_0$ by firing of $w$, i.e., an element of the reachability set. The solution vector $y$ is called a non-negative transition invariant (T-invariant) if the following equation holds:

$$C \cdot y = 0, \quad y_1, \ldots, y_n \in \mathbb{N}_0 \qquad (4)$$
A T-invariant corresponds to a transition sequence that, after firing, reproduces a marking (state) of the network (Eq. 4). A T-invariant’s Parikh vector indicates how often each transition has to fire in order to reach the same state (marking) again. In the following, we write $\bar{y}$ as $y$ for short and $Y = \{y_1, \ldots, y_n\}$ for the set of all T-invariants that can be computed from the incidence matrix $C$. The support of $y$, written as $\mathrm{supp}(y)$, contains the nodes corresponding to the non-zero entries of $y$, i.e., $\mathrm{supp}(y)$ denotes all transitions that belong to a T-invariant. A T-invariant $y$ is minimal if there exists no smaller positive T-invariant $y'$ with $(y - y') > 0$ [Baumgarten, 1996], i.e., the support of $y$ does not contain the support of any other invariant $y'$ and the greatest common divisor of all non-zero entries of $y$ is equal to 1. Hence, minimal T-invariants are not further decomposable into smaller T-invariants. The same holds for minimal P-invariants. In the following, the term “T-invariant” (“P-invariant”) stands as abbreviation for minimal non-negative T-invariant (P-invariant). T-invariants can be interpreted as flux vectors. In biochemical terms, it was shown that under steady state conditions a metabolic network can be decomposed into sets of minimal meaningful reaction sequences (elementary flux modes), which form a variety of flux patterns when expressed as non-negative linear combinations [Schuster and Hilgetag, 1994; Schuster et al., 1999]. Elementary flux modes correspond to T-invariants. The spliceosomal assembly net was modeled as a transition-bounded net, i.e., there are no places without pre- or post-transitions, but there are transitions without pre-places or without post-places. The reactants of sinks and sources are assumed to be buffered substances at fixed concentrations [Schuster and Hilgetag, 1994]. In biological terms, this means that all reactants feeding input transitions or leaving output transitions are considered to be external. All others are internal.
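The definition of minimal T-invariants can be made concrete on a small example. The sketch below uses a hypothetical transition-bounded toy net (places A and B, an input transition t_in, two alternative reactions t1 and t2 from A to B, and an output transition t_out) and finds its minimal T-invariants by exhaustive search over small Parikh vectors. The net, the function name and the search bound are assumptions for illustration; exhaustive search is only feasible for very small nets and is not the algorithm used by dedicated tools such as INA.

```python
from functools import reduce
from itertools import product
from math import gcd

# Hypothetical toy net: rows are places (A, B), columns are transitions.
transitions = ["t_in", "t1", "t2", "t_out"]
C = [
    [1, -1, -1, 0],   # place A: filled by t_in, emptied by t1 or t2
    [0, 1, 1, -1],    # place B: filled by t1 or t2, emptied by t_out
]


def support(y):
    """Indices of the transitions with non-zero entries in y."""
    return frozenset(j for j, value in enumerate(y) if value)


def minimal_t_invariants(c, bound=2):
    """Enumerate minimal non-negative integer solutions of C * y = 0.

    Every entry of y is searched in 0..bound, so invariants that need larger
    entries are missed; the bound is an assumption suited to this toy net.
    """
    n = len(c[0])
    solutions = [
        y for y in product(range(bound + 1), repeat=n)
        if any(y) and all(sum(row[j] * y[j] for j in range(n)) == 0 for row in c)
    ]
    minimal = []
    for y in solutions:
        if reduce(gcd, y) != 1:
            continue  # a multiple of a smaller invariant
        if any(support(z) < support(y) for z in solutions):
            continue  # support strictly contains the support of another invariant
        minimal.append(y)
    return minimal


invariants = minimal_t_invariants(C)
routes = [[transitions[j] for j in sorted(support(y))] for y in invariants]
```

For this net the search yields two minimal T-invariants, one through t1 and one through t2, which corresponds to two alternative routes from source to sink.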
In our model, all reactants (places) are modeled in non-limited amounts, with a capacity of K(p) → ∞ and an initial marking of one token per place to enable each transition. The model has been validated using PN analysis techniques. We determined the static and dynamic properties using the programs INA [Starke, 1998] and TINANET [Thormann et al., 2009]. With increasing network size and complexity, the number of T-invariants can grow exponentially. Two approaches were employed to further decompose the network and thus to facilitate the validation of the model: i) decomposition into disjoint subnetworks (MCTS) and ii) decomposition into overlapping subnetworks (T-clusters).

Partitioning of T-invariants into MCTS

MCTS [Sackmann et al., 2006] are based on a matrix D in which rows and columns correspond to T and Y, respectively, with T denoting the set of transitions and Y the set of T-invariants. Each row defines the subset I ⊆ Y of T-invariants that share the considered transition t. Hence, I(t_i) denotes the subset of T-invariants that contain the transition t_i. Biologically, this means that a specific reaction is part of a certain number of all possible minimal steady-state signaling pathways within the network. All transitions that are shared by exactly the same set of T-invariants form an MCTS A ⊆ T, for which holds:

$$\forall t_i, t_j \in T : \; t_i, t_j \in A \leftrightarrow I(t_i) = I(t_j) \qquad (5)$$

This can also be expressed by:

$$\forall y \in Y : \; A \subseteq \mathrm{supp}(y) \,\lor\, A \cap \mathrm{supp}(y) = \emptyset \qquad (6)$$
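As a rough illustration of Eqs. (5) and (6), the following sketch groups transitions by the set I(t) of T-invariants that contain them. The invariant supports and all names are made up for the example; this is not the procedure implemented in the tools cited above.

```python
from collections import defaultdict

# Hypothetical supports of three T-invariants (illustrative names only).
INVARIANTS = {
    "y1": {"t1", "t2", "t3"},
    "y2": {"t1", "t2", "t4"},
    "y3": {"t5"},
}

def mcts(invariants):
    """Group transitions that are shared by exactly the same invariants."""
    shared_by = defaultdict(set)            # t -> I(t)
    for name, supp in invariants.items():
        for t in supp:
            shared_by[t].add(name)
    groups = defaultdict(set)               # I(t) -> transitions with that I(t)
    for t, inv_set in shared_by.items():
        groups[frozenset(inv_set)].add(t)
    return sorted(groups.values(), key=sorted)

def satisfies_eq6(A, invariants):
    """Eq. (6): A lies fully inside or fully outside every support."""
    return all(A <= supp or not (A & supp) for supp in invariants.values())

for A in mcts(INVARIANTS):
    print(sorted(A), satisfies_eq6(A, INVARIANTS))
```

Here t1 and t2 end up in the same set because both occur in y1 and y2 and in no other invariant, whereas t3, t4 and t5 each form a set of their own.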
All reactions of an MCTS always occur together, i.e., reactants and enzymes have to follow a similar scheme of regulation. The transitions of an MCTS and the places in between describe a subnetwork.

Clustering of T-invariants

We performed a clustering of T-invariants to find similarities imposed by transitions that are shared between different T-invariants. T-clusters define subnetworks that can overlap or contain each other [Grafahrend-Belau et al., 2008]. They facilitate the identification of traversed routes, which are formed by common subsets of reactions and highlight more important structures within the net. The clustering was computed using the Tanimoto coefficient [Grafahrend-Belau et al., 2008] as distance measure, which is also known as binary distance or Jaccard index [Cormack, 1971]. The corresponding distance tree of related T-invariants was constructed using the UPGMA algorithm [The R Development Core Team, 2005]. A similarity threshold of 80% was chosen, so that T-invariants with less than 20% difference are merged into the same subtree. The same distance measure and clustering algorithm were used to create a color map. A color map is a graphical way of displaying matrices by using colors to represent the numerical values. Because transitions are represented in binary form (on/off = present/absent) within the support of T-invariants, a simplified two-color mode was chosen, where dark (light) blue tones indicate the presence (absence) of a reaction. The color map also rearranges rows and columns of the distance matrix such that similar rows, and similar columns, are grouped together according to the distance tree. This representation facilitates the visualization of block patterns of transitions shared by different T-invariants.
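The clustering step can be sketched in plain Python. The sketch assumes that the distance between two T-invariants is one minus the Tanimoto (Jaccard) coefficient of their supports, and that the 80% threshold corresponds to merging clusters whose average distance is below 0.2. The supports and function names are hypothetical, and the original analysis used R rather than this code.

```python
from itertools import combinations

# Hypothetical supports: i1 and i2 share 9 of 11 transitions, i3 shares none.
SUPPORTS = {
    "i1": {f"t{k}" for k in range(1, 11)},
    "i2": {f"t{k}" for k in range(1, 10)} | {"t11"},
    "i3": {"t20", "t21"},
}

def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two supports given as sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def upgma_clusters(supports, max_distance):
    """Average-linkage (UPGMA) agglomeration of invariants; merging stops
    once the closest pair of clusters is max_distance or more apart."""
    clusters = [[name] for name in sorted(supports)]

    def avg_dist(c1, c2):
        total = sum(tanimoto_distance(supports[x], supports[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d >= max_distance:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

print(upgma_clusters(SUPPORTS, max_distance=0.2))
```

With these toy supports, i1 and i2 differ by 2/11 (about 18%) and are merged, while i3 stays in a cluster of its own.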
Model refinement

Many biological signaling processes in the human cell are well documented, for example the caspase cascade of apoptosis [Jin and El-Deiry, 2005] or the communication network of cytokines between immune cells [Haddad, 2002]. Reactions of these signaling pathways can be found in databases such as KEGG [Kanehisa et al., 2008] or TRANSPATH [Krull et al., 2006]. Although the spliceosome has been under investigation for many years, no consistent and holistic network has been published so far. Reactions involved in spliceosome assembly were biochemically described, but not formalized (see Fig. 2). After setting up the model of E-complex assembly, we iteratively included further reactions (assembly stages) and revalidated the model. All PN representations in this study were created using the PN editor SNOOPY [Rohr et al., 2010].

RESULTS

Representation of functional modules as Petri nets

Inspired by previous suggestions [Reddy et al., 1996; Takai-Igarashi, 2005; Chaouiya, 2007], we first designed a number of small net modules, which describe different reactions or interactions between spliceosomal components. These net modules are useful for testing modeling strategies that appropriately reflect the observed biological behavior of parts of the network, and thus for reaching a valid model. In general, the modeling of biologically meaningful modules within signaling pathways strongly depends on the depth of experimentally verified knowledge of the described mechanism. The following net modules have been used for modeling the basic reactions of spliceosomal assembly:

1. Allosteric interaction describes the process in which a protein binds to a specific domain of a target protein and induces a conformational change at a distant site, thereby rendering the target protein itself active or inactive.
In spliceosomal processes this concept can be extended to the level of protein complex association, where the binding of special factors is crucial for subsequent progress through intermediate assembly stages. A module for this biochemical process is decomposable into four T-invariants, two of which are cycles that describe the repeated association and disintegration of the intermediate complex, AB, and the final complex, ABC (see Fig. 3). Note that dissociation is restricted to AB + C or A + B + C, since AC or BC are “forbidden” by the allosteric rule imposed during complex formation. The same model without dissociation reactions strongly reduces structural complexity by exhibiting one T-invariant that covers all transitions. Further special cases of allostery can be distinguished, and different net modules were designed accordingly.

(a) Allosteric inhibition depends on the presence of a specific domain within a subcomplex. An inhibitor may bind to the complex, inducing either disassembly or non-availability of the complex for specific downstream interactions (see Fig. 4). This leads to an extension of the module depicted in Fig. 3b. The heterodimer AB can either participate in further reactions (AB + C), dissociate, or bind to an inhibitory factor I. After sequestration of complex AB, the non-functional complex IAB cannot participate in the assembly pathway anymore. Thus, its removal is modeled as an output transition (IAB out). The module designed with dissociation reactions again results in several T-invariants (data not shown), including sustained cycles of associations and dissociations. In contrast, the module without dissociation reactions exhibits two
Fig. 3. Example of a module for protein complex association where two molecules A and B form a heterodimer, AB, which defines a necessary step for binding the factor, C, in progressing the complex assembly through complex ABC. (a) PN module with reactions, forming (solid lines) and decomposing (dashed lines) intermediate complexes. Two of four T-invariant pathways, completely covering this module, are colored, and define the main signaling route (blue) and a cycle (red); green = source factors, gray = intermediate factors (complexes); blue = target complex; (b) The same model without dissociation reactions strongly reduces structural complexity, leading to one T-invariant. (Colours are visible in the online version of the article at www.iospress.nl.)
minimal T-invariants, reflecting the important aspects of functional ABC and non-functional IAB formation. The simplified version may suffice in many cases, in particular when time points of protein activities are unknown, for example, the time when a specific factor dissociates from an intermediate complex. This net module is easily derived from the basic allosteric cascade shown in Fig. 3b by adding the node I and introducing an additional edge that purges AB.
Fig. 4. Reduced PN module which describes the formation of an inhibiting intermediate complex (IAB). The blue highlighted pathway covers reactions, which result in a functional target complex ABC, while the red colored pathway describes a T-invariant covered by reactions, which result in a non-functional complex IAB. (Colours are visible in the online version of the article at www.iospress.nl.)
(b) Allosteric enhancement describes the presence of a specific protein factor that increases the affinity of other proteins to participate in subsequent reactions (e.g., subcomplex formation, RNA recognition, etc.). The structural analysis gives four T-invariants, two of which produce the target complex AB. The interaction of factors A and B can result in a dimerized complex AB (named “assoc AB”, Fig. 5a). However, the enhancer may be necessary for the protein (complex) to be active. The model accounts for the presence of enhancer E with a higher output of AB (Fig. 5a, blue pathway) due to an increased arc weight. Hence, transition AB out has to fire twice to reproduce the initial marking. Biologically, this can be interpreted as an increased signal transduction, as AB reaches a state of higher disposition for participating in downstream reactions. Removing the dissociation reaction of the dimer AB from the system decreases the number of T-invariants by one (Fig. 5b). Two T-invariants involve transition assoc AB, but only one produces AB, while the other purges AB from the network (Fig. 5b, red pathway). The pathway involving E via reaction assoc ABE produces an increased amount of AB. Thus, the reduced model captures all essential aspects of the enhancer-dependent complex formation.

2. Enzymatic reactions describe reactions in which a catalytically active enzyme acts on molecular groups of spliceosomal proteins, e.g., kinases phosphorylate proteins. This behavior was modeled as a loop, which preserves the marking of the respective place and results in a simple conservation relation (Fig. 6a). In addition, helicase-like proteins with DExD/H box domains have frequently been found in purified spliceosomes [Jurica and Moore, 2003] and were considered as a separate module. This module was extended by another reaction, representing the enhancement of substrate specificity of the helicase (see Fig. 6b).
In this context, the rate of NTP hydrolyzation has been proposed as a crucial parameter for splicing fidelity, since fast kinetics on weak or incorrect protein-substrate interactions increase the chance of dissociation of essential spliceosomal proteins. As a consequence, such defective substrates could be directed into a degradation pathway [Staley and Guthrie, 1998]. While a putative degradation pathway was integrated as a branch into
Fig. 5. PN module for enhanced protein-protein interaction. The presence of enhancer proteins stabilizes complex formation and results in an increased output of dimer, AB, compared to dimerization without the influence of enhancer protein, E. Note that in contrast to Fig. 3, enhancer E only mediates a temporary effect until the dimer, AB, has stabilized its interaction. (a) The module with a dissociation transition for dimer, AB (dashed line), which results in four T-invariants, including one with an arc weight of two, which corresponds to a higher output of AB (blue pathway), and one producing AB via a possibly undirected self-stabilized interaction (red pathway). Enhancer is regenerated (dashed line). (b) Replacement of the dissociation reaction by an output transition (red pathway) reduces the number of T-invariants, while preserving the essential model function of enhancer dependent complex formation. (Colours are visible in the online version of the article at www.iospress.nl.)
the PN, the hydrolyzation activity could not be modeled, because of the lack of kinetic parameters. Finally, since several DExD/H box proteins are involved in spliceosome assembly, the total accuracy of splicing may depend on the cumulative success of enzyme modulated signal transduction along the pathway, further complicating the corresponding kinetic model.
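The arc-weight argument made for the enhancer module (transition AB out has to fire twice to reproduce the initial marking) can be checked with a small sketch. The net below is a simplified, hypothetical stand-in for Fig. 5: the enhancer E is assumed to be consumed and regenerated by the same transition, so it contributes no net change and is left out of the matrix. It also assumes the usual invariant condition C·y = 0.

```python
# Hypothetical, simplified enhancer module (assumption, not Fig. 5 itself).
TRANSITIONS = ["A_in", "B_in", "assoc_ABE", "AB_out"]

# Incidence matrix C: rows = places (A, B, AB), columns = TRANSITIONS.
C = [
    [1, 0, -1, 0],   # A
    [0, 1, -1, 0],   # B
    [0, 0, 2, -1],   # AB: assoc_ABE produces two AB (arc weight 2)
]

def marking_change(C, y):
    """Net token change per place after firing transition j exactly y[j] times."""
    return [sum(c * v for c, v in zip(row, y)) for row in C]

# Firing every transition once leaves one surplus AB token.
print(marking_change(C, [1, 1, 1, 1]))
# Firing AB_out twice restores the initial marking.
print(marking_change(C, [1, 1, 1, 2]))
```

Only the second firing count vector yields a zero change in every place, so in this toy net the minimal T-invariant contains AB_out with a multiplicity of two.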
[Fig. 6 panels: (a) Phosphorylation; (b) Specific activation of helicases]
Fig. 6. PN modules, modeling enzymatic activities found in protein complex assembly. (a) The model of protein activation where phosphorylation of a specific domain mediates interacting capabilities, thus, influencing subsequent complex rearrangements; (b) a specific factor binds to a generic helicase, thus, mediating substrate specificity. (Colours are visible in the online version of the article at www.iospress.nl.)
Signal reactions of the spliceosome assembly pathway

In an iterative process, the literature was inspected to isolate reactions involved in spliceosome assembly. Reactions and involved proteins are based on human spliceosomal processes. All reactions are ordered according to the four major stages of spliceosome assembly, E-, A-, B- and C-complex (see Supplementary Table S1). After drafting each major stage, the structural composition of the assembly network was evaluated by T-invariant analysis. The net was only extended if each reaction was part of at least one T-invariant. Reactions were only included if the factors could be integrated in a causal order. As a consequence, some spliceosomal factors, e.g., SPF27, CypE, HSP73 and others, are omitted from the model due to lack of evidence for a specific time point at which they participate in the spliceosomal assembly process. Further, it is assumed that all factors for which there is no evidence of leaving or destabilization during spliceosome assembly implicitly remain in the model until, finally, dissociation into substructures (e.g., snRNPs) and their recycling take place. Hence, not all factors modeled with an input transition have an explicit output transition. This is reasonable, since many substructures remain intact for repeated rounds of spliceosomal assembly [Pandit et al., 2006]. This resulted in a model with 140 reacting species (places), comprising RNA, proteins and intermediate complexes, which covers about half of the currently known spliceosomal proteins (Fig. 7). The final network consists of 161 transitions. Of these, 92 (57%) are boundary transitions, with 68 (74%) input and 24 (26%) output transitions. In total, 69 (43%) transitions describe the internal reactions of the assembly network. This biological network was modeled by reactions that reflect a certain hierarchy that is characteristic for spliceosome assembly.
Hence, it can be seen as a process in which almost all events are part of a causal relationship; thus, we expect the model not to reflect concurrent events, e.g., signaling pathways that occur independently of each other.
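The acceptance rule used during model extension (every reaction has to be part of at least one T-invariant) amounts to a simple coverage test. The following is a minimal sketch with made-up transition names; it assumes the T-invariant supports have already been computed by an external tool.

```python
def uncovered_transitions(transitions, invariant_supports):
    """Transitions that do not occur in the support of any T-invariant."""
    covered = set().union(*invariant_supports)
    return sorted(set(transitions) - covered)

# Hypothetical draft net: 'x_bind' was added but lies on no invariant pathway.
transitions = ["a_in", "a_bind", "ab_out", "x_bind"]
supports = [{"a_in", "a_bind", "ab_out"}]

missing = uncovered_transitions(transitions, supports)
print(missing)
```

An empty result would mean that the draft is covered by T-invariants. A non-empty result lists the reactions that still lack a connection to an input or output pathway.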
Fig. 7. The complete network of spliceosome assembly modeled as transition bounded Petri net. The different stages are labeled with capital letters: E = E-complex, A = A-complex, B = B-complex, TRI = tri-snRNP complex, C = C-complex and R = recycling pathways. Places are colored according to different functions: orange = proteins with RS domains; blue = RNA; magenta = DExD/H box proteins functioning as ATP dependent “RNA unwindases”; gray or hatched = logical places, indicating equal places occurring more than once in the figure, but not in the underlying graph structure of the PN. Transitions represented by two squares introduce hierarchical nodes, which connect to further reactions at a lower network level (see Fig. 9). (Colours are visible in the online version of the article at www.iospress.nl.)
[Fig. 8 graphic: Petri net of the spliceosome assembly pathway; node labels not reproduced.]
Fig. 8. Pathway of spliceosome assembly. The red transitions belong to T-invariant 13 (see Supplementary Table S2) and describe a scenario of E-complex assembly, which is one of several redundant partial pathways of spliceosome assembly. Here, the 5’ splice site is recognized via an U1 snRNP independent pathway. (Colours are visible in the online version of the article at www.iospress.nl.)
Invariant signaling pathways in spliceosome assembly

Structural analyses resulted in a complete coverage of the network by 71 T-invariants (Supplementary Table S2). All T-invariants describe at least one partial pathway within the spliceosomal assembly process and are biologically meaningful. Twelve T-invariants (17%) are trivial, as they describe solely the in- and efflux of the DExD/H box helicases Prp5 and Prp28, the proteins hPrp6, SKIP and hDib1, the splicing factors SF1, ASF/SF2 and SC35, and the SF3b components SF3b10, SF3b14a, SF3b14b and SF3b49. However, these proteins also constitute a set of spliceosomal components for which it is possible to narrow down the putative point of exit from the assembly process, which so far is unknown for the majority of proteins participating in spliceosome assembly. For example, the four smaller SF3b proteins are known to be required for SF3b formation, but are not detected at the stage of C-complex formation (see supplements of Makarov et al., 2002). Due to the non-decomposability of T-invariants, the removal of a single reaction from an invariant signaling pathway disables the pathway. In the case of spliceosome assembly, this does not mean that the entire assembly process is stalled. The model clearly illustrates that the different results from experimental studies add up to a network with an inherent redundancy that sustains specific checkpoints, which represent crucial intermediate assembly stages. For example, a critical step of early spliceosome assembly is the donor or 5’ splice site recognition, for which several signaling pathways occurring in parallel were modeled. These pathways contribute to the intermediate stage of E-complex assembly and are reflected in the model by several sets of T-invariants, which are listed below. Note that enumerated T-invariants are labeled by “i”.
T-invariants in brackets describe spliceosomal subpathways that involve 5’ss recognition, but do not proceed via the productive branch of C-complex formation that results in exon ligation:

1. T-invariants i13, i14 (i68, i69) describe the U1 snRNP independent 5’ss recognition (see Figs. 8 and 9, Supplementary Table S2, for i13 and Tab. 2). The central function of this branch is transition t16.U1 indep 5ss act, which models the activity of the SR protein SC35. The presence of SC35 is sufficient to define a 5’ss in the absence of a functional U1 snRNP [Crispino and Sharp, 1995], and initiates contacts to the BP occupying proteins SF1 and U2AF. This can result in the selection of competing 5’ss, which renders this pathway a potential candidate for the production of alternatively spliced mRNAs [Tarn and Steitz, 1994]. The remaining transitions of this branch are t31.U2 BPS bdg2 and t32.U6 5ss bdg2, which feed in the U2 snRNP and the U4U5U6 tri-snRNP, respectively, and advance the assembly pathway to the B-complex stage. Further differences between the T-invariants sharing this otherwise unique branch of 5’ss selection exist in two different ways of A-complex assembly, via early (t13.17S U2 matur2; i13, i68) and late (t22.17S U2 matur1; i14, i69) action of the enzyme SF3b125. However, for i14 it is not clear at which time SF3b125 leaves the assembly process. Hence, this T-invariant additionally includes the input reaction t109.SF3b125 in. Another variation between these similar T-invariants is the different course of C-complex assembly after the remodeling step t93.U2 3ss U6 5ss U5 remod. Either a productive spliceosome is formed via one type of pathway (i13, i14, Figs. 8 and 9), or premature disassembly occurs via a discard pathway (i68, i69).

2. T-invariants i15-i20 (i62-i67) describe the ASF/SF2 dependent 5’ss recognition.
They model the 5’ss recognition by contacts of the U1 snRNP factor U170K with the exon-bound splicing factor ASF/SF2 (t12.ASFp U170K bdg) via RS domains [Cao and Garcia-Blanco, 1998]. Additionally, contacts of U1C to another splicing factor, TIA1, which binds intronic elements close to the 5’ splice site (t9.U1C TIA1 bdg1), can direct the splice site selection [Puig et al., 1999; Del Gatto-Konczak
Fig. 9. Subhierarchy levels of the spliceosomal assembly network. Red highlighted transitions depict exemplarily T-invariant 13, which describes one possible scenario of E-complex assembly as shown in Fig. 8. (Colours are visible in the online version of the article at www.iospress.nl.)
et al., 2000; Förch et al., 2002]. Consequently, these T-invariants can be differentiated into three groups:

(a) T-invariants i15, i16 (i66, i67) describe the E-complex formation via U1 contacts to the branch point bound splicing factor SF1 (t58.U1 SF1 bdg) and the joining of the auxiliary factor U2AF after its recognition of the polypyrimidine tract (t59.U1 SF1 U2AF bdg). Depending on the U2 maturation via SF3b125, there again exist two different T-invariants for this mode of E-complex formation.

(b) The presence of the splicing factor SC35 has been found to facilitate 5’ss recognition (t17.U170K U2AF35 bdg), but required the U1-complex and U2AF [MacMillan et al., 1997]. Here, the protein FBP11 helps to bridge U1 and SF1, a constellation that is in agreement with the observation that SF1 and U2AF bind cooperatively to the branch point and polypyrimidine tract, respectively [Abovich and Rosbash, 1997; Berglund et al., 1998]. This mode of E-complex assembly is reflected by the T-invariants i17, i18 (i64, i65).

(c) The T-invariants i19, i20 (i62, i63) describe a pathway of E-complex assembly in which 5’ss selection is modulated by the presence of two alternative splicing factors, ASF/SF2 and TIA1, which both have been reported to interact with U1 snRNP proteins, U170K (t12.ASFp U170K bdg) and U1C (t9.U1C TIA1 bdg1), respectively [Förch et al., 2002]. The splicing factor SC35 compensates for the requirement of the auxiliary factor U2AF and bridges the 5’ss to the branch point via contacts to U1 snRNP and SF1 (t56.U1 SC35 SF1 bdg). Subsequently, U2 snRNP is directly bound to this intermediate complex (t152.U2 SC35 bdg). Note that the ATP dependent helicase UAP56 is a prerequisite for conformational changes in the transition of the spliceosome from the E- to the A-complex [Staley and Guthrie, 1998], based on its ATP hydrolyzing capacity [Shen et al., 2007]. However, although the U2AF component U2AF65 is an essential cofactor for positioning UAP56 in the branch site region [Fleckner et al., 1997], U2AF65 seems not to influence the ATPase activity of UAP56 [Shen et al., 2007], raising the question of how UAP56 unfolds its activity in a U2AF independent mode of A-complex assembly.

3. T-invariants i21-i26 (i56-i61) describe 5’ss recognition in the 5’ terminal exon. The first donor splice site within a transcript follows a different mode of recognition. Here, the interaction of U1 snRNP proteins (U1C) with proteins of the cap binding complex (CBC) via LUC7 has been shown and hence was modeled with a separate reaction (t10.U1 CBC 5ss bdg), which triggers another set of T-invariants. However, except for the interaction with the cap binding complex, these T-invariants are the same as above (2 a-c) and may constitute novel alternative pathways for the initial spliceosome assembly at newly synthesized transcripts.

4.
T-invariants i27-i32 (i50-i55) describe 5’ss recognition via U1C contacts to the intron bound splicing factor TIA1. As an alternative to the proposed joint action of ASF/SF2 and TIA1 in alternative 5’ss selection and U1 snRNP recruitment, TIA1 was modeled as the sole contributing splicing factor to 5’ss selection (t30.U1C TIA1 bdg2) [Förch et al., 2002], initiating different ways of downstream A-complex formation. Thus, two invariants describe A-complex formation independent of U2AF via exclusive action of SC35 (i27, i54, similar to 2c), and four invariants involve U2AF. The latter set splits further into two groups, one involving FBP11 in addition to SC35 (i29, i52) and one involving exclusively FBP11 without SC35 in A-complex formation (i31, i50).

5. T-invariants i33-i38 (i44-i49) describe 5’ss recognition without additional splicing factors. These T-invariants model the 5’ss recognition without the parallel binding of the U1 stabilizing factors ASF/SF2 or TIA1, which could be described as a mode for strong donor splice site selection. However, the transition from the E-complex to the A-complex may require conditions as described for the T-invariants in 2 a-c.

6. T-invariants i39, i40 (i70, i71) describe binding of U1 snRNP to the 5’ss after initiating contact to U2 snRNP via Prp5. These T-invariants model a pathway of E-complex formation in which the ATPase Prp5 bridges U1 and U2 snRNP prior to their binding to the pre-mRNA. Thus, U2 snRNP is already associated with U1 snRNP and Prp5 when it binds to the intron branch site. This was suggested based on the observation that binding of Prp5 to U1 and U2 snRNP also occurs in the absence of pre-mRNA, but depends on the availability of ATP [Xu et al., 2004]. In accordance, it was shown that Prp5 is required for pre-spliceosome assembly [Xu et al., 2004].
Hence, the T-invariants i39, i40 (i70, i71) contain the reaction t27.U1 Prp5 U2 bdg as bridging step, followed by t28.U1 5ss U2 U2AF bdg, describing the contacts with the 5’ss and the branch point/polypyrimidine tract associated proteins. The ATP dependent structural rearrangements
towards A-complex assembly are modeled by t55.unwind1 U2 stl2, a step that releases the U2 factor SF3a60 [Staley and Guthrie, 1998]. The UAP56-catalyzed conformational changes in the U1/U2 pre-mRNA complex are modeled by t29.U1U2 BPS bdg and complete the transition from the E-complex to the A-complex.

Shortly after or in parallel to the recognition of the 5’ss by U1 snRNP, U2 snRNP joins the assembly pathway and defines the branch point region. Maturation of the 17S U2 snRNP was proposed to proceed via two different actions of the putative DExD/H box helicase SF3b125. This enzyme may act at an early stage of the 17S U2 maturation pathway by catalyzing a conformational change when SF3b is integrated into the 12S U2 snRNP to form the intermediate 15S U2 snRNP subcomplex (t13.17S U2 matur2). Alternatively, SF3b125 may act subsequent to the binding of SF3b, in this way supporting the conformational rearrangement to integrate the SF3a subcomplex into the U2 snRNP (t22.17S U2 matur1). Experimental evidence suggests that this putative enzyme largely dissociates during 17S U2 assembly [Will et al., 2002]. Hence, at least one of the alternative reactions was modeled to set SF3b125 free from the U2 snRNP maturation subpathway. Due to the two different U2 snRNP maturation scenarios, the number of T-invariants is doubled for all subsystems that require the presence of a mature U2 snRNP. This demonstrates the emergence of combinatorial complexity in the modeled system. A similar behavior can be observed during late spliceosome assembly, where a branching of the pathway was modeled according to the proposed function of the Prp16 DExD/H box helicase. Although the model does not reflect the kinetic behavior of Prp16 in detail, the outcome of two different possible kinetics can be described.
The proper kinetics of Prp16 requires a specific substrate of pre-mRNA and snRNP conformations and may channel spliceosome assembly into a productive pathway of C-complex assembly, such that the second step of splicing and exon ligation can proceed (via t101.Prp16 remod step). In contrast, mutations in the involved RNA species, or unfavorable conformations due to missing proteins, can result in slowed Prp16 kinetics, which was proposed to activate a discard pathway [Burgess and Guthrie, 1993; Konarska and Query, 2005; Pandit et al., 2006], reflected by transition t100.premature ATP hydrol. This need not be a degradation pathway, as some of the involved factors (Spp382, Prp43) are also active in recycling spliceosomal components [Staley and Guthrie, 1998; Villa and Guthrie, 2005], modeled by transition t96.Spp382 hPrp43 act. In consequence, two different major outcomes of spliceosome assembly are captured by the model and cause a doubling of the observed T-invariants: i) the productive (T-invariants i13-i43) or ii) the unproductive (T-invariants i44-i71) branch of late spliceosome assembly, which fall into line with all subpathways passing through E- and A-complex assembly.

Conservation relations corresponding to P-invariants

Compared to the number of T-invariants, the network structure generates only four place invariants. A trivial P-invariant exists for the serine/arginine protein kinase 1 (SRPK1), which was modeled as a loop connected to the transition that describes the phosphorylation of the splicing factor ASF/SF2. In contrast to other putative enzymes, which act within the spliceosomal subcomplexes, SRPK1 is active at an early stage, activating individual splicing factors. Thus, it is assumed not to participate in further spliceosome assembly and has been modeled as available in a non-limited amount. Two other P-invariants are related to the factors Prp31 and Prp38, which are present in the B-complex.
Prp31 binds the U4 snRNA and the U4/U6 snRNA duplex in presence of another factor, Snu13. Hence, Prp31 enters spliceosomal assembly at least in the stage of U4/U6 subcomplex formation [Nottrott et al., 2002; Schaffert et al., 2004]. Furthermore it was shown that Prp31 is destabilized at the time of catalytic activation of the spliceosome. Thus, it was modeled to leave the spliceosomal main complex through the reaction t68.B complex act.
Table 1. Exemplary description of T-invariants i13 and i14, describing signaling pathways during the process of spliceosome assembly

ID | #t | Biological interpretation
13 | 91 | U1 independent 5’ss activation, early SF3b125 action (t13.17S U2 matur2) in U2 maturation
14 | 92 | As i13, but via late SF3b125 action (t22.17S U2 matur1) in U2 snRNP maturation

The complete table of all T-invariants is given in the supplemental material.
This results in seven places, describing the tri-snRNP and B-complex formation, which form a P-invariant for conservation of Prp31 in the system. The fact that Prp31 is required for successive rounds of tri-snRNP and spliceosome formation suggests that this protein is abundantly available, which in turn justifies a conservation relation. Furthermore, Prp31 is a crucial factor in spliceosome assembly, because mutations in its gene are related to the blindness-causing disease retinitis pigmentosa [Schaffert et al., 2004]. Since all T-invariants involving the reaction t47.U4 U6 bdg (approximately 79% of all T-invariants) can only fire in the presence of Prp31, the model demonstrates that more than three quarters of the network would fail if this protein were knocked out. Prp38 (yeast ortholog of human protein 27K) forms a similar, albeit smaller, P-invariant of five places, which, except for Prp38, is itself a complete subset of the Prp31 P-invariant. The time of appearance and release of Prp38 is less clear, but it was shown to associate with higher affinity (and thus stability) with the assembled U4/U6.U5 tri-snRNP than with an individual U snRNP [Xie et al., 1998]. Hence, it was modeled to enter tri-snRNP formation at the time of U5 snRNP integration. Its involvement in structural rearrangement of the U4/U6 complex, without possessing a DExD/H domain to actively participate in the required hydrolysis reactions, makes it a putative auxiliary factor for the DExD/H box helicase Prp28 [Xie et al., 1998; Lybarger et al., 1999]. The possible connection between Prp38 and the helicase Prp28, which catalyzes the unwinding of the U4/U6 snRNA duplex upon U2 snRNP integration [Xie et al., 1998], implies that both proteins exit from the active assembly pathway after this step. In this way, Prp38, like Prp31, might form a conservation relation at the stage of B-complex formation.
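The knock-out estimate for Prp31 can be retraced with a few lines of arithmetic. The following minimal Python sketch assumes that the T-invariants containing t47.U4 U6 bdg are exactly those listed in Table 2 for MCTS 2 (i13-i40 and i44-i71), whose transition range t44-t54 includes t47. The helper names expand and affected_fraction are hypothetical and are not part of the original analysis.

```python
def expand(ranges):
    """Expand inclusive (start, end) index ranges into a set of invariant IDs."""
    ids = set()
    for start, end in ranges:
        ids.update(range(start, end + 1))
    return ids


def affected_fraction(invariants_using_transition, total):
    """Fraction of all T-invariants that cannot fire without the transition."""
    return len(invariants_using_transition) / total


TOTAL_T_INVARIANTS = 71  # number of T-invariants covering the network

# Assumption: t47 occurs in the T-invariants listed for MCTS 2 in Table 2.
invariants_with_t47 = expand([(13, 40), (44, 71)])

fraction = affected_fraction(invariants_with_t47, TOTAL_T_INVARIANTS)
print(f"{len(invariants_with_t47)} of {TOTAL_T_INVARIANTS} "
      f"T-invariants ({fraction:.2%}) depend on t47")
```

Under these assumptions the sketch counts 56 of 71 T-invariants, which corresponds to the roughly 79% quoted in the text.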
The fourth P-invariant defines the cycling of the DExD/H box helicase Prp16, which is a crucial determinant of the second catalytic step because it catalyzes the initial conformational changes in the transition from the first to the second catalytic step [Staley and Guthrie, 1998]. Prp16 binds transiently, being no integral snRNP component, and leaves the spliceosome upon ATP hydrolysis [Schwer and Guthrie, 1991]. This P-invariant consists of only three places: free Prp16 and the intermediate complexes p90.U2 5ss U6 U5 conf1 and p100.U2 5ss U6 U5 conf2, in which Prp16 exerts its catalytic activity. In contrast to other helicases involved in structural rearrangements (e.g., Prp28), this enzyme does not occur in different branches, which explains its appearance in a conservation relation. In general, the appearance of essential enzymes in conservation relations can be meaningful to reflect their availability for subsequent cycles of spliceosome assembly, which may require a constant presence proximal to the location of spliceosome formation. The identified P-invariants suggest that more spliceosomal factors exist whose relative concentrations do not change markedly over repeated cycles of spliceosome assembly. The lack of evidence on when other catalytically active proteins specifically (re)enter the assembly process, or how long they remain associated with the main complex, presently limits the modeling of further P-invariants. Note that, in contrast to metabolic networks, where commonly only low molecular weight substances appear in conservation relations, here enzymes or intermediate complexes can also be conserved within a defined signaling network.
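A conservation relation of this kind can be checked mechanically: a weight vector y over the places is a P-invariant if y^T C = 0 for the incidence matrix C. The sketch below is a toy illustration only. It assumes a closed three-place cycle for Prp16 and uses hypothetical transition names (bind, remodel, release) rather than the transitions of the published model.

```python
# Toy Prp16 cycle. Place names follow the text; transition names are
# hypothetical stand-ins, not identifiers from the published model.
PLACES = ["Prp16_free", "p90_conf1", "p100_conf2"]
TRANSITIONS = ["bind", "remodel", "release"]

# Incidence matrix C: rows = places, columns = transitions,
# entry = tokens produced minus tokens consumed when the transition fires.
INCIDENCE = [
    [-1, 0, 1],   # free Prp16: consumed by bind, returned by release
    [1, -1, 0],   # conf1: produced by bind, consumed by remodel
    [0, 1, -1],   # conf2: produced by remodel, consumed by release
]


def is_p_invariant(weights, incidence):
    """Return True if weights^T * C is the zero vector."""
    n_transitions = len(incidence[0])
    return all(
        sum(w * row[j] for w, row in zip(weights, incidence)) == 0
        for j in range(n_transitions)
    )


def fire(marking, incidence, j):
    """Apply the token change of transition j (enabledness is not checked)."""
    return [m + row[j] for m, row in zip(marking, incidence)]


weights = [1, 1, 1]
marking = [1, 0, 0]  # one Prp16 molecule, initially free
token_sums = [sum(marking)]
for j in range(len(TRANSITIONS)):  # bind, then remodel, then release
    marking = fire(marking, INCIDENCE, j)
    token_sums.append(sum(marking))

print(is_p_invariant(weights, INCIDENCE), token_sums)
```

In this toy net the weighted token sum stays constant along the cycle, which is the property that the text interprets as conservation of Prp16.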
Table 2. Maximal common transition sets (MCTS) as determined from the 71 T-invariants that cover the network. Each MCTS comprises reactions that are exclusively shared by several T-invariants and hence describe frequently used routes through the spliceosomal assembly network.

MCTS ID | Transition IDs | # transitions | % of transitions | T-Invariant IDs | # T-invariants | % of T-invariants | Biological Interpretation
1 | t0, t1 | 2 | 1.24 | i13-i18, i23-i26, i29-i32, i35-i40, i44-i47, i50-i53, i56-i59, i64-i71 | 40 | 56.34 | Recognition of 3’ss by U2AF35
2 | t3, t7, t11, t18-t21, t26, t42, t44-t54, t68-t72, t86, t87, t93, t96-t98, t104-t106, t110, t111, t113, t114, t118-t129, t144, t145, t150, t151 | 54 | 33.54 | i13-i40, i44-i71 | 56 | 78.87 | Branch point definition, 15S U2 snRNP assembly, SF3a formation, partial SF3b formation, U5- and U6-maturation, CypH trimer formation, ATP-in-ADP-efflux
3 | t4-t6 | 3 | 1.86 | i13-i18, i21-i26, i29-i32, i35-i40, i44-i47, i50-i53, i56-i61, i64-i71 | 44 | 61.97 | U2AF dimerization
4 | t8, t23, t57, t158, t159 | 5 | 3.11 | i13-i18, i23-i26, i29-i32, i35-i38, i44-i47, i50-i53, i56-i59, i64-i69 | 36 | 50.70 | Reactions of U2 snRNP remodeling variant 1, involving hPrp43 and UAP56
5 | t9, t12, t141-t143 | 5 | 3.11 | i15-i20, i62-i67 | 12 | 16.90 | ASF phosphorylation and 5’ss definition via ASF/U1 snRNP/TIA1 interactions
6 | t10, t137-t139 | 4 | 2.48 | i21-i26, i56-i61 | 12 | 16.90 | 5’Terminal 5’ss definition via U1 snRNP interactions with cap binding complex
7 | t15, t24 | 2 | 1.24 | i15-i18, i23-i26, i29-i32, i35-i38, i44-i47, i50-i53, i56-i59, i64-i67 | 32 | 45.07 | FBP11 dependent U2 snRNP binding to the branch point
8 | t16, t31, t32 | 3 | 1.86 | i13, i14, i68, i69 | 4 | 5.63 | U1 snRNP independent 5’ss definition and A complex formation
9 | t22, t109 | 2 | 1.24 | i14, i16, i18, i20, i22, i24, i26, i28, i30, i32, i34, i36, i38, i40, i45, i47, i49, i51, i53, i55, i57, i59, i61, i63, i65, i67, i69, i71 | 28 | 39.44 | SF3b125 dependent 17S U2 snRNP maturation
10 | t25, t27-t29, t55 | 5 | 3.11 | i39, i40, i70, i71 | 4 | 5.63 | U1/U2 snRNP bridging by Prp5 and simultaneous binding to 5’ss and branch point
11 | t33, t34, t133-t136 | 6 | 3.73 | i15-i40, i44-i67, i70, i71 | 52 | 73.24 | U1 snRNP maturation and recycling
12 | t35, t56, t152, t155 | 4 | 2.48 | i19-i22, i27, i28, i33, i34, i48, i49, i54, i55, i60-i63 | 16 | 22.54 | U2AF independent A complex formation
13 | t36, t146, t147, t149 | 4 | 2.48 | i7 | 1 | 1.41 | U4 snRNP maturation and recycling
14 | t58, t59 | 2 | 1.24 | i15, i16, i25, i26, i31, i32, i37, i38, i44, i45, i50, i51, i56, i57, i66, i67 | 16 | 22.54 | FBP11 supported U1 snRNP/SF1 binding and subsequent interaction with U2AF bound PPT-3’ss
15 | t60-t63 | 4 | 2.48 | i3 | 1 | 1.41 | PTB inhibitory pathway (without t2.PPT in)
16 | t65, t66 | 2 | 1.24 | i41, i44-i71 | 29 | 40.85 | NTC-complex formation and stable Prp19 integration
17 | t76, t77, t81, t82, t84, t85, t88-t92, t94, t95, t101, t160 | 15 | 9.32 | i13-i40 | 28 | 39.44 | Prp16 dependent remodeling of C-complex, 2nd catalytic step of splicing, release of ligated exons and disassembly of postspliceosomal complex
18 | t99, t100 | 2 | 1.24 | i44-i71 | 28 | 39.44 | Prp16 induced and slowed ATP hydrolysis and commitment of C-complex to discard pathway, disassembly supported by hPrp43
19 | t130-t132 | 3 | 1.86 | i15-i20, i27-i32, i50-i55, i62-i67 | 24 | 33.80 | TIA1 intron binding
Σ | – | 127 | 78.86 | – | – | – | –
Decomposition of the spliceosomal network into functional units

Analysis of maximal common transition sets

Maximal common transition sets (MCTS) have been defined as a generalization of enzyme subsets, i.e., sets of enzymes in a biochemical network that, under steady state conditions, always operate together in one or several different metabolic fluxes. Enzyme subsets further require that the enzymes involved are all regulated in the same direction and that their fluxes behave proportionally [Pfeiffer et al., 1999]. MCTS, in contrast, are based on the support of T-invariants and thus do not take stoichiometric relations into consideration; without stoichiometric coefficients, the constraint of proportionality does not apply. They describe sets of reactions that occur exclusively together in a maximal number of T-invariants (“signaling fluxes”) and are hence shared by different signaling pathways. MCTS and the places and arcs in between form disjoint subnetworks and can be interpreted as functional building blocks. Provided that the T-invariants are correct and biologically meaningful, MCTS emphasize key parts of signaling routes and facilitate the description of functional parts of a network. Conversely, they can indicate modeling flaws if they combine transitions without proven biological relationship. Table 2 summarizes the computed MCTS. There are six smaller MCTS composed of only two transitions, which nevertheless represent crucial elements of the assembly pathway. MCTS 1 (t0.U2AF35 3ss bdg, t1.3ss in) describes the recognition of the 3’ss by the factor U2AF35, which occurs in more than 56% of all T-invariants. The next most frequent, MCTS 7, is formed by two reactions shared by 32 T-invariants (45%), which describe the influx of the bridging factor FBP11 (t15.FBP11 in) and subsequent binding of the U2 snRNP to the branch site (t24.U2 BPS bdg1).
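The grouping idea behind MCTS can be illustrated with a short Python sketch. It follows one common reading of the definition, namely that transitions belong to the same set when they occur in exactly the same T-invariants. The function name common_transition_sets, the min_size cutoff and the toy supports are assumptions made for illustration; they are not the tool or data used in this study.

```python
from collections import defaultdict


def common_transition_sets(invariants, min_size=2):
    """Group transitions that occur in exactly the same T-invariants.

    invariants maps an invariant ID to its support (a set of transition IDs).
    Returns a dict: frozenset of invariant IDs -> set of transitions.
    Groups with fewer than min_size transitions are dropped (an assumption).
    """
    # For every transition, collect the invariants whose support contains it.
    membership = defaultdict(set)
    for inv_id, support in invariants.items():
        for transition in support:
            membership[transition].add(inv_id)

    # Transitions with identical membership end up in the same group.
    groups = defaultdict(set)
    for transition, inv_ids in membership.items():
        groups[frozenset(inv_ids)].add(transition)

    return {ids: ts for ids, ts in groups.items() if len(ts) >= min_size}


# Hypothetical toy supports, not taken from the spliceosome model.
toy_invariants = {
    "iA": {"t1", "t2", "t3"},
    "iB": {"t1", "t2", "t4"},
    "iC": {"t4", "t5"},
}
toy_sets = common_transition_sets(toy_invariants)
for inv_ids, transitions in toy_sets.items():
    print(sorted(transitions), "shared by", sorted(inv_ids))
```

Because only the supports are compared, no stoichiometric coefficients enter the grouping, which mirrors the distinction from enzyme subsets drawn in the text.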
Taken together, MCTS 4 and MCTS 7 form a subpathway that covers the FBP11 supported interaction of U1 snRNP with the branch point bound factor SF1 and the subsequent joining and structural rearrangement of U2 snRNP with replacement of U2AF at the polypyrimidine tract. The remaining small MCTS still occur in more than one quarter of all invariant pathways and cover the NTC-complex and Prp19 integration (MCTS 16), the late SF3b125 action in 17S U2 maturation (MCTS 9), and the slowed ATP hydrolysis by Prp16 with initiation of the discard pathway (MCTS 18). The largest shared transition set (MCTS 2) covers 54/161 (33%) of all transitions, which are part of more than three quarters of all T-invariants. This MCTS represents a fundamental building block that defines biological functions at essential stages of spliceosome assembly, for example, branch point definition, 15S U2 snRNP assembly, SF3a and SF3b subcomplex formation, U5 and U6 snRNP maturation and the cyclophilin trimer formation. The energy supply by ATP and removal of ADP is part of MCTS 2, which naturally has to be shared by many T-invariants because each stage requires ATP either for phosphorylation or hydrolysis reactions. The maturation and recycling of the U1 and U4 snRNP are described by individual MCTS (11 and 13), the former consisting of reactions that are shared by almost three quarters (73%) of all T-invariants. All MCTS further validate the model network, capturing crucial parts of the U1 and U2 snRNP maturation, U1 independent 5’ss recognition and the Prp16 involved discard pathway. For a more detailed model of U1 snRNP assembly see Kielbassa et al., 2008.

Clustering of T-invariants and MCTS

Computed T-invariants (see Supplementary Table S2) were aligned to determine the pairwise distances between the signaling pathways. Similar to sequence alignments, the pairwise comparisons among all T-invariants can be used to build a distance matrix, on which a clustering can be performed. Clusters reflect groups of signaling pathways that share a given percentage of reactions. Here, a threshold of 80% was chosen to merge T-invariants with less than 20% difference into the same subtree. For example, the T-invariants i13 and i14 differ in four reactions in a total pathway length of 92 transitions, i.e., only one transition (t13.17S U2 matur2) is missing in i14 and three transitions (t12.ASFp U170K bdg, t22.17S U2 matur1, t109.SF3b125 in) are absent in i13, which makes both invariants 96% similar. Comparing different subclusters helps to identify those reactions that define different functions at different stages of spliceosome assembly. The cluster representation depicts all trivial T-invariants in one group in the lower part of the tree (Fig. 10, C1-C13, C22, C23), which is reasonable since they share at most two transitions with the remaining T-invariants (cluster I). Also, three short T-invariants, which describe the NTC-complex formation, the subpathway of U4 snRNP maturation and the PTB inhibition pathway, group separately, indicating that these reactions model side-pathways that are not shared by other signaling fluxes.
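The similarity quoted for i13 and i14 can be reproduced with a set-based Tanimoto coefficient, the distance measure named in the caption of Fig. 10. The sketch below is a hedged illustration: it reads the "total pathway length of 92 transitions" as the size of the union of both supports (an assumption), and it replaces the 88 shared transitions by placeholder labels because the full supports are given only in the supplementary material.

```python
def tanimoto_similarity(a, b):
    """Tanimoto (Jaccard) similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


# Placeholder labels for the transitions shared by both invariants.
# Assumption: 92 transitions in the union, 4 of which differ.
shared = {f"shared_{k}" for k in range(88)}

i13 = shared | {"t13"}                 # t13 is missing in i14
i14 = shared | {"t12", "t22", "t109"}  # these three are absent in i13

similarity = tanimoto_similarity(i13, i14)
distance = 1.0 - similarity
same_subtree = distance < 0.20  # 80% similarity cutoff from the text

print(f"similarity = {similarity:.1%}, merged into one subtree: {same_subtree}")
```

With these assumptions the similarity is 88/92, which rounds to the 96% given in the text and lies well inside the 80% cutoff.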
In contrast to the outgroup, cluster I combines all T-invariants of at least four reactions. Subclusters can contain complete or partial MCTS. For example, clusters C17 and C18 together constitute MCTS 10, which is composed of five reactions. This MCTS describes the subpathway of bridging the U1/U2 snRNP by Prp5 and occurs in four T-invariants. In contrast, T-invariants i15-i20 and i62-i67 share five reactions involving ASF/SF2 within MCTS 5 but are part of the two different major clusters, I and IV. These clusters partition the T-invariants into two reaction sets: one reaches the productive end of the spliceosomal assembly pathway (resulting in spliced mRNA), and the other represents the discard pathway during the C-complex stage. This splitting can also be seen by visualizing all invariant pathways and their shared reactions via a color map (see Fig. 11). The color map representation was used, in conjunction with the dendrogram (Fig. 10), to aid and accelerate the visual identification of groups of reactions that participate in different T-invariants. It is intended as a compact representation of network structure that facilitates the interpretation of differences between signaling pathways. Here, darker colors in the vertical direction accentuate the transitions of each individual T-invariant, while in the horizontal direction the participation of an individual transition in different T-invariants can be read. Bright colors mean that a reaction is not present in T-invariants and consequently also not in MCTSs. This representation suggests extending the analysis of MCTSs to reactions that do not occur in specific MCTSs. For example, in Fig. 11 one can easily recognize two brighter colored horizontal areas within the line of transition t103.ASF SF2 out, which stretch exactly over the columns that contain the T-invariants of MCTS 5 (see dashed rectangle in Fig. 11).
This means that transition t103.ASF SF2 out is specifically not involved in 12 T-invariants, i15-i20 and i62-i67. Further analysis of the line within the color map (Fig. 11) that represents the transition t103.ASF SF2 out reveals that this transition is present in most of the other T-invariants, but why is it absent from a specific set of T-invariants? We expect this reaction in the trivial T-invariant i12, which constitutes the influx and efflux of ASF/SF2 by t140.ASF SF2 in and t103.ASF SF2 out, respectively. However, the T-invariants that contribute to MCTS 5 (Tab. 2) are more complex and partially describe the recognition of the 5’ss aided by the splicing factor ASF/SF2. They differ in how other auxiliary splicing factors, like SC35, SF1, U2AF or FBP11, are involved in 5’ss recognition. Thus, all T-invariants missing t103.ASF SF2 out have in common that they support the binding of U1 snRNP. It is known that E-complex assembly can heavily rely on the presence of SR proteins (auxiliary splicing factors), which organize the cross-talk between 5’ss, branch point and 3’ss [Shen and Green, 2004]. In such spliceosomes a critical level of SR proteins like ASF/SF2 or SC35 should be beneficial, and pathways without an explicit efflux reaction might constitute an intermediate process to reach such levels. Another explanation, which is not reflected in this model but has to be considered, is that intron or exon bound splicing factors can remain bound either to the lariat or to the mature mRNA, leaving the spliceosome in this condition.

In contrast, other T-invariants, for example the two similar T-invariants i13 (see Fig. 8 for i13) and i14, contain the ASF/SF2 efflux reaction (t103.ASF SF2 out), but not the pertinent influx reaction (t140.ASF SF2 in, see Fig. 9a). Both T-invariants describe the U1 snRNP independent 5’ss recognition by interaction of SC35 with SF1/U2AF. A pathway with a missing ASF/SF2 influx reaction still makes sense, considering that ASF/SF2 can enter spliceosomal assembly indirectly, bound to exonic sequence near the 5’ss. In agreement with that, the transition t11.5ss in, which allocates the 5’ss exon, is part of both i13 and i14.

Fig. 10. Clustering of T-invariants with trivial and small T-invariants shown in the bottom of the tree. The two main clusters are colored in blue and red. Latin enumeration of clusters refers to the output as obtained from TINANET [Thormann et al., 2009] using the Tanimoto distance measure and 80% similarity cutoff. Roman enumeration labels clusters that form larger groups. (Colours are visible in the online version of the article at www.iospress.nl.)

Fig. 11. Color map to visualize which transitions (reactions) occur within similar pathways (T-invariants) through the network. The similarity can be compared via the dendrogram at top of the figure, which is the same as shown in Fig. 10. Dark (light) colors indicate presence (absence) of one or several reactions among the total set of modeled reactions within one or several T-invariants. The red dashed rectangles show an example of groups of T-invariants, which specifically lack the reactions indicated at the right side. Note that these groups of T-invariants appear in different clusters and their common feature is easier detectable via the color map compared to the dendrogram. (Colours are visible in the online version of the article at www.iospress.nl.) [Figure axes: transitions t0-t160 (rows) and T-invariants i1-i71 (columns); reactions indicated at the right side: t103, t107, t108, t112, t115.]
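The way the color map is read, looking along the row of one transition for the T-invariants in which it is absent, can be mimicked with a small presence/absence table. The following Python sketch uses purely hypothetical invariant and transition labels (a to d, core, influx, efflux); it illustrates the reading strategy only and does not reproduce Fig. 11.

```python
def invariants_lacking(invariants, transition):
    """Return the sorted IDs of all invariants whose support lacks the transition."""
    return sorted(
        inv_id for inv_id, support in invariants.items()
        if transition not in support
    )


def render_color_map(invariants, transitions):
    """Render one text row per transition: '#' = present, '.' = absent."""
    columns = sorted(invariants)
    rows = []
    for transition in transitions:
        cells = "".join(
            "#" if transition in invariants[col] else "." for col in columns
        )
        rows.append(f"{transition:<8}{cells}")
    return rows


# Hypothetical toy supports, chosen only to show both missing-influx
# and missing-efflux patterns.
toy = {
    "a": {"core", "efflux"},
    "b": {"core", "influx"},
    "c": {"core", "influx"},
    "d": {"core", "influx", "efflux"},
}

for row in render_color_map(toy, ["core", "influx", "efflux"]):
    print(row)
print("lacking efflux:", invariants_lacking(toy, "efflux"))
```

A run of '.' cells within one row plays the role of the brighter horizontal areas discussed for t103.ASF SF2 out.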
Another observation from the color map concerns the absence of four reactions, t107.SF3b10 in, t108.SF3b14a in, t112.SF3b49 in and t115.SF3b14b in, from a large body of T-invariants (i13-i38), independently of the trivial T-invariants. These reactions model the influx of four SF3b specific factors. They were drawn as logical places, each with a specified efflux reaction prior to the C-complex stage, because experiments failed to detect these factors within spliceosomal C-complexes [Makarov et al., 2002] (see supplementary material). The T-invariants i13-i38 describe the signaling pathways that reach the final stage of spliceosome assembly. Hence, these routes pass the stage where the SF3b factors leave the active assembly process. Among others, the specified SF3b factors are required for subsequent rounds of spliceosome assembly; thus the absence of their influx reactions from T-invariants i13-i38 can be interpreted as a way to remain within range of the spliceosome assembly site. A different scenario occurs for the T-invariants that enter the discard pathway, which is triggered before C-complex formation. Here, no explicit efflux reactions could be adapted from the literature for the SF3b factors, hence their influx reactions are part of the T-invariants i44-i71 (Fig. 11). This raises the question of what happens to these factors, and when, if the discard pathway is activated during spliceosome assembly.

DISCUSSION

The present work describes a Petri net model that combines different scenarios of spliceosome assembly over the basic assembly stages of this multi-protein complex. The spliceosome is a component of the nucleus, which is newly built after or as soon as a precursor RNA emerges from the RNA polymerase II transcription complex. The assembly process involves biochemical reactions, which can be divided into enzymatic and association reactions.
Unlike metabolic networks, which commonly model the conversion of low molecular weight compounds to produce energy or target metabolites, e.g., amino acids [Schuster and Hilgetag, 1994; Koch et al., 2005], spliceosome assembly involves many enzymatic reactions that act on double stranded RNAs as substrates and use proteins or NTPs as co-factors [Mayas et al., 2006; Staley and Guthrie, 1998]. This is due to the snRNA containing core components of the spliceosome, which interact via multiple RNA-RNA contacts, making it necessary to re-open intermediate conformations at several stages during the assembly process. Additionally, phosphorylation reactions, as known from signal transduction networks [Pawson and Nash, 2000], assure the specificity and localization of splicing factors. Consequently, the model presented here consists of several types of molecules (RNA, proteins and compounds of both) and reactions, which have been designed and tested individually prior to the setup of the complete network. Hundreds of individual studies have investigated components of the spliceosome or individual biochemical reactions. The knowledge of many years of laboratory work is available but needs to be translated into a machine readable and human comprehensible language. Thus, one of the main purposes of this work is to channel biochemical knowledge about the spliceosome into a formalized description suitable for computational analysis. One major difficulty is the handling of non-standardized identifiers of the involved proteins, which complicates the combination of smaller models, initially devised from individual reports, into a larger network of interactions. The power of predictive modeling will increase as more submodels become integrated, covering more details of the spliceosome assembly pathway.
We establish a network of ordered basic interactions leading to the assembly of an active spliceosome, including also the example of a discard pathway, which was previously suggested [Villa and Guthrie, 2005]. In total, about 100 proteins were integrated into the model. Many proteins participating in the spliceosomal assembly pathway are themselves alternatively spliced and hence may occur in several isoforms. For example, the U2 snRNP specific component SF3b14 shows five predicted alternative splice forms of different types (source ASD [Thanaraj et al., 2004]). The interactions of SF3b14 are well described [Spadaccini et al., 2006], including the location of functional domains within the protein sequence. Since more and more spliceosomal factors are described in such detail, it should be possible in the near future to estimate the impact of alternative splicing on the spliceosome, which poses an interesting example of combinatorial complexity. Suppose that in each of the four stages of spliceosome assembly only one protein factor occurs in two functionally different isoforms; then 2^4 = 16 different spliceosomes could be assembled and contribute to different alternative splicing decisions (neglecting that some alternative splice forms do not reach the protein level). This is a rough estimate of the lower boundary, as many more spliceosomal proteins exist and most of their genes are likely to be alternatively spliced. The network of spliceosomal assembly as presented here serves as a basic scaffold to successively map the occurrence and impact of alternative splice events on the assembly pathway. This can lead to interesting hypotheses about which alternative splice event contributes to which spliceosomal state, making spliceosomes classifiable and probably even attributable to specific splicing patterns. For example, if the ability of an SR protein splicing factor, like ASF/SF2, to become phosphorylated is impaired due to alternative splicing, this will influence its contribution to splice site recognition. As a consequence, the spliceosome will possibly fail to recognize weak splice sites under this condition.
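The combinatorial estimate above can be made explicit by enumerating the isoform choices. The minimal Python sketch below assumes that the four stages are the E-, A-, B- and C-complex and that each stage has exactly one factor with two isoforms; the stage labels and isoform names are illustrative assumptions, not data from the model.

```python
from itertools import product

# Assumption: the four assembly stages are labeled by their complexes.
STAGES = ["E", "A", "B", "C"]
# Hypothetical isoform names for the single variable factor per stage.
ISOFORMS = ["isoform_1", "isoform_2"]

# Each spliceosome variant picks one isoform per stage.
variants = [
    dict(zip(STAGES, choice))
    for choice in product(ISOFORMS, repeat=len(STAGES))
]

print(len(variants), "possible spliceosome variants")
print("example:", variants[0])
```

Adding a second variable factor to any stage would double the count again, which is why the text describes the value as a rough lower bound.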
Additionally, since other splicing factors, for example SC35 or TIA1, influence E-complex assembly, redundancy in recognition of pre-mRNA signals by interchangeable factors must be taken into account. Concerning the presented Petri net modeling approach, we can summarize the following achievements:

1. Translation of different lines of evidence for modular subsystems of the spliceosomal assembly pathway from experimental literature into a unique mathematical formalism.
2. Compilation of T-invariants (P-invariants) based on the commonly applied steady state assumption for biochemical networks. Assignment of a biological meaning to each T-invariant (P-invariant).
3. Model validation resulting in a network completely covered by T-invariants.
4. Representation of combined partial pathways, each supported by experimental reports, allowing for model expansion and testing of new hypotheses.
5. Inclusion of special aspects of 5’ splice site recognition during E-complex formation as well as the potential activation of a discard pathway as simplified model for a kinetic proofreading mechanism during C-complex formation.
6. Easier identification of discrepancies in current experimental data by the combinatorial arrangement of the subpathways. For example, the activation of UAP56 by U2AF stands in contradiction to the apparent requirement of UAP56 for transition from the E-complex to the A-complex within the U2AF independent A-complex assembly pathway.
7. Comprehensive and condensed visualization of the spliceosomal assembly process, allowing the global inspection of similar and distinct routes. This facilitates the apprehension of a large network such as the spliceosomal assembly pathway and its further extension.

The clustering of T-invariants, representing signaling pathways and participating in spliceosome formation, indicates that there exists a variety of similar pathways leading to the same intermediate complexes.
Although each T-invariant describing one of these routes is minimal, in that it would fail with the loss of one reaction, it is clearly visible that there is redundancy in the routes leading to the formation of intermediate states. This observation suggests a backup mechanism against failure, ensuring that, independently of changing conditions, the spliceosome reaches critical assembly checkpoints with different protein complements. Alternatively, this might extend our view to different spliceosomes depending on different cellular or environmental conditions. The existence of a major and a minor spliceosome [Will and Lührmann, 2005] with different mRNA substrate specificities could support this notion, but the major spliceosome as considered here suggests by itself a highly dynamic assembly process. The flood of different mechanistic examples of individual and sometimes quite different intermediate steps makes it clear that there is no “one spliceosome”. Hence, there can be no single model of spliceosome assembly. Although the current model has a higher coverage of experimentally supported subsystems in the early (E- and A-complex) than in the later assembly stages, it is tempting to hypothesize that the number of different routes increases with the importance of the intermediate complex for the overall assembly process. A further purpose of this model was to demonstrate how knowledge gained by experiments on the spliceosome can be transformed into formalized descriptions that are suitable for computational analysis. Biochemical reactions, describing all steps of spliceosome assembly in more or less detail and having accumulated over the past two decades, were extracted and assigned to reactions that can be used for structural modeling. Here, the use of standardized identifiers for protein factors will greatly ease the combination of smaller models and their successive integration into larger systems. The power of predictive modeling increases as more submodels can be integrated, covering more details of the spliceosome assembly pathway. Several requirements for future work on spliceosome analysis can be derived from this work, addressing both experimental biologists and computational scientists.
First, experimental data should immediately be stored in a structured, pre-formatted way, making use of existing formalisms and avoiding unnecessary naming variations. Experimental data provide precious facts, which are necessary to prepare subsequent in silico analyses. Second, theoretical and computational contributions can still be improved in the supply of data collection tools as well as integrated pipelines for their global evaluation and analysis. For example, text mining tools at the level of network design and statistical measures at the level of substructure analysis (e.g., T-invariants, MCTSs) can enhance the output of this modeling approach. Finally, one needs to keep in mind that structural properties depend first of all on the knowledge put into the model. In light of the wealth of biological information, a direct consequence is the possibility that some parts of the model are better covered by data than others and therefore exhibit a higher complexity. Consequently, these parts are more strongly represented by T-invariants. Nevertheless, observing a stronger representation of individual aspects of a biological network justifies a model, because it captures relations and trends that are hard to detect using detailed mechanistic studies, which moreover are too numerous for individual analyses. The first mathematical model of the spliceosomal assembly pathway presented in this paper could serve as the basis for further investigations, experimental as well as theoretical.

ACKNOWLEDGEMENTS

We want to thank the anonymous reviewers for their helpful comments on the manuscript. Financial support from the German Ministry for Research and Education (BMBF) to RHB within the HepatoSys framework is gratefully acknowledged.
REFERENCES
• Abovich, N. and Rosbash, M. (1997). Cross-intron bridging interactions in the yeast commitment complex are conserved in mammals. Cell 89, 403-412.
• Baumgarten, B. (1996). Petri-Netze, Grundlagen und Anwendungen. Second edition, Spektrum Akademischer Verlag, Heidelberg.
• Beggs, J. D. (2005). Lsm proteins and RNA processing. Biochem. Soc. Trans. 33, 433-438.
• Behzadnia, N., Hartmuth, K., Will, C. L. and Lührmann, R. (2006). Functional spliceosomal A complexes can be assembled in vitro in the absence of a penta-snRNP. RNA 12, 1738-1746.
• Bentley, D. (2002). The mRNA assembly line: transcription and processing machines in the same factory. Curr. Opin. Cell. Biol. 14, 336-342.
• Berglund, J. A., Abovich, N. and Rosbash, M. (1998). A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition. Genes Dev. 12, 858-867.
• Boehringer, D., Makarov, E. M., Sander, B., Makarova, O. V., Kastner, B., Lührmann, R. and Stark, H. (2004). Three-dimensional structure of a pre-catalytic human spliceosomal complex B. Nat. Struct. Mol. Biol. 11, 463-468.
• Bortfeldt, R., Schindler, S., Szafranski, K., Schuster, S. and Holste, D. (2008). Comparative analysis of sequence features involved in the recognition of tandem splice sites. BMC Genomics 9, 202.
• Brett, D., Hanke, J., Lehmann, G., Haase, S., Delbrück, S., Krueger, S., Reich, J. and Bork, P. (2000). EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83-86.
• Brow, D. A. (2002). Allosteric cascade of spliceosome activation. Annu. Rev. Genet. 36, 333-360.
• Burgess, S. M. and Guthrie, C. (1993). A mechanism to enhance mRNA splicing fidelity: the RNA-dependent ATPase Prp16 governs usage of a discard pathway for aberrant lariat intermediates. Cell 73, 1377-1391.
• Cao, W. and Garcia-Blanco, M. A. (1998). A serine/arginine-rich domain in the human U1 70k protein is necessary and sufficient for ASF/SF2 binding. J. Biol. Chem. 273, 20629-20635.
• Chaouiya, C. (2007). Petri net modelling of biological networks. Brief. Bioinform. 8, 210-219.
• Chen, J. Y.-F., Stands, L., Staley, J. P., Jackups, R. R., Latus, L. J. and Chang, T.-H. (2001). Specific alterations of U1-C protein or U1 small nuclear RNA can eliminate the requirement of Prp28p, an essential DEAD box splicing factor. Mol. Cell. 7, 227-232.
• Chen, Y.-I. G., Moore, R. E., Ge, H. Y., Young, M. K., Lee, T. D. and Stevens, S. W. (2007). Proteomic analysis of in vivo-assembled pre-mRNA splicing complexes expands the catalog of participating factors. Nucleic Acids Res. 35, 3928-3944.
• Cormack, R. M. (1971). A review of classification. J. R. Stat. Soc. 134, 321-367.
• Crispino, J. D. and Sharp, P. A. (1995). A U6 snRNA:pre-mRNA interaction can be rate-limiting for U1-independent splicing. Genes Dev. 9, 2314-2323.
• Crispino, J. D., Mermoud, J. E., Lamond, A. I. and Sharp, P. A. (1996). Cis-acting elements distinct from the 5’ splice site promote U1-independent pre-mRNA splicing. RNA 2, 664-673.
• Das, R., Zhou, Z. and Reed, R. (2000). Functional association of U2 snRNP with the ATP-independent spliceosomal complex E. Mol. Cell. 5, 779-787.
• Del Gatto-Konczak, F., Bourgeois, C. F., Le Guiner, C., Kister, L., Gesnel, M.-C., Stévenin, J. and Breathnach, R. (2000). The RNA-binding protein TIA-1 is a novel mammalian splicing regulator acting through intron sequences adjacent to a 5’ splice site. Mol. Cell. Biol. 20, 6287-6299.
• Dönmez, G., Hartmuth, K., Kastner, B., Will, C. L. and Lührmann, R. (2007). The 5’ end of U2 snRNA is in close proximity to U1 and functional sites of the pre-mRNA in early spliceosomal complexes. Mol. Cell. 25, 399-411.
• Du, H. and Rosbash, M. (2002). The U1 snRNP protein U1C recognizes the 5’ splice site in the absence of base pairing. Nature 419, 86-90.
• Dybkov, O., Will, C. L., Deckert, J., Behzadnia, N., Hartmuth, K. and Lührmann, R. (2006). U2 snRNA-protein contacts in purified human 17S U2 snRNPs and in spliceosomal A and B complexes. Mol. Cell. Biol. 26, 2803-2816.
• Fleckner, J., Zhang, M., Valcárcel, J. and Green, M. R. (1997). U2AF65 recruits a novel human DEAD box protein required for the U2 snRNP-branchpoint interaction. Genes Dev. 11, 1864-1872.
• Förch, P., Puig, O., Martínez, C., Séraphin, B. and Valcárcel, J. (2002). The splicing regulator TIA-1 interacts with U1-C to promote U1 snRNP recruitment to 5’ splice sites. EMBO J. 21, 6882-6892.
• Fortes, P., Bilbao-Cortés, D., Fornerod, M., Rigaut, G., Raymond, W., Séraphin, B. and Mattaj, I. W. (1999). Luc7p, a novel yeast U1 snRNP protein with a role in 5’ splice site recognition. Genes Dev. 13, 2425-2438.
• Gottschalk, A., Kastner, B., Lührmann, R. and Fabrizio, P. (2001). The yeast U5 snRNP coisolated with the U1 snRNP has an unexpected protein composition and includes the splicing factor Aar2p. RNA 7, 1554-1565.
• Gozani, O., Potashkin, J. and Reed, R. (1998). A potential role for U2AF-SAP 155 interactions in recruiting U2 snRNP to the branch site. Mol. Cell. Biol. 18, 4752-4760.
• Grafahrend-Belau, E., Schreiber, F., Heiner, M., Sackmann, A., Junker, B., Grunwald, S., Speer, A., Winder, K. and Koch, I. (2008). Modularization of biochemical networks based on classification of Petri net t-invariants. BMC Bioinformatics 9, 90.
• Graveley, B. R. (2001). Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100-107.
• Görnemann, J., Kotovic, K. M., Hujer, K. and Neugebauer, K. M. (2005). Cotranscriptional spliceosome assembly occurs in a stepwise fashion and requires the cap binding complex. Mol. Cell. 19, 53-63.
• Grunwald, S., Speer, A., Ackermann, J. and Koch, I. (2008). Petri net modelling of gene regulation of the Duchenne muscular dystrophy. Biosystems 92, 189-205.
• Haddad, J. J. (2002). Cytokines and related receptor-mediated signaling pathways. Biochem. Biophys. Res. Commun. 297, 700-713.
• Heiner, M., Koch, I. and Will, J. (2004). Model validation of biological pathways using Petri nets-demonstrated for apoptosis. Biosystems 75, 15-28.
• Hiller, M., Huse, K., Szafranski, K., Jahn, N., Hampe, J., Schreiber, S., Backofen, R. and Platzer, M. (2004). Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat. Genet. 36, 1255-1257.
• Hiller, M., Nikolajewa, S., Huse, K., Szafranski, K., Rosenstiel, P., Schuster, S., Backofen, R. and Platzer, M. (2007). TassDB: a database of alternative tandem splice sites. Nucleic Acids Res. 35, D188-D192.
• Hofestädt, R. (1994). Petri Net application of metabolic processes. Systems Analysis Modelling Simulation 16, 113-122.
• Hofestädt, R. and Thelen, S. (1998). Quantitative modeling of biochemical networks. In Silico Biol. 1, 0006.
• House, A. E. and Lynch, K. W. (2007). Regulation of alternative splicing: More than just the ABCs. J. Biol. Chem. 283, 1217-1221.
• Jin, Z. and El-Deiry, W. S. (2005). Overview of cell death signaling pathways. Cancer Biol. Ther. 4, 139-163.
• Johnson, J. M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P. M., Armour, C. D., Santos, R., Schadt, E. E., Stoughton, R. and Shoemaker, D. D. (2003). Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141-2144.
• Jurica, M. S. and Moore, M. J. (2003). Pre-mRNA splicing: awash in a sea of proteins. Mol. Cell. 12, 5-14.
• Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T. and Yamanishi, Y. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480-D484.
• Kerrien, S., Alam-Faruque, Y., Aranda, B., Bancarz, I., Bridge, A., Derow, C., Dimmer, E., Feuermann, M., Friedrichsen, A., Huntley, R., Kohler, C., Khadake, J., Leroy, C., Liban, A., Lieftink, C., Montecchi-Palazzi, L., Orchard, S., Risse, J., Robbe, K., Roechert, B., Thorneycroft, D., Zhang, Y., Apweiler, R. and Hermjakob, H. (2007). IntAct – open source resource for molecular interaction data. Nucleic Acids Res. 35, D561-D565.
• Kielbassa, J., Bortfeldt, R., Schuster, S. and Koch, I. (2008). Modeling of the U1 snRNP assembly pathway in alternative splicing in human cells using Petri nets. Comput. Biol. Chem. 33, 46-61.
• Koch, I. and Heiner, M. (2008). Petri nets in analysis of biological networks. In: Wiley Book Series in Bioinformatics, Junker, B. H. and Schreiber, F. (eds.), Pan, Y. and Zomaya, A. Y. (series eds.), Wiley, pp. 139-180.
• Koch, I., Junker, B. H. and Heiner, M. (2005). Application of Petri net theory for modelling and validation of the sucrose breakdown pathway in the potato tuber. Bioinformatics 21, 1219-1226.
• Konarska, M. M. and Query, C. C. (2005). Insights into the mechanisms of splicing: more lessons from the ribosome. Genes Dev. 19, 2255-2260.
• Krull, M., Pistor, S., Voss, N., Kel, A., Reuter, I., Kronenberg, D., Michael, H., Schwarzer, K., Potapov, A., Choi, C., Kel-Margoulis, O. and Wingender, E. (2006). Transpath: an information resource for storing and visualizing signaling pathways and their pathological aberrations. Nucleic Acids Res. 34, D546-D551.
• Kyriakopoulou, C., Larsson, P., Liu, L., Schuster, J., Söderbom, F., Kirsebom, L. A. and Virtanen, A. (2006). U1-like snRNAs lacking complementarity to canonical 5’ splice sites. RNA 12, 1603-1611.
• Laggerbauer, B., Achsel, T. and Lührmann, R. (1998). The human U5-200kD DEXH-box protein unwinds U4/U6 RNA duplices in vitro. Proc. Natl. Acad. Sci. USA 95, 4188-4192.
• Liu, S., Rauhut, R., Vornlocher, H.-P. and Lührmann, R. (2006). The network of protein-protein interactions within the human U4/U6.U5 tri-snRNP. RNA 12, 1418-1430.
• Lybarger, S., Beickman, K., Brown, V., Dembla-Rajpal, N., Morey, K., Seipelt, R. and Rymond, B. C. (1999). Elevated levels of a U4/U6.U5 snRNP-associated protein, Spp381p, rescue a mutant defective in spliceosome maturation. Mol. Cell. Biol. 19, 577-584.
• MacMillan, A. M., McCaw, P. S., Crispino, J. D. and Sharp, P. A. (1997). SC35-mediated reconstitution of splicing in U2AF-depleted nuclear extract. Proc. Natl. Acad. Sci. USA 94, 133-136.
• Makarov, E. M., Makarova, O. V., Urlaub, H., Gentzel, M., Will, C. L., Wilm, M. and Lührmann, R. (2002). Small nuclear ribonucleoprotein remodeling during catalytic activation of the spliceosome. Science 298, 2205-2208.
• Makarova, O. V., Makarov, E. M., Urlaub, H., Will, C. L., Gentzel, M., Wilm, M. and Lührmann, R. (2004). A subset of human 35S U5 proteins, including Prp19, function prior to catalytic step 1 of splicing. EMBO J. 23, 2381-2391.
• Malca, H., Shomron, N. and Ast, G. (2003). The U1 snRNP base pairs with the 5’ splice site within a penta-snRNP complex. Mol. Cell. Biol. 23, 3442-3455.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
• Matsuno, H., Inouye, S.-I. T., Okitsu, Y., Fujii, Y. and Miyano, S. (2006). A new regulatory interaction suggested by simulations for circadian genetic control mechanism in mammals. J. Bioinform. Comput. Biol. 4, 139-153.
• Mayas, R. M., Maita, H. and Staley, J. P. (2006). Exon ligation is proofread by the DExD/H-box ATPase Prp22p. Nat. Struct. Mol. Biol. 13, 482-490.
• Modrek, B., Resch, A., Grasso, C. and Lee, C. (2001). Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29, 2850-2859.
• Nottrott, S., Urlaub, H. and Lührmann, R. (2002). Hierarchical, clustered protein interactions with U4/U6 snRNA: a biochemical role for U4/U6 proteins. EMBO J. 21, 5527-5538.
• Pandit, S., Lynn, B. and Rymond, B. C. (2006). Inhibition of a spliceosome turnover pathway suppresses splicing defects. Proc. Natl. Acad. Sci. USA 103, 13700-13705.
• Papin, J. A., Stelling, J., Price, N. D., Klamt, S., Schuster, S. and Palsson, B. O. (2004). Comparison of network-based pathway analysis methods. Trends Biotechnol. 22, 400-405.
• Park, J. W., Parisky, K., Celotto, A. M., Reenan, R. A. and Graveley, B. R. (2004). Identification of alternative splicing regulators by RNA interference in Drosophila. Proc. Natl. Acad. Sci. USA 101, 15974-15979.
• Pawson, T. and Nash, P. (2000). Protein-protein interactions define specificity in signal transduction. Genes Dev. 14, 1027-1047.
• Perriman, R., Barta, I., Voeltz, G. K., Abelson, J. and Ares, M. Jr. (2003). ATP requirement for Prp5p function is determined by Cus2p and the structure of U2 small nuclear RNA. Proc. Natl. Acad. Sci. USA 100, 13857-13862.
• Pfeiffer, T., Sánchez-Valdenebro, I., Nuño, J. C., Montero, F. and Schuster, S. (1999). METATOOL: for studying metabolic networks. Bioinformatics 15, 251-257.
• Prieto, C. and De Las Rivas, J. (2006). APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res. 34, W298-W302.
• Puig, O., Gottschalk, A., Fabrizio, P. and Séraphin, B. (1999). Interaction of the U1 snRNP with nonconserved intronic sequences affects 5’ splice site selection. Genes Dev. 13, 569-580.
• The R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Reddy, V. N., Liebman, M. N. and Mavrovouniotis, M. L. (1996). Qualitative analysis of biochemical reaction systems. Comput. Biol. Med. 26, 9-24.
• Rohr, C., Marwan, W. and Heiner, M. (2010). Snoopy – a unifying Petri net framework to investigate biomolecular networks. Bioinformatics 26, published ahead of print.
• Sackmann, A., Heiner, M. and Koch, I. (2006). Application of Petri net based analysis techniques to signal transduction pathways. BMC Bioinformatics 7, 482.
• Sackmann, A., Formanowicz, D., Formanowicz, P., Koch, I. and Blazewicz, J. (2007). An analysis of the Petri net based model of the human body iron homeostasis process. Comput. Biol. Chem. 31, 1-10.
• Schaffert, N., Hossbach, M., Heintzmann, R., Achsel, T. and Lührmann, R. (2004). RNAi knockdown of hPrp31 leads to an accumulation of 36 U4/U6 di-snRNPs in Cajal bodies. EMBO J. 23, 3000-3009.
• Schilling, C. H., Edwards, J. S. and Palsson, B. O. (1999). Toward metabolic phenomics: analysis of genomic data using flux balances. Biotechnol. Prog. 15, 288-295.
• Schuster, S. and Hilgetag, C. (1994). On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165-182.
• Schuster, S., Dandekar, T. and Fell, D. A. (1999). Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 17, 53-60.
• Schuster, S., Klipp, E. and Marhl, M. (2006). The Predictive Power of Molecular Network Modelling. In: Discovering biomolecular mechanisms with computational biology, Eisenhaber, F. (ed.), Springer, pp. 95-103.
• Schuster, S., von Kamp, A. and Pachkov, M. (2007). Understanding the roadmap of metabolism by pathway analysis. Methods Mol. Biol. 358, 199-226.
• Schwer, B. and Guthrie, C. (1991). PRP16 is an RNA-dependent ATPase that interacts transiently with the spliceosome. Nature 349, 494-499.
• Shen, H. and Green, M. R. (2004). A pathway of sequential arginine-serine-rich domain-splicing signal interactions during mammalian spliceosome assembly. Mol. Cell. 16, 363-373.
• Shen, J., Zhang, L. and Zhao, R. (2007). Biochemical characterization of the ATPase and helicase activity of UAP56, an essential pre-mRNA splicing and mRNA export factor. J. Biol. Chem. 282, 22544-22550.
• Simão, E., Remy, E., Thieffry, D. and Chaouiya, C. (2005). Qualitative modelling of regulated metabolic pathways: application to the tryptophan biosynthesis in E. coli. Bioinformatics 21 Suppl. 2, ii190-ii196.
• Small, E. C., Leggett, S. R., Winans, A. A. and Staley, J. P. (2006). The EF-G-like GTPase Snu114p regulates spliceosome dynamics mediated by Brr2p, a DExD/H box ATPase. Mol. Cell. 23, 389-399.
• Spadaccini, R., Reidt, U., Dybkov, O., Will, C., Frank, R., Stier, G., Corsini, L., Wahl, M. C., Lührmann, R. and Sattler, M. (2006). Biochemical and NMR analyses of an SF3b155-p14-U2AF-RNA interaction network involved in branch point definition during pre-mRNA splicing. RNA 12, 410-425.
• Staley, J. P. and Guthrie, C. (1998). Mechanical devices of the spliceosome: motors, clocks, springs, and things. Cell 92, 315-326.
• Stamm, S. (2002). Signals and their transduction pathways regulating alternative splicing: a new dimension of the human genome. Hum. Mol. Genet. 11, 2409-2416.
• Starke, P. (1998). INA – Integrated Net Analyzer Manual, Berlin. HU Berlin, Dept. of CS, www2.informatik.hu-berlin.de/lehrstuehle/automaten/ina/.
• Stevens, S. W., Barta, I., Ge, H. Y., Moore, R. E., Young, M. K., Lee, T. D. and Abelson, J. (2001). Biochemical and genetic analyses of the U5, U6, and U4/U6 x U5 small nuclear ribonucleoproteins from Saccharomyces cerevisiae. RNA 7, 1543-1553.
• Stevens, S. W., Ryan, D. E., Ge, H. Y., Moore, R. E., Young, M. K., Lee, T. D. and Abelson, J. (2002). Composition and functional characterization of the yeast spliceosomal penta-snRNP. Mol. Cell. 9, 31-44.
• Takai-Igarashi, T. (2005). Ontology based standardization of Petri net modeling for signaling pathways. In Silico Biol. 5, 0047.
• Tarn, W.-Y. and Steitz, J. A. (1994). SR proteins can compensate for the loss of U1 snRNP functions in vitro. Genes Dev. 8, 2704-2717.
• Thanaraj, T. A., Stamm, S., Clark, F., Riethoven, J.-J., Le Texier, V. and Muilu, J. (2004). ASD: the Alternative Splicing Database. Nucleic Acids Res. 32, D64-D69.
• Thormann, A., Rudolph, K. and Koch, I. (2009). TInA (T-invariant analysis) – a tool box for exploring pathways in biochemical systems at steady state. In: German Conference on Bioinformatics 2009, Short Papers and Abstracts, Grosse, I., Neumann, S., Posch, S., Schreiber, F. and Stadler, P. (eds.), Halle, pp. 157-158.
• Turner, I. A., Norman, C. M., Churcher, M. J. and Newman, A. J. (2004). Roles of the U5 snRNP in spliceosome dynamics and catalysis. Biochem. Soc. Trans. 32, 928-931.
• Valadkhan, S., Mohammadi, A., Wachtel, C. and Manley, J. L. (2007). Protein-free spliceosomal snRNAs catalyze a reaction that resembles the first step of splicing. RNA 13, 2300-2311.
• van Nues, R. W. and Beggs, J. D. (2001). Functional contacts with a range of splicing proteins suggest a central role for Brr2p in the dynamic control of the order of events in spliceosomes of Saccharomyces cerevisiae. Genetics 157, 1451-1467.
• Venables, J. P. (2004). Aberrant and alternative splicing in cancer. Cancer Res. 64, 7647-7654.
• Villa, T. and Guthrie, C. (2005). The Isy1p component of the NineTeen complex interacts with the ATPase Prp16p to regulate the fidelity of pre-mRNA splicing. Genes Dev. 19, 1894-1904.
• von Mering, C., Jensen, L. J., Kuhn, M., Chaffron, S., Doerks, T., Krüger, B., Snel, B. and Bork, P. (2007). STRING 7 – recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 35, D358-D362.
• Will, C. L. and Lührmann, R. (2001). Spliceosomal UsnRNP biogenesis, structure and function. Curr. Opin. Cell Biol. 13, 290-301.
• Will, C. L. and Lührmann, R. (2005). Splicing of a rare class of introns by the U12-dependent spliceosome. Biol. Chem. 386, 713-724.
• Will, C. L., Urlaub, H., Achsel, T., Gentzel, M., Wilm, M. and Lührmann, R. (2002). Characterization of novel SF3b and 17S U2 snRNP proteins, including a human Prp5p homologue and an SF3b DEAD-box protein. EMBO J. 21, 4978-4988.
• Xie, J., Beickman, K., Otte, E. and Rymond, B. C. (1998). Progression through the spliceosome cycle requires Prp38p function for U4/U6 snRNA dissociation. EMBO J. 17, 2938-2946.
• Xiong, M., Zhao, J. and Xiong, H. (2004). Network-based regulatory pathways analysis. Bioinformatics 20, 2056-2066.
• Xu, Y.-Z., Newnham, C. M., Kameoka, S., Huang, T., Konarska, M. M. and Query, C. C. (2004). Prp5 bridges U1 and U2 snRNPs and enables stable U2 snRNP association with intron RNA. EMBO J. 23, 376-385.
• Zevedei-Oancea, I. and Schuster, S. (2003). Topological analysis of metabolic networks based on Petri net theory. In Silico Biol. 3, 0029.
• Zhang, S., Jin, G., Zhang, X.-S. and Chen, L. (2007). Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics 7, 2856-2869.
• Zhou, Z., Licklider, L. J., Gygi, S. P. and Reed, R. (2002). Comprehensive proteomic analysis of the human spliceosome. Nature 419, 182-185.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2010, 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved. doi:10.3233/978-1-60750-704-8-279
Modelling the Molecular Interactions in the Flower Developmental Network of Arabidopsis thaliana
Kerstin Kaufmann a, Masao Nagasaki b and Ruy Jáuregui c,∗
a Business Unit Bioscience, Plant Research International, Wageningen, The Netherlands
b Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
c BIOBASE GmbH, Wolfenbüttel, Germany
ABSTRACT: We present a dynamical model of the gene network controlling flower development in Arabidopsis thaliana. The network is centred on the regulation of the floral organ identity genes (AP1, AP2, AP3, PI and AG) and ends with the transcription factor complexes responsible for differentiation of floral organs. We built and simulated the regulatory interactions that determine organ specificity using an extension of hybrid Petri nets as implemented in Cell Illustrator. The network topology is characterized by two main features: (1) the presence of multiple autoregulatory feedback loops requiring the formation of protein complexes, and (2) the role of spatial regulators determining floral patterning. The resulting network shows biologically coherent expression patterns for the genes involved, and simulated mutants produce experimentally validated changes in organ expression patterns. The requirement of heteromeric higher-order protein complex formation for positive autoregulatory feedback loops attenuates stochastic fluctuations in gene expression, enabling robust organ-specific gene expression patterns. If autoregulation is mediated by monomers or homodimers of proteins, small variations in initial protein levels can lead to biased production of homeotic proteins, ultimately resulting in homeosis. We also suggest regulatory feedback loops involving miRNA loci by which homeotic genes control the activity of their spatial regulators.
KEYWORDS: Dynamical model, flower development, gene network
∗ Corresponding author. E-mail: [email protected].
INTRODUCTION
Complex regulatory interactions between transcription factors and the prevalence of autoregulation are common themes in gene regulatory networks. Recent evidence suggests that direct molecular interactions form the basis of most regulatory processes: transcription factors generally bind a high number of sites in the genome, and can thus potentially directly influence the expression of hundreds of genes [Farnham, 2009]. This paradigm may hold for all eukaryotes, since it is supported by data from animals [see, e.g., Li et al., 2008] and plants [e.g. Kaufmann et al., 2009; Oh et al., 2009].
Floral development is initiated in response to a variety of internal and environmental stimuli, such as temperature and light (reviewed in Putterill et al., 2004). Different floral induction pathways are controlled by a small set of flowering time genes, which in turn activate the floral meristem identity genes APETALA1 (AP1) and LEAFY (LFY). These meristem identity genes stimulate the expression of floral organ identity genes. Organ identity genes act in a combinatorial fashion to specify the different types of floral organs: sepals, petals, stamens and carpels. According to the ‘floral quartet model’, the proteins encoded by floral homeotic genes assemble into distinct multimeric complexes in an organ-type-specific manner [Theissen and Saedler, 2001]. All of the floral homeotic proteins present in those complexes belong to the MADS-box family of transcription factors. Evidence from yeast n-hybrid studies suggests that higher-order complex formation is mediated mostly by members of the SEPALLATA (SEP) subfamily of MADS-domain proteins [Honma and Goto, 2001; Immink et al., 2009], which are required for specification of the identities of all 4 types of floral organs. The SEPALLATA subfamily consists of 4 largely redundant genes (SEP1-SEP4). Combined loss-of-function mutations in all 4 genes lead to homeotic conversion of all types of floral organ to leaf-like organs [Ditta et al., 2004].
The formation of multimeric protein complexes seems to be required not only for the regulation of downstream targets; it may also play a role in positive autoregulation of floral homeotic genes. In addition to initial upregulation by upstream factors, autoregulation has been observed for key floral homeotic genes, such as APETALA3 (AP3) and PISTILLATA (PI) [Jack et al., 1994; Krizek and Meyerowitz, 1996; Hill et al., 1998; Honma and Goto, 2000], AGAMOUS (AG) [Gomez-Mena et al., 2005] and SEPALLATA3 (SEP3) [Kaufmann et al., 2009]. The requirement for autoregulation involving heterodimer formation has so far been characterized primarily for AP3 and PI, which are the two floral homeotic B function proteins specifying petal and stamen identity.
However, genome-wide binding data for SEP3 indicate that it binds to the promoters of almost all of the floral homeotic genes, and induction experiments also show that it can upregulate the expression of these genes [Kaufmann et al., 2009]. Since SEP3 is a key mediator of heteromeric higher-order complex formation between floral homeotic proteins, and autoregulation is observed for nearly all floral homeotic proteins, this likely indicates that autoregulation is mediated by higher-order complexes, although the initial expression of floral homeotic genes is unaffected in sepallata triple mutants [Pelaz et al., 2000]. Thus, the combination of results from protein-protein and protein-DNA interaction studies, together with genetic evidence, suggests a complex scenario for the establishment of the different floral organ identities by multiple direct protein-protein and regulatory interactions. Although not all interactions have yet been confirmed in planta, current evidence allows us to generate a model for interactions during early flower development. Furthermore, other recent evidence suggests post-transcriptional control mechanisms in the network, such as the role of the microRNA miR172 in the translational repression of the spatial regulator APETALA2 (AP2) [Chen, 2004].
The complexity of direct molecular interactions necessitates the use of novel computational tools to understand the flowering process, optimally those which would allow for the explicit modelling of transcription, translation and protein binding reactions. A limitation at this point is a general lack of quantitative data for these different processes, restricting the modelling to generic estimates. Ordinary differential equations are widely accepted as a modelling method for biological pathways [Sun and Zhao, 2004]. However, with this method it is difficult to represent schematic information, such as pathway models drawn with biological elements like mRNAs and proteins. The estimation of the parameters required for a simulation, especially in the case of a gene regulatory network, is also an open issue. In this regard, Petri nets offer an attractive alternative for simple construction, visualization and simulation of gene regulatory networks. A Petri net is a mathematical model used for the representation and analysis of concurrent processes. Petri nets are described in part by the visual elements “place”, “transition”, “arc”, and “token” (Fig. 1).
Fig. 1. In Cell Illustrator, Petri nets are constructed using three kinds of symbols for entities, processes, and connectors. Discrete, continuous, and generic types are associated with entities and processes, and direct, associate and inhibitory types are associated with connectors. Additionally, entities and processes can be replaced with pictures of biological ontology terms from the Cell System Ontology [Jeong et al., 2007]. This information enhances both the human readability and machine readability of the HFPNe biological pathway models built in Cell Illustrator.
Arcs are directed connections from places to transitions (input) and from transitions to places (output). Places can contain tokens, and a transition for which every place connected by an input arc holds tokens will transfer these tokens (in discrete units) to the places connected by output arcs. Since the proposal of the original Petri net [Petri, 1962], various types of Petri nets have been developed, e.g. the timed Petri net, the continuous Petri net and the Hybrid Petri net (HPN) [Nagasaki et al., 2005]. Hybrid Petri nets have new types of place and transition that can receive continuous token values, along with the classic discrete ones. Like other proposed Petri nets, Hybrid Petri nets also expand the original Petri net by introducing the concepts of “inhibitory arc” and “test arc”. The inhibitory arc stops a transition from working if the value on the place it is connected to (its input) is higher than a certain threshold (the weight, in Petri net terms); in biology this can readily represent a transcriptional repressor or an enzyme inhibitor. The test arc does not consume any tokens from the place it is connected to, which makes it a natural analogy for an enzyme, which is not consumed by the reaction it catalyses. Pioneering works by the groups of Reddy and Hofestädt were among the first to apply Petri nets to the modelling of biological pathways [Reddy et al., 1993; Hofestädt and Thelen, 1998]. An advanced HPN has also been applied to the modelling of the lambda phage pathway [Matsuno et al., 2000]. Furthermore, Hybrid Functional Petri nets (HFPN) and their extension (HFPNe) were developed by expanding the HPN to be more powerful and more suitable for biological pathway modelling and simulation [Matsuno et al., 2003; Nagasaki et al., 2004]. HFPN can not only handle both discrete and continuous events at once, but also allows any kind of function to be assigned to the delay, weight and speed parameters.
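The firing rule described above, including inhibitory and test arcs, can be illustrated with a minimal discrete sketch. This is not the HFPN/HFPNe implementation used in Cell Illustrator; the names `Transition`, `enabled` and `fire`, and the enzyme example, are hypothetical and chosen only for illustration. The inhibitory arc is assumed to block firing when the token count is strictly higher than its threshold, following the wording of the text.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """A discrete transition with input, output, test and inhibitory arcs."""
    name: str
    inputs: dict = field(default_factory=dict)      # place -> tokens consumed
    outputs: dict = field(default_factory=dict)     # place -> tokens produced
    tests: dict = field(default_factory=dict)       # place -> tokens required, not consumed
    inhibitors: dict = field(default_factory=dict)  # place -> threshold (weight)


def enabled(marking, t):
    """Return True if transition t may fire under the given marking."""
    has_inputs = all(marking.get(p, 0) >= w for p, w in t.inputs.items())
    has_tests = all(marking.get(p, 0) >= w for p, w in t.tests.items())
    # Assumption: firing is blocked when the place holds MORE than the threshold.
    not_inhibited = all(marking.get(p, 0) <= w for p, w in t.inhibitors.items())
    return has_inputs and has_tests and not_inhibited


def fire(marking, t):
    """Return the marking after firing t; an unchanged copy if t is disabled."""
    new = dict(marking)
    if not enabled(marking, t):
        return new
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


# Hypothetical example: an enzyme (test arc) converts substrate to product
# unless an inhibitor is present (inhibitory arc with threshold 0).
reaction = Transition(
    "reaction",
    inputs={"substrate": 1},
    outputs={"product": 1},
    tests={"enzyme": 1},
    inhibitors={"inhibitor": 0},
)
m0 = {"substrate": 2, "enzyme": 1, "inhibitor": 0, "product": 0}
m1 = fire(m0, reaction)
```

In this sketch the enzyme token is left in place after firing, whereas the substrate token is consumed, which mirrors the distinction between test arcs and ordinary input arcs drawn in the text.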
HFPNe was introduced to facilitate the handling of any kind of object within the concept of a Petri net. To handle these objects, new generic elements for place and transition were introduced (see Fig. 1). Using HFPNe, complicated biological pathway processes can be modelled, for example networks involving gene regulation, signal transduction and metabolic reactions, as well as other biological processes that are not normally treated in biological pathways, such as alternative splicing and frame-shifting [Nagasaki et al., 2004]. Differential equations can be easily modelled using a subset of HFPNe elements, by assigning continuous values for place and transition and suspending the evaluation of the weight parameters. Nagasaki et al. have described a detailed formal definition and the properties of HFPNe [Nagasaki et al., 2004; Nagasaki et al., 2005]. In HFPNe, to bridge the gap between computer science and biology, the Petri net terms place, transition, arc, and token are renamed to the more intuitive terms entity, process, connector, and content, respectively.
Cell Illustrator is a software implementation that includes HFPNe as well as an extended graphical user interface for building and simulating biological networks. Using Cell Illustrator, a researcher can directly draw a network map using icons to represent Petri net elements (entities, processes and connectors), assign speed rules to the processes and directly simulate the dynamics of the network. A plot of the concentration change of the different entities is displayed during simulation time.
Models of the flowering network have been described and simulated in the past, originally as a Boolean gene network [Mendoza and Alvarez-Buylla, 1998] which included only 10 genes, and which was later refined to a logical network including 15 genes [Espinosa-Soto et al., 2004]. These network analyses could correctly identify the steady state gene activation patterns for each of the floral organs, and demonstrated the importance of the network architecture over that of the initial parameters assumed. In the model presented here, we include both direct regulation between genes (mediated by a protein) and the formation and regulatory effect of heterodimeric transcription factor complexes, thus creating a larger and more complex network. Explicit translation reactions in the model also allow for the inclusion of post-transcriptional regulation, such as translational inhibition by miRNAs.
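As a rough illustration of how continuous entities and speed rules correspond to differential equations, the following sketch integrates a single gene with explicit transcription, translation and degradation processes, and a miRNA that blocks translation through an inhibitory connector. It is a hand-written Euler integration, not Cell Illustrator code; the function name `simulate`, all rate constants and the threshold value are hypothetical and chosen only to make the example concrete.

```python
def simulate(steps=2000, dt=0.01, mirna=0.0):
    """Euler integration of one gene module; all constants are illustrative."""
    k_tx, k_tl = 1.0, 1.0   # assumed transcription and translation speeds
    d_m, d_p = 0.5, 0.2     # assumed degradation rates for mRNA and protein
    threshold = 1.0         # assumed miRNA level above which translation stops
    mrna, protein = 0.0, 0.0
    for _ in range(steps):
        # Inhibitory connector: the translation process is switched off
        # while the miRNA content exceeds the threshold.
        translation = 0.0 if mirna > threshold else k_tl * mrna
        d_mrna = k_tx - d_m * mrna
        d_protein = translation - d_p * protein
        mrna += dt * d_mrna
        protein += dt * d_protein
    return mrna, protein


free_mrna, free_protein = simulate()
repressed_mrna, repressed_protein = simulate(mirna=2.0)
```

With the miRNA above its threshold, the mRNA still accumulates as before but no protein is produced, which is the qualitative behaviour expected of translational repression as opposed to transcriptional repression.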
Individual transcription reactions dependent upon different regulatory elements can be associated with each gene, enabling us to distinguish distinct transcription factor binding events, allowing promoter elements bound by different transcription factors to be separated and providing the ability to model binding-site competition. Different floral homeotic protein complexes may compete for binding sites, as is suggested by overlapping DNAbinding preferences of different floral homeotic factors (reviewed in Melzer et al., 2006). Our model assumes that organ-specific developmental programmes are stabilized by autoregulatory loops involving all members of a floral homeotic protein complex, which is further supported by our perturbation analysis results. METHODS Modelling and simulation software: Cell Illustrator The Cell Illustrator software implements the HFPNe architecture with highly tuned modelling and simulation graphical user interfaces [Nagasaki et al., 2003; 2009b]. Publicly available models created on Cell Illustrator are maintained in two websites (http://www.csml.org/ and http://genome.ib.sci.yamaguchiu.ac.jp/˜gon). The genetic network controlling flower development was implemented using the latest Cell Illustrator Online 4.0 version (http://www.cellillustrator.com/ [Nagasaki et al., 2009a]). Construction of the network The blueprint of the regulatory network was built by compiling information from current literature, and ensuring that each proposed entity and connector is qualitatively supported by genetic data and, whenever possible, by molecular data that confirms a direct physical interaction. The basic module used in this network was composed of a transcription reaction producing an mRNA, connected to a translation reaction producing a protein and degradation reactions for both of the mRNA and protein products
K. Kaufmann et al. / Modelling the Molecular Interactions in the Flower Developmental Network
Fig. 2. Basic network structure implemented in Cell Illustrator. The basic regulatory module used to build this model consists of a transcription reaction producing an mRNA, followed by a translation reaction producing a protein. A protein can form complexes through a binding reaction, and either single proteins or protein complexes can act as activators of transcription. All entities involved in the simulation have a degradation reaction associated.
(Fig. 2). The network was then built on the underlying HFPNe architecture in Cell Illustrator by first drawing the entities involved on a canvas, and then connecting them through intermediary processes such as transcription, translation, activation, inhibition and degradation. Finally, speed rules were set to reproduce biologically meaningful simulations. All entities can take any floating-point value, and all reactions can take any given speed rule [Doi et al., 2004; Nagasaki et al., 2004]. As additional elements, the network included binding reactions between proteins and the degradation of the resulting protein complexes. Regulatory connections (activation, inhibition) were made between transcription factors and the transcription of mRNAs (treated as reactions) or, in the case of the miRNA miR172, between the miRNA and the regulated translation reaction. When evidence of different regulatory elements was available, independent transcription reactions were added for each of the regulatory elements in play; for example, AG is known to be regulated by LFY and by MADS transcription factor complexes, so in the final model AG had independent transcription reactions for each regulating factor or complex. Competition between different MADS complexes for the same sites in positive regulatory interactions was implemented by assuming that the sites can become saturated: the production of mRNA reaches a certain maximal transcription speed, beyond which higher levels of competing transcription factors make no further contribution to the process. To reflect this, a threshold value for transcription speed was set on transcription reactions regulated by common factors, so that the different contributions are additive only until the saturation threshold is reached.
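The saturation rule for competing activators can be illustrated with a minimal Python sketch. It is not taken from the Cell Illustrator model: the function name `combined_transcription_speed` and the default cap of 1.0 are assumptions made here for illustration only.

```python
def combined_transcription_speed(contributions, saturation=1.0):
    """Add the contributions of competing activators, capped at saturation.

    `contributions` holds the speed each transcription factor (or complex)
    would contribute on its own; `saturation` is the assumed maximal
    transcription speed of the shared binding sites (hypothetical default).
    """
    return min(sum(contributions), saturation)


# Below the cap the contributions simply add up.
low = combined_transcription_speed([0.2, 0.3])
# Above the cap, additional activator makes no further contribution.
high = combined_transcription_speed([0.8, 0.7])
```

In this sketch the cap is applied to the summed speed, so the relative levels of the competing factors no longer matter once the sites are saturated.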
The network includes static entities, whose concentration does not change during the simulation, and dynamic entities, which play an active role in the simulation. The flowering time complexes FD/FT and SOC1/AGL24 are static entities that are set as the starting point of the network (Fig. 3). Their concentration is set to a constant value of 1, and their activity is set to act as a pulse, activating the downstream entities only during a short time interval (see Simulation parameters for details). All other entities in the network, with the exception of the spatial regulators SUP, UFO and miR172, are dynamic: the meristem identity genes AP1/CAL, LFY and AP2 are the first entities activated in the simulation, followed downstream by the organ identity genes SEP, AP3, PI and AG. The production of these organ
Fig. 3. Simplified representation of the network considered in this work, in which only proteins, protein complexes and miRNAs are depicted. The SEP entity has been duplicated for the sake of visual clarity. Reaction processes are depicted by 2 kinds of yellow diamonds: transcription (black and white icon) and protein binding (red arrow). Dashed arrows correspond to associative processes, solid black arrows correspond to direct processes and solid lines with blunt ends correspond to inhibitory reactions. The organ specified by each end transcription factor complex is indicated at the bottom of the figure. The individual protein and miRNA names depicted in the network are, from top left to bottom right: FD/FT: FLOWERING LOCUS D/FLOWERING LOCUS T, UFO: UNUSUAL FLORAL ORGANS, SOC1/AGL24: SUPPRESSOR OF CONSTANS OVEREXPRESSION/AGAMOUS-LIKE 24, miR172: microRNA 172, AP1/CAL: APETALA1/CAULIFLOWER, LFY: LEAFY, AP2: APETALA2, SUP: SUPERMAN, SEP: SEPALLATA, AP3: APETALA3, PI: PISTILLATA, AG: AGAMOUS.
identity transcription factors leads to the formation of protein dimers (SEP/AP1, AP3/PI, SEP/AG) and higher-order complexes (SEP/AP1/AP3/PI and SEP/AP3/PI/AG).
Even though some of the factors that control spatial expression domains of organ identity genes in flower development are already known (e.g. SUP, UFO, miR172), the upstream regulation of these spatial control genes remains to be elucidated. In this network model, the simulation started from a set of 4 different initial conditions that replicate the known activities of these spatial regulators. These initial conditions depend on the values of the 3 spatial regulators UFO, SUP and miR172. The concentration values for the entities of the spatial regulators SUP and miR172 were set to 0 or 1 as on/off states. The translation speed of UFO was chosen to maintain the biological congruence of the network: 0.5 units per simulation cycle in the active state, and 0 when inactive. According to the model network, high concentrations of UFO would sequester the protein LFY and alter the expression patterns of AP1/CAL and SEP, which are regulated by LFY alone; thus the chosen speed allows the presence of free LFY protein. The sets of initial conditions were then chosen as follows:
– SUP on, miR172 off and UFO off, leading to sepal formation: The presence of AP2 inhibits expression of AG. Since UFO is off and SUP is on, there is no AP3/PI production, leading to the expression of SEP/AP1 as steady-state TF complex.
– SUP off, miR172 off and UFO on, leading to petal formation: The expression of UFO leads to the formation of the dimer UFO/SEP which activates the expression of AP3/PI. The presence of AP2 inhibits expression of AG and allows expression of AP1, leading to the formation of the steady-state TF complex SEP/AP1/AP3/PI.
– SUP off, miR172 on and UFO on, leading to stamen formation: Inhibition of AP2 by miR172 allows AG to be expressed, and the presence of UFO activates the expression of AP3/PI. AP1 is transiently expressed before the TF complexes that inhibit it are formed, but once AG complexes are formed, AP1 is inhibited, leading to the formation of the steady-state TF complex SEP/AP3/PI/AG.
– SUP on, miR172 on and UFO off, leading to carpel formation: SUP inhibits AP3/PI, and the inhibition of AP2 by miR172 allows expression of AG, leading to SEP/AG formation. This factor also inhibits AP1 when formed but allows a transient expression in early simulation time.
The final network is provided as a file in Cell Illustrator’s CSML format, and can be inspected and simulated by using the free Cell Illustrator Player program launched from a web browser (https://cionline.hgc.jp/cifileserver/apps/usersman/main).
Experimental support for interactions described in the network
Floral meristem identity genes. The closely related MADS-box genes APETALA1 and CAULIFLOWER (AP1/CAL) as well as the non-MADS transcription factor LEAFY (LFY) control the initial specification of flowers in response to different floral induction pathways [Huala and Sussex, 1992; Mandel et al., 1992; Weigel et al., 1992]. All three genes are expressed at the earliest stages of floral meristem development, and LFY and AP1 are known to upregulate each other’s expression [Liljegren et al., 1999]. AP1 and CAL are two closely related paralogous MADS-box genes with highly redundant function [Kempin et al., 1995], and are therefore treated as one functional molecule in our network. AP1 has a second role at later stages of flower development in the specification of sepal and petal identity [Mandel et al., 1992]. AP1 and CAL form dimers with SEPALLATA (SEP) MADS-domain proteins [Pelaz et al., 2001; Castillejo et al., 2005] and interact in a higher-order complex with APETALA3 (AP3) and PISTILLATA (PI) [Honma and Goto, 2001]. According to the ‘quartet model’ of flower development, the
AP1/SEP protein complex establishes sepal identity, while the AP1/SEP/AP3/PI complex specifies petal identity [Theissen and Saedler, 2001]. At later stages of flower development, AP1/CAL is inhibited by AGAMOUS (AG) in stamens and carpels [Gustafson-Brown et al., 1994]. Since, according to the ‘floral quartet’ model, the principal functional AG complexes in these floral organs are AG/SEP/AP3/PI and AG/SEP, respectively, we assume in our model that both of these complexes can repress the expression of AP1/CAL. Genetic data and the results of gene expression microarray experiments suggest that LFY and AP1 positively regulate the expression of the floral homeotic genes AP3, PI and AG [Huala and Sussex, 1992; Weigel and Meyerowitz, 1993; Busch et al., 1999; Honma and Goto, 2000; Lohmann et al., 2001; Wellmer et al., 2006; Chae et al., 2008]. For the upregulation of AP3 and PI, LFY requires UNUSUAL FLORAL ORGANS (UFO) as a cofactor [Wilkinson and Haughn, 1995; Lee et al., 1997]. LFY and UFO form functional protein complexes in plants, which are required for binding of LFY to the promoter of AP3 [Chae et al., 2008]. The direct binding of a LFY/UFO complex to the promoter of PI has not yet been demonstrated. However, since LFY and UFO also activate PI expression via a common promoter region [Honma and Goto, 2000], we assume in our model that the LFY/UFO complex acts in a similar manner on the PI promoter.
Floral homeotic genes. Aside from the initial activation by AP1 and LFY, there are two major aspects of regulation of floral homeotic genes: (1) autoregulation via multiprotein complexes, and (2) the presence of spatial factors that permit or prohibit the expression of floral homeotic genes in certain tissues within the floral meristem. We implemented 3 known factors that spatially modulate homeotic gene expression: (a) UFO, which is required together with LEAFY for the activation of APETALA3 and PISTILLATA.
(b) APETALA2 (AP2), which negatively regulates AG in sepal and petal primordia [Drews et al., 1991; Bomblies et al., 1999]. (c) SUPERMAN (SUP), which negatively regulates AP3 and PI in carpel primordia [Bowman et al., 1992; Yun et al., 2002]. The SEPALLATA genes (SEP1-SEP4) are closely related, highly redundant MADS-box genes which are required for the specification of the identities of all types of floral organ due to upregulation of other floral organ identity genes [Pelaz et al., 2000; Ditta et al., 2004]. Protein-protein interaction data suggest that they form larger complexes with all other floral organ identity proteins belonging to the MADS-box transcription factor family [Honma and Goto, 2001]. According to the current model of flower development, each of the complexes is specific for a certain type of floral organ: the SEP/AP1 complex for sepals, the SEP/AP1/AP3/PI complex for petals, SEP/AG/AP3/PI for stamens and SEP/AG for carpels (Fig. 1) [Honma and Goto, 2001; Theissen and Saedler, 2001]. Little is known about the regulation of SEP gene expression; however, expression microarray data suggest that SEP genes are activated by AP1/CAL, LFY and AG [Schmid et al., 2003; Gomez-Mena et al., 2005; Wellmer et al., 2006]. There are several indications from genetic data that floral homeotic genes can upregulate their own expression, and that SEP genes are required for this upregulation. AP3 and PI, which act together in the specification of petal and stamen identity, depend on each other in the autoregulatory process [Jack et al., 1994]. Heterodimerization of the two gene products is required for the positive autoregulation.
Since the SEP, AP1 (petals) and AG (stamens) gene products are also able to upregulate AP3 and PI [Gustafson-Brown et al., 1994; Pelaz et al., 2000; Gomez-Mena et al., 2005], we assume in our model that the AP1/SEP/AP3/PI (petal) and AG/SEP/AP3/PI (stamen) protein complexes are the functional complexes for upregulation of AP3 and PI in planta. Gomez-Mena et al. [2005] demonstrated that AG can upregulate its own expression. Since both the AG and SEP gene products can upregulate AG expression, and AG and SEP proteins can interact with each other as well as with AP3 and PI in a higher-order protein complex [Honma and Goto, 2001], and
SEP3 and AG bind to regulatory elements in the AG locus [Kaufmann et al., 2009], we assume in the model that the AG/SEP (carpel) and the AG/SEP/PI/AP3 (stamen) protein complexes are functional in this process.
Simulation parameters
The initial activation of the network, through the entities FD/FT, SOC1/AGL24 and a generic transcription reaction acting on AP2, was given the speed 1 during the first 10 simulation cycles (a simulation cycle corresponds to one Petri net time unit), and 0.01 thereafter as basal activity. Transcription speeds were set so that the sum of transcription activation speeds from reactions simultaneously acting on an entity was equal to or lower than 1, and each activation speed depended linearly on the concentration of the activating factor under this limit. For every factor activating a given entity, an independent transcription reaction was set, and the maximal speed of each reaction was chosen so that the maximal combined speed of all the reactions that could be active simultaneously did not exceed 1. As an example, the AG mRNA is transcriptionally induced by LFY and, independently, also by the complexes SEP/AG and SEP/AP3/PI/AG, so the maximal activation speed for each transcription reaction was set to 0.33. Translation speeds were set as the mRNA concentration divided by 5. Binding speeds were set as the product of the concentrations of the monomers divided by a constant, which was chosen to better represent the biological implications of the network. In this case, the binding reactions were faster for the petal and stamen complexes (the product of the monomers’ concentrations divided by 2) than for the sepal and carpel complexes (the product divided by 5), following the supposition that the heterodimers may have higher binding affinity than the homodimers [de Folter et al., 2005].
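The speed rules in this paragraph can be written down as a short Python sketch. The function names are hypothetical and the code is not part of the published CSML model; it assumes that cycles are counted from 0 and that the linear dependence of an activation speed on its activator has unit slope, neither of which is stated in the text.

```python
def pulse_speed(cycle, pulse_cycles=10, active=1.0, basal=0.01):
    """Initial activation: full speed during the pulse, basal speed afterwards.

    Assumes cycles are counted from 0, so the pulse covers cycles 0 to 9.
    """
    return active if cycle < pulse_cycles else basal


def activation_speed(activator, max_speed):
    """Transcription speed that follows the activator linearly up to a cap.

    The unit slope is an assumption; the text only states linearity.
    """
    return min(activator, max_speed)


def translation_speed(mrna):
    """Translation speed: mRNA concentration divided by 5."""
    return mrna / 5.0


def binding_speed(monomer_a, monomer_b, constant):
    """Binding speed: product of the monomer concentrations over a constant
    (2 for the petal and stamen complexes, 5 for sepal and carpel)."""
    return monomer_a * monomer_b / constant


# AG has three independent activators, each capped at 0.33, so the
# combined maximal speed stays below 1 even when all are saturated.
ag_max_total = sum(activation_speed(10.0, 0.33) for _ in range(3))
```

Capping each of the three AG transcription reactions at 0.33 is what keeps their combined speed under the limit of 1 without any further bookkeeping.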
The degradation speeds were set as the concentration of an mRNA divided by 5, and the concentration of a protein divided by 10, under the assumption that proteins are more stable than mRNAs [Matsuno et al., 2003]. The degradation speed of a protein complex was set as the concentration of the complex divided by 15. Exceptions to these rules were implemented to address the dependence of AP3 and PI on each other. AP3 and PI stabilize each other’s expression and function as obligate heterodimers [Jack et al., 1994; Hill et al., 1998; Tilly et al., 1998]. To represent this dependence in the network, we assume that the single monomer is extremely unstable and degrades fast, so we set the degradation speed of the protein to 10 times the protein translation speed if the partner was absent. Reaction thresholds were set to 0.1, which implies that a low protein concentration is required before the reaction becomes active. The only exceptions to this rule were the inhibition of AG by the petal complex and the activation of AG by LFY, where the inhibition and activation thresholds were set to 1. In the first case, a low-level presence of the petal complex under early stamen conditions would otherwise block stamen complex formation by inhibiting the transcription of AG. In the second case, LFY tends to be expressed at a very low level, but still above the general threshold of 0.1. Under conditions where AG is also expressed, the activity of low-level LFY on AG would lead to an overproduction of this protein, while the network structure suggests that AG depends on feedback from the protein complexes in which it is present to maintain its expression level. These threshold changes would have the biological implication that the petal complex and LFY have a lower affinity for the AG promoters than AP2 or the stamen and carpel complexes.
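As a worked illustration of the degradation and threshold rules, the following Python sketch iterates a single transcription/translation module until production and degradation balance. It is a simplified stand-in, not the Cell Illustrator simulation: the explicit unit-step update, the strict ">" comparison in `gated`, and all function names are assumptions. With a constant transcription speed p, balance is reached at an mRNA level of 5p (production p equals degradation mRNA/5) and a protein level of twice the mRNA level (translation mRNA/5 equals degradation protein/10).

```python
def degradation_speed(concentration, kind):
    """Degradation speed: concentration divided by 5 (mRNA), 10 (protein)
    or 15 (protein complex)."""
    divisor = {"mrna": 5.0, "protein": 10.0, "complex": 15.0}[kind]
    return concentration / divisor


def gated(speed, regulator, threshold=0.1):
    """Return `speed` only once the regulator exceeds the reaction threshold.

    The strict comparison is an assumption made for this sketch.
    """
    return speed if regulator > threshold else 0.0


def simulate_module(transcription, cycles=500):
    """Iterate one transcription/translation module with unit time steps."""
    mrna = protein = 0.0
    for _ in range(cycles):
        d_mrna = transcription - degradation_speed(mrna, "mrna")
        d_protein = mrna / 5.0 - degradation_speed(protein, "protein")
        mrna += d_mrna
        protein += d_protein
    return mrna, protein


# Constant transcription speed of 0.1 (hypothetical value).
mrna, protein = simulate_module(transcription=0.1)
```

Raising the threshold argument of `gated` from 0.1 to 1 mirrors the two exceptions described for AG: the same regulator concentration that switches on an ordinary reaction leaves these reactions inactive.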
Further experimental data is necessary to elucidate the binding affinity differences of these transcription factor complexes. Additionally, AG expression is also induced by other factors, e.g. WUSCHEL [Lenhard et al., 2001; Lohmann et al., 2001], which were not considered in our simple model.
Fig. 4. Simulation results for each of the 4 types of floral organs, dependent on the initial conditions described in the methods. The expression (Y axis) corresponds to a protein concentration value (arbitrary units), and time corresponds to the simulation cycle number. The color coded lines correspond to single and complex factors as described in the bottom table. In every case, the highest expression value in the steady state corresponds to the complex known to be responsible for the differentiation of each organ.
RESULTS
Network simulation
The network was simulated under the 4 initial conditions described above, until the expression levels of the proteins reached a steady state (Fig. 4). The relative protein expression levels of each simulation are in accordance with those currently known for each floral organ, and provide a qualitative description of the dynamics behind the specification of floral organ identities [Krizek and Fletcher, 2005]. Figure 4 presents results of simulations for each type of floral organ. In all cases, the higher-order protein complex specific for a certain type of floral organ can be recovered as the predominant component in the system after reaching the equilibrium. According to the simulation, the petal and sepal identity factor AP1(/CAL) is transiently expressed during carpel and stamen initiation, leading also to transient formation of higher-order complexes involving these proteins. This is consistent with experimentally determined mRNA and protein expression patterns indicating that AP1 is found in all floral whorls at the earliest stages of flower development [Mandel et al., 1992; Urbanus et al., 2009], and reflects the
Fig. 5. Simulation of the network under different mutant contexts. The mutant described in each plot was simulated by setting the translation speed of the mutated gene to 0. The steady state expression profiles coincide with the phenotypes of experimentally characterized mutants.
dual role of AP1 and CAL as flower meristem identity genes. Thus, the model predicts that different higher-order complexes may transiently coexist at early time-points of flower meristem development. In order to further test our model, simulations of various known mutants were performed. The networks of mutant scenarios were simulated by setting the translation speed of the targeted entity to 0. As can be seen in Fig. 5, mutation of the LFY gene under the stamen conditions results in a transient low-level activation of AP1/CAL and SEP proteins, which is not sufficient to trigger persistent upregulation of floral homeotic genes. In agreement with this, the lfy mutant produces leaf-like organs instead of stamens. Similar results were obtained for the other floral organs. In contrast, ag and ap3 mutants form petal (AP1/SEP/AP3/PI) and carpel (AG/SEP) complexes in the third whorl, respectively, which is consistent with the homeotic mutant phenotypes that are described in the literature [Bowman et al., 1989; Yanofsky et al., 1990; Jack et al., 1992].
Network robustness and response to stochastic oscillations
The response of the network to stochastic oscillations was measured by setting the value of independent entities to oscillate at random within a given interval during the simulation time. It was observed that the requirement of protein complexes as end regulators of the network leads to dampening of noise and to a more stable concentration of the complexes themselves. As a result, the regulatory effect of these
Fig. 6. Protein complexes mediate dampening of stochastic oscillations. By setting the value of a given entity to a random number, the overall response of the network to stochastic oscillations is observed. In this case the value of a given entity is set to oscillate at random between 0 and 1 under the given initial conditions. The only cases found to be sensitive to these oscillations involve the random activation of AP2 (which does not form a protein complex) under sepal and stamen initial conditions, which disrupts the expression patterns.
oscillations on target entities is also reduced. This effect arises because, if one of the partners of a dimer complex has an aberrant expression while the other remains under normal control, the concentration of the dimer shows an oscillation whose magnitude is reduced relative to that of the monomer. This dampening effect is even more pronounced for higher-order complexes. The corresponding variation of the concentration of the final transcription factor complexes is found to be about one fifth of the variation of the chosen single monomer. Changes in organ-specific steady states of gene expression were observed when random noise on the AP2 protein concentration was introduced. Under the initial conditions for sepal development, AG was transiently activated because the concentration of AP2 fell below the activity threshold that is required for inhibition of AG. AG, in turn, inhibited the production of AP1, which resulted in the disappearance of the sepal complex (Fig. 6). In the case of petal and stamen conditions, the petal complex is formed at relatively low concentration. Here the petal complex further inhibits the expression of AG, resulting in an indirect activation effect on AP1 which, while it does not recover the expression pattern of the original model, allows for sustained AP1 production and a low-level formation of the petal complex SEP/AP1/AP3/PI. It is important to note that if a basal level of AP2 protein expression is kept, the inhibitory effect of stochastic fluctuations of AP2 on sepal development is absent. Stochastic fluctuation of AG leads to defects in organ-specific protein complexes: the stamen complex is formed instead of the petal complex and the carpel complex instead of the sepal complex, leading to floral homeotic conversions. The simulation thus mimics a situation in which AG is ectopically expressed.
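The dampening of monomer noise at the level of a complex can be illustrated with a small, self-contained Python sketch. It is a deliberately reduced stand-in for the full network and is not expected to reproduce the one-fifth figure reported above: one monomer is redrawn at random between 0 and 1 in every cycle, its partner is held constant, binding does not consume the monomers, and the dimer follows the binding (product divided by 5) and complex degradation (concentration divided by 15) rules. All names, the seed and the run lengths are assumptions.

```python
import random
import statistics


def simulate_noisy_dimer(cycles=5000, burn_in=500, seed=1):
    """Track a dimer formed from one noisy and one stable monomer."""
    rng = random.Random(seed)
    partner = 1.0          # monomer under normal control (held constant)
    dimer = 0.0
    noisy_trace, dimer_trace = [], []
    for cycle in range(cycles):
        noisy = rng.uniform(0.0, 1.0)        # aberrantly expressed monomer
        binding = noisy * partner / 5.0      # binding speed rule
        dimer += binding - dimer / 15.0      # complex degradation rule
        if cycle >= burn_in:                 # discard the initial build-up
            noisy_trace.append(noisy)
            dimer_trace.append(dimer)
    return noisy_trace, dimer_trace


def relative_variation(trace):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(trace) / statistics.mean(trace)


noisy_trace, dimer_trace = simulate_noisy_dimer()
monomer_variation = relative_variation(noisy_trace)
dimer_variation = relative_variation(dimer_trace)
```

In this sketch the slowly degrading complex averages over many cycles of monomer input, so `dimer_variation` comes out smaller than `monomer_variation`; the size of the reduction depends on the assumed speed rules.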
Notably, the sensitivity of the petal and sepal steady states of gene expression is in line with data suggesting that AG is repressed by multiple mechanisms in developing sepals and petals, which for the sake of focus and simplicity have not been included in our model [see, e.g., Krizek et al., 2000; Bao et al., 2004].
The role of transcription factor complexes in stabilizing network dynamics
According to the ‘floral quartet model’ [Theissen and Saedler, 2001], floral homeotic proteins belonging to the MADS-box family assemble in a combinatorial fashion into organ-specific, higher-order protein complexes. In order to test whether heteromeric protein complexes could play a role in the stabilization
Fig. 7. Instability of the stamen network assuming simple feedback. The presence of feedback loops of AP3/PI and AG to their respective own genes, and the fact that these assemble into a higher-order complex, make the network extremely sensitive to differences in the production of the monomers. A) Assuming standard parameters, the production of AG is slightly higher than that of AP3/PI. The sequestering of AP3/PI by the SEP/AG complex leads to the collapse of its expression, causing a final expression pattern corresponding to the carpel, with the accumulation of the SEP/AG complex. B) If the speed of AP3/PI production is adjusted to be higher than that of AG, the expression of AG collapses, leading to the accumulation of AP3/PI.
and robustness of the regulatory network, an attempt was made to rebuild the whole network so that the autoregulatory links were redirected to start at their single monomers instead of protein complexes. In this case the complexes were still produced as end products but they played no further role in regulation within the network. Simulations were made following the same set of starting conditions as in the previous model. While the correct steady-state expression profiles for sepal, petal and carpel were observed (data not shown), stable expression of the stamen complex was unattainable (Fig. 7). This was traced to the presence of two separate positive autoregulatory feedback loops of the AP3/PI complex and AG acting on the expression of their respective genes. Since AP3/PI and SEP/AG heterodimers assemble into the higher-order SEP/AP3/PI/AG complex, the concentration of any free heterodimer depends on the concentration of the other heterodimer. For instance, if the production of AG is higher than that of AP3 and PI, the SEP/AG dimer will bind to all the AP3/PI dimers available to form the higher-order complex, and since AP3 and PI are positively autoregulated, the titration of the AP3/PI complex by SEP/AG will lead to a decrease and eventual halt of the transcriptional activation of AP3 and PI. This decrease in AP3 and PI transcription causes formation of the carpel complex SEP/AG. On the other hand, if the production speed of AP3 and PI is higher than that of AG, free AG, which under this assumption activates its own expression, will be sequestered by SEP/AP3/PI and will be unable to activate its own expression, leading to the loss of the stamen complex (Fig. 7). In this case, feedback by the higher-order complex, not by the single monomers, is a prerequisite for maintaining a stable expression of the steady-state transcription factor complex.
Extension of the network: Feedback loops controlling spatial regulators
Recently published ChIP-seq data [Kaufmann et al., 2009] support the idea that some of the spatial patterning genes might be under feedback control from MADS-box transcription factor complexes. In particular, binding of SEP3 to the promoters of AP2 and three miR172 orthologs suggests that SEP3 complexes (e.g. with AP1) could stabilize the expression of AP2 in the outer whorls and that other SEP3 complexes might stabilize miR172 expression in the inner whorls. In order to test the feasibility
of such potential regulatory interactions, we extended the network to include regulatory reactions from the sepal and petal MADS complexes to activate the transcription of AP2 and from the stamen and carpel complexes to activate the transcription of miR172. This change implies that the miR172 entity, which was not playing an active role in the previous simulations, is now included in the modelling dynamics. However, the previous requirement on the initial conditions that direct the formation of each floral organ is maintained: the initial value of miR172 is set to 1 under stamen and carpel conditions and to 0 under sepal and petal conditions. Simulations of the extended network suggest that the inclusion of these interactions maintains the correct expression patterns of each floral organ, and may act as redundant control mechanisms to further stabilize the organ-specific gene expression patterns. For example, in the case of the protein AG, the sustained presence of AP2 would ensure that the inhibition of AG remains constant until the sepal or petal complexes can be formed (Fig. 8). On the other hand, feedback activation of miR172 allows the stabilization of the spatial boundary between the outer and inner whorls through the translational inhibition of AP2.
DISCUSSION
A dynamic molecular regulatory network of early flower development
The model presented here predicts relative concentrations of mRNA, proteins and protein complexes at the earliest time points of the flower developmental programme, when the identities of the different types of floral organs are specified. We model transitory and final states of expression of the different components in each floral whorl. A limitation of the current model is the lack of experimentally determined quantitative data for estimating the production and degradation speeds of the different components, allowing us to use only generic estimates.
However, we find that our model, which is based on (qualitative) genetic as well as molecular evidence, is capable of reproducing current knowledge on the timing of expression and organ-specific complex formation of transcription factors. The model is also able to correctly reproduce the outcome of different loss-of-function mutants, suggesting that model simulations can be used to formulate biological hypotheses which can subsequently be tested experimentally.
Robustness through interaction: the role of heteromeric protein complexes in network stability
MADS-domain proteins form an “intrafamily” protein-protein interaction network [de Folter et al., 2005; reviewed in Kaufmann et al., 2005]. This network evolved by a series of duplication events, associated with rounds of whole-genome duplication [Veron et al., 2007]. The family presumably originated from a homodimerizing ancestral protein which was present in green algae [Tanabe et al., 2005]. Autoregulation and positive regulation by interaction partners are common features among MADS-box transcription factors, with the most well-known examples characterized for floral homeotic proteins, which were analyzed in our model. In order to analyze the role of higher-order MADS protein complexes in autoregulation, we designed the model in a manner that allows for autoregulation by heteromeric complexes as opposed to single proteins. We find that allowing autoregulation by single proteins can destabilize the network under certain conditions. This is especially the case when the formation of a higher-order complex relies mostly on two independently regulated heterodimers. If additional factors stabilize the expression of both heterodimers, the correct organ-type specific higher-order complex may still be formed. This
Fig. 8. Inclusion of feedback regulatory reactions on AP2 and miR172. A) SEP3 binding patterns at 3 miR172 loci as revealed by ChIP-seq. The ChIP-seq peaks are shown in the upper panel in each figure, with genomic loci indicated beneath. In all cases, the main peak is downstream of and close to the locus. B) Potential regulatory interactions of different complexes on AP2 and miR172 loci which were added to our model. For simplification, the 3 different miRNA loci are treated as one entity in our model. C) (Bottom left) A transcriptional activation process was added between the petal complex (AP1/SEP/AP3/PI) and AP2, leading to the sustained expression of AP2 in the petal. The presence of AP2 in sepals and petals would add a redundant repression control on AG. (Bottom right) Activation of miR172 by the stamen complex (AG/SEP3/AP3/PI). The explicit maintenance of miR172 expression by the stamen complex allows the translational inhibition of AP2, and thus stabilizes the expression of AG.
suggests that SEP proteins as mediators of higher-order complex formation are required to integrate and balance the expression of different homeotic proteins, in order to produce stable floral structures. Experimental quantification of differences in protein interaction preferences and DNA-binding affinity, as well as specific levels of activation by certain proteins/complexes, would help to clarify the role of
these complexes at the regulatory level. Evidence for the idea that regulatory circuits of some floral organs may be more sensitive to loss of higher-order complex formation than others comes from the pistillata-5 (pi-5) mutant, in which the formation of the AP3/PI/SEP3 complex is reduced due to a mutation in the protein interaction domain of the PISTILLATA protein. The pi-5 mutant shows floral homeotic conversion of petals into sepals, but has no defect in stamen identity [Yang et al., 2003].

In addition, our model supports the idea that different higher-order complexes coexist, at least transiently, and it also proposes that higher-order complexes and heterodimers may coexist. Different DNA binding affinities of MADS-domain proteins would allow for the co-occurrence of different complexes without disrupting the expression patterns of the target genes. This becomes relevant since under petal and stamen conditions, which depend upon the formation of the heterotetramer complexes SEP/AP1/AP3/PI and SEP/AP3/PI/AG, the formation of sepal and carpel complexes, SEP/AP1 and SEP/AG respectively, cannot be ruled out. A goal of future research should be to obtain more molecular in planta support as well as quantitative data for the physical interactions that are present in our model, in order to improve the modelling of the flowering process.

Feedback on spatial regulators

Recent evidence from whole-genome approaches to identify DNA-binding sites of transcription factors in vivo (ChIP-seq and ChIP-chip) suggests extensive direct cross-talk between different transcriptional regulators, as well as feedback/feedforward loops. One example is the binding of SEP3 complexes to the promoter of AP2 and the miRNA loci which negatively regulate AP2. These findings suggest the presence of feedback loops acting on the spatial regulator AP2, in addition to yet unknown processes that achieve the early activation of AP2 and its miRNA repressors.
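The feedback loops proposed here can be illustrated with a minimal sketch. The Python code below is not the HFPN model described in this chapter. It is a toy continuous approximation in which the petal complex sustains AP2, the stamen complex sustains miR172, miR172 inhibits AP2, and AP2 represses AG. The function name `simulate`, all rate constants, the inhibition strength, and the treatment of complex levels as fixed inputs are assumptions made purely for illustration.

```python
def simulate(petal_complex, stamen_complex, steps=2000, dt=0.01):
    """Euler integration of a toy AP2 / miR172 / AG circuit.

    petal_complex and stamen_complex are fixed input levels (an assumption;
    in the chapter's model the complexes are themselves dynamic entities).
    All constants below are hypothetical and chosen only for illustration.
    """
    k_basal = 0.1   # basal production rate (hypothetical)
    k_act = 1.0     # production rate driven by a bound complex (hypothetical)
    k_deg = 1.0     # first-order degradation rate (hypothetical)
    k_inh = 10.0    # strength of each inhibitory interaction (hypothetical)

    ap2, mir172, ag = 0.5, 0.5, 0.5  # arbitrary initial levels

    for _ in range(steps):
        # The petal complex sustains AP2; miR172 inhibits AP2 translation.
        ap2_prod = (k_basal + k_act * petal_complex) / (1.0 + k_inh * mir172)
        # The stamen complex maintains miR172 expression.
        mir_prod = k_basal + k_act * stamen_complex
        # AP2 represses AG.
        ag_prod = k_act / (1.0 + k_inh * ap2)

        # Explicit Euler update of all three entities from the old state.
        ap2, mir172, ag = (
            ap2 + dt * (ap2_prod - k_deg * ap2),
            mir172 + dt * (mir_prod - k_deg * mir172),
            ag + dt * (ag_prod - k_deg * ag),
        )

    return {"AP2": ap2, "miR172": mir172, "AG": ag}


if __name__ == "__main__":
    print("petal-like input: ", simulate(petal_complex=1.0, stamen_complex=0.0))
    print("stamen-like input:", simulate(petal_complex=0.0, stamen_complex=1.0))
```

Under these assumed parameters, a petal-like input ends with higher AP2 and lower AG than a stamen-like input, which matches the qualitative behaviour described in the caption of Fig. 8C. The sketch says nothing about the quantitative behaviour of the actual network.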
The result of our stochastic response simulations on AP2, the only entity that disrupted the network’s expression pattern, indicates that in the current model the regulation of AP2 is rather simple, in contrast to almost every other entity. Given the central role of AP2 in spatial patterning, it is likely that more complex elements and interactions are involved in the transcriptional control of the AP2 gene. Complex feedback mechanisms involving miRNA loci were recently also described for the control of floral transition [Wu et al., 2009], and may play a general role in developmental transitions and pattern formation.

Which other processes might control the expression of spatial regulators remains an open question. In animals, concentration gradients of morphogens are important to set developmental pre-patterns. In plants, hormones like auxin have been suggested to act in a similar fashion to morphogens in animals; however, a role of this hormone in orchestrating floral homeotic gene expression has not yet been demonstrated. Thus a major challenge in the future will be to unravel the upstream processes driving the expression of spatial regulators of floral homeotic genes.

ACKNOWLEDGEMENTS

This work was supported by the European Union FP6 Marie Curie training network grant TRANSISTOR. We gratefully acknowledge Raymond DiDonato (BIOBASE Corp., Beverly/MA) for useful comments on the manuscript.

REFERENCES

• Bao, X., Franks, R. G., Levin, J. Z. and Liu, Z. (2004). Repression of AGAMOUS by BELLRINGER in floral and inflorescence meristems. Plant Cell 16, 1478-1489.
• Bomblies, K., Dagenais, N. and Weigel, D. (1999). Redundant enhancers mediate transcriptional repression of AGAMOUS by APETALA2. Dev. Biol. 216, 260-264.
• Bowman, J. L., Smyth, D. R. and Meyerowitz, E. M. (1989). Genes directing flower development in Arabidopsis. Plant Cell 1, 37-52.
• Bowman, J. L., Sakai, H., Jack, T., Weigel, D., Mayer, U. and Meyerowitz, E. M. (1992). SUPERMAN, a regulator of floral homeotic genes in Arabidopsis. Development 114, 599-615.
• Busch, M. A., Bomblies, K. and Weigel, D. (1999). Activation of a floral homeotic gene in Arabidopsis. Science 285, 585-587.
• Castillejo, C., Romera-Branchat, M. and Pelaz, S. (2005). A new role of the Arabidopsis SEPALLATA3 gene revealed by its constitutive expression. Plant J. 43, 586-596.
• Chae, E., Tan, Q. K.-G., Hill, T. A. and Irish, V. F. (2008). An Arabidopsis F-box protein acts as a transcriptional co-factor to regulate floral development. Development 135, 1235-1245.
• Chen, X. (2004). A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303, 2022-2025.
• de Folter, S., Immink, R. G. H., Kieffer, M., Pařenicová, L., Henz, S. R., Weigel, D., Busscher, M., Kooiker, M., Colombo, L., Kater, M. M., Davies, B. and Angenent, G. C. (2005). Comprehensive interaction map of the Arabidopsis MADS box transcription factors. Plant Cell 17, 1424-1433.
• Ditta, G., Pinyopich, A., Robles, P., Pelaz, S. and Yanofsky, M. F. (2004). The SEP4 gene of Arabidopsis thaliana functions in floral organ and meristem identity. Curr. Biol. 14, 1935-1940.
• Doi, A., Fujita, S., Matsuno, H., Nagasaki, M. and Miyano, S. (2004). Constructing biological pathway models with hybrid functional Petri nets. In Silico Biol. 4, 0023.
• Drews, G. N., Bowman, J. L. and Meyerowitz, E. M. (1991). Negative regulation of the Arabidopsis homeotic gene AGAMOUS by the APETALA2 product. Cell 65, 991-1002.
• Espinosa-Soto, C., Padilla-Longoria, P. and Alvarez-Buylla, E. R. (2004). A gene regulatory network model for cell-fate determination during Arabidopsis thaliana flower development that is robust and recovers experimental gene expression profiles. Plant Cell 16, 2923-2939.
• Farnham, P. J. (2009). Insights from genomic profiling of transcription factors. Nat. Rev. Genet. 10, 605-616.
• Gómez-Mena, C., de Folter, S., Costa, M. M. R., Angenent, G. C. and Sablowski, R. (2005). Transcriptional program controlled by the floral homeotic gene AGAMOUS during early organogenesis. Development 132, 429-438.
• Gustafson-Brown, C., Savidge, B. and Yanofsky, M. F. (1994). Regulation of the Arabidopsis floral homeotic gene APETALA1. Cell 76, 131-143.
• Hardy, S. and Robillard, P. N. (2008). Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways. Bioinformatics 24, 209-217.
• Hill, T. A., Day, C. D., Zondlo, S. C., Thackeray, A. G. and Irish, V. F. (1998). Discrete spatial and temporal cis-acting elements regulate transcription of the Arabidopsis floral homeotic gene APETALA3. Development 125, 1711-1721.
• Hofestädt, R. and Thelen, S. (1998). Quantitative modelling of biochemical networks. In Silico Biol. 1, 0006.
• Honma, T. and Goto, K. (2000). The Arabidopsis floral homeotic gene PISTILLATA is regulated by discrete cis-elements responsive to induction and maintenance signals. Development 127, 2021-2030.
• Honma, T. and Goto, K. (2001). Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature 409, 525-529.
• Huala, E. and Sussex, I. M. (1992). LEAFY interacts with floral homeotic genes to regulate Arabidopsis floral development. Plant Cell 4, 901-913.
• Immink, R. G. H., Tonaco, I. A. N., de Folter, S., Shchennikova, A., van Dijk, A. D. J., Busscher-Lange, J., Borst, J. W. and Angenent, G. C. (2009). SEPALLATA3: the ‘glue’ for MADS box transcription factor complex formation. Genome Biol. 10, R24.
• Jack, T., Brockman, L. L. and Meyerowitz, E. M. (1992). The homeotic gene APETALA3 of Arabidopsis thaliana encodes a MADS box and is expressed in petals and stamens. Cell 68, 683-697.
• Jack, T., Fox, G. L. and Meyerowitz, E. M. (1994). Arabidopsis homeotic gene APETALA3 ectopic expression: transcriptional and posttranscriptional regulation determine floral organ identity. Cell 76, 703-716.
• Jeong, E., Nagasaki, M., Saito, A. and Miyano, S. (2007). Cell system ontology: representation for modelling, visualizing, and simulating biological pathways. In Silico Biol. 7, 623-638.
• Kaufmann, K., Melzer, R. and Theissen, G. (2005). MIKC-type MADS-domain proteins: structural modularity, protein interactions and network evolution in land plants. Gene 347, 183-198.
• Kaufmann, K., Muiño, J. M., Jauregui, R., Airoldi, C. A., Smaczniak, C., Krajewski, P. and Angenent, G. C. (2009). Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biol. 7, e1000090.
• Kempin, S. A., Savidge, B. and Yanofsky, M. F. (1995). Molecular basis of the cauliflower phenotype in Arabidopsis. Science 267, 522-525.
• Koh, G., Teong, H. F. C., Clément, M.-V., Hsu, D. and Thiagarajan, P. S. (2006). A decompositional approach to parameter estimation in pathway modelling; a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22, e271-e280.
• Krizek, B. A. and Meyerowitz, E. M. (1996). Mapping the protein regions responsible for the functional specificities of the Arabidopsis MADS domain organ-identity proteins. Proc. Natl. Acad. Sci. USA 93, 4063-4070.
• Krizek, B. A. and Fletcher, J. C. (2005). Molecular mechanisms of flower development: An armchair guide. Nat. Rev. Genet. 6, 688-698.
• Krizek, B. A., Prost, V. and Macias, A. (2000). AINTEGUMENTA promotes petal identity and acts as a negative regulator of AGAMOUS. Plant Cell 12, 1357-1366.
• Lee, I., Wolfe, D. S., Nilsson, O. and Weigel, D. (1997). A LEAFY co-regulator encoded by UNUSUAL FLORAL ORGANS. Curr. Biol. 7, 95-104.
• Lenhard, M., Bohnert, A., Jürgens, G. and Laux, T. (2001). Termination of stem cell maintenance in Arabidopsis floral meristems by interactions between WUSCHEL and AGAMOUS. Cell 105, 805-814.
• Li, X.-Y., MacArthur, S., Bourgon, R., Nix, D., Pollard, D. A., Iyer, V. N., Hechmer, A., Simirenko, L., Stapleton, M., Luengo Hendriks, C. L., Chu, H. C., Ogawa, N., Inwood, W., Sementchenko, V., Beaton, A., Weiszmann, R., Celniker, S. E., Knowles, D. W., Gingeras, T., Speed, T. P., Eisen, M. B. and Biggin, M. D. (2008). Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biol. 6, e27.
• Liljegren, S. J., Gustafson-Brown, C., Pinyopich, A., Ditta, G. S. and Yanofsky, M. F. (1999). Interactions among APETALA1, LEAFY, and TERMINAL FLOWER1 specify meristem fate. Plant Cell 11, 1007-1018.
• Lohmann, J. U., Hong, R. L., Hobe, M., Busch, M. A., Parcy, F., Simon, R. and Weigel, D. (2001). A molecular link between stem cell regulation and floral patterning in Arabidopsis. Cell 105, 793-803.
• Mandel, M. A., Gustafson-Brown, C., Savidge, B. and Yanofsky, M. F. (1992). Molecular characterization of the Arabidopsis floral homeotic gene APETALA1. Nature 360, 273-277.
• Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 338-349.
• Matsuno, H., Tanaka, Y., Aoshima, H., Doi, A., Matsui, M. and Miyano, S. (2003). Biopathways representation and simulation on hybrid functional Petri net. In Silico Biol. 3, 0032.
• Melzer, R., Kaufmann, K. and Theißen, G. (2006). Missing links: DNA-binding and target gene specificity of floral homeotic proteins. In: Advances in Botanical Research, Vol. 44, Developmental genetics of the flower, Soltis, D., Soltis, P. and Leebens-Mack, J. H. (eds.), Academic Press, pp. 209-236.
• Mendoza, L. and Alvarez-Buylla, E. R. (1998). Dynamics of the genetic regulatory network for Arabidopsis thaliana flower morphogenesis. J. Theor. Biol. 193, 307-319.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: I. A platform for modelling and simulating biopathways. Appl. Bioinformatics 2, 181-184.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2004). A versatile petri net based architecture for modelling and simulation of complex biological processes. Genome Inform. 15, 180-197.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2005). Computational modeling of biological processes with Petri net-based architecture. Bioinformatics Technologies, Springer, Berlin Heidelberg, pp. 179-242.
• Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2009a). Foundations of Systems Biology – Using Cell Illustrator and Pathway Databases. Springer, Berlin Heidelberg.
• Nagasaki, M., Saito, A. and Miyano, S. (2009b). Cell Illustrator 4.0: A Computational Platform for Systems Biology. In Silico Biol. 10, 0002.
• Oh, E., Kang, H., Yamaguchi, S., Park, J., Lee, D., Kamiya, Y. and Choi, G. (2009). Genome-wide analysis of genes targeted by PHYTOCHROME INTERACTING FACTOR 3-LIKE5 during seed germination in Arabidopsis. Plant Cell 21, 403-419.
• Pelaz, S., Ditta, G. S., Baumann, E., Wisman, E. and Yanofsky, M. F. (2000). B and C floral organ identity functions require SEPALLATA MADS-box genes. Nature 405, 200-203.
• Pelaz, S., Gustafson-Brown, C., Kohalmi, S. E., Crosby, W. L. and Yanofsky, M. F. (2001). APETALA1 and SEPALLATA3 interact to promote flower development. Plant J. 26, 385-394.
• Peterson, J. L. (1981). Petri Net Theory and the Modelling of Systems. Prentice Hall.
• Petri, C. A. (1962). Fundamentals of a theory of asynchronous information flow. IFIP Congress 1962, 386-390.
• Putterill, J., Laurie, R. and Macknight, R. (2004). It’s time to flower: the genetic control of flowering time. Bioessays 26, 363-373.
• Reddy, V. N., Mavrovouniotis, M. L. and Liebman, M. N. (1993). Petri net representations in metabolic pathways. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1, 328-336.
• Sato, Y., Hashiguchi, Y. and Nishida, M. (2009). Evolution of multiple phosphodiesterase isoforms in stickleback involved in cAMP signal transduction pathway. BMC Syst. Biol. 3, 23.
• Schmid, M., Uhlenhaut, N. H., Godard, F., Demar, M., Bressan, R., Weigel, D. and Lohmann, J. U. (2003). Dissection of floral induction pathways using global expression analysis. Development 130, 6001-6012.
• Sun, N. and Zhao, H. (2004). Genomic approaches in dissecting complex biological pathways. Pharmacogenomics 5, 163-179.
• Tanabe, Y., Hasebe, M., Sekimoto, H., Nishiyama, T., Kitani, M., Henschel, K., Münster, T., Theissen, G., Nozaki, H. and Ito, M. (2005). Characterization of MADS-box genes in charophycean green algae and its implication for the evolution of MADS-box genes. Proc. Natl. Acad. Sci. USA 102, 2436-2441.
• Theißen, G. and Saedler, H. (2001). Plant biology. Floral quartets. Nature 409, 469-471.
• Tilly, J. J., Allen, D. W. and Jack, T. (1998). The CArG boxes in the promoter of the Arabidopsis floral organ identity gene APETALA3 mediate diverse regulatory effects. Development 125, 1647-1657.
• Troncale, S., Tahi, F., Campard, D., Vannier, J.-P. and Guespin, J. (2006). Modelling and simulation with Hybrid Functional Petri Nets of the role of interleukin-6 in human early haematopoiesis. Pac. Symp. Biocomput. 11, 427-438.
• Urbanus, S. L., de Folter, S., Shchennikova, A. V., Kaufmann, K., Immink, R. G. H. and Angenent, G. C. (2009). In planta localisation patterns of MADS domain proteins during floral development in Arabidopsis thaliana. BMC Plant Biol. 9, 5.
• Veron, A. S., Kaufmann, K. and Bornberg-Bauer, E. (2007). Evidence of interaction network evolution by whole-genome duplications: A case study in MADS-box proteins. Mol. Biol. Evol. 24, 670-678.
• Walhout, A. J. M. (2006). Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res. 16, 1445-1454.
• Weigel, D. and Meyerowitz, E. M. (1993). Activation of Floral Homeotic Genes in Arabidopsis. Science 261, 1723-1726.
• Weigel, D., Alvarez, J., Smyth, D. R., Yanofsky, M. F. and Meyerowitz, E. M. (1992). LEAFY controls floral meristem identity in Arabidopsis. Cell 69, 843-859.
• Wellmer, F., Alves-Ferreira, M., Dubois, A., Riechmann, J. L. and Meyerowitz, E. M. (2006). Genome-wide analysis of gene expression during early Arabidopsis flower development. PLoS Genet. 2, e117.
• Wilkinson, M. D. and Haughn, G. W. (1995). UNUSUAL FLORAL ORGANS Controls Meristem Identity and Organ Primordia Fate in Arabidopsis. Plant Cell 7, 1485-1499.
• Wu, G., Park, M. Y., Conway, S. R., Wang, J. W., Weigel, D. and Poethig, R. S. (2009). The sequential action of miR156 and miR172 regulates developmental timing in Arabidopsis. Cell 138, 750-759.
• Yang, Y., Xiang, H. and Jack, T. (2003). pistillata-5, an Arabidopsis B class mutant with strong defects in petal but not in stamen development. Plant J. 33, 177-188.
• Yanofsky, M. F., Ma, H., Bowman, J. L., Drews, G. N., Feldmann, K. A. and Meyerowitz, E. M. (1990). The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 346, 35-39.
• Yun, J.-Y., Weigel, D. and Lee, I. (2002). Ectopic expression of SUPERMAN suppresses development of petals and stamens. Plant Cell Physiol. 43, 52-57.
Biological Petri Nets E. Wingender (Ed.) IOS Press, 2011 © 2011 The authors, Bioinformation Systems e.V. and IOS Press. All rights reserved.
Subject Index

alternative splicing 244
amphetamine 222
apoptosis 77
Arabidopsis thaliana 143
biochemical networks 3, 113
biochemical system theory (BST) 222
bioinformatics 3
biological network editor 182
biological pathway 92, 130
biopathway 160
cell cycle 236
Cell Illustrator 160
cell-to-cell communication 182
cellular rhythm 182
CI 160
circadian rhythms 77
conflict resolution 204
CSML 160
CSO 160
database integration 182
delay 222
dopamine signaling 222
dynamic modeling 182
dynamical model 279
elementary flux mode 17
elementary mode 56
firing delay time 204
flower development 279
flower morphogenesis 143
gene expression 236
gene network 279
gene regulation 38
gene regulatory network 143
Genomic Object Net 77, 92
graph theory 113
HFPN 222
high-level Petri net 56
hybrid functional Petri net 92, 130
hybrid modeling 222
incidence matrix 17
interleukin-1 (IL-1) signaling pathway 204
Java 160
JWS 160
knowledge representation 122
lac operon 92
MCTS 244
metabolic network 17, 38, 113
metabolic pathway 56
modeling 77, 160
modeling and simulation 3, 38
mRNA gestation 236
mRNA senescence 236
noise 236
ODE 160
ontology 122, 160
p53 130
Parkinson’s disease 222
path search with constraints 113
pathway analysis 244
pathway database 160
Petri net theory 244
Petri net 17, 38, 77, 113, 122, 143, 160, 182, 204, 222
petrinets 3
P-invariant 17
potato tuber 113
Presburger arithmetic 143
quantitative model 38
quorum sensing 182
regulated splicing 244
schizophrenia 222
signal transduction networks 113, 244
signaling pathway 122
simulation 77, 92, 130, 160, 182
S-invariant 56
spliceosome 244
stationary state 143
steady state 56
stochastic decision rule 204
stochastic Petri nets 236
stochasticity 222
sucrose-to-starch breakdown 113
systems biology 113, 236
T-clusters 244
T-invariant 17, 56, 244
urea cycle 38
VANESA 182
yeast 236
Author Index

Aoshima, H. 77
Borck, D. 182
Bortfeldt, R.H. 244
Chen, M. 38, 182
Csikász-Nagy, A. 236
Doi, A. 77, 92, 130
Friesen, R. 182
Fujita, S. 92
Gambin, A. 143
Ge, Q.-W. 204
Haugen, P. 182
Heiner, M. 56, 113
Hippe, K. 182
Hofestädt, R. 1, 3, 38, 182
Ikeda, E. 160
Janowski, S. 182
Jáuregui, R. 279
Jeong, E. 160
Kaufmann, K. 279
Koch, I. 56, 113, 244
Kojima, K. 160
Kormeier, B. 182
Lasota, S. 143
Li, C. 160, 204
Matsui, M. 77
Matsuno, H. 77, 92, 130, 204
Miwa, Y. 204
Miyano, S. 77, 92, 130, 160, 204
Mura, I. 236
Nagasaki, M. 92, 130, 160, 279
Qi, Z. 222
Rubert, S. 182
Rutkowski, M. 143
Saito, A. 160
Schüler, M. 113
Schuster, S. 17, 244
Takai-Igarashi, T. 122
Tanaka, Y. 77
Thelen, S. 3
Töpel, T. 182
Voit, E.O. 222
Voss, K. 56
Willassen, N. 182
Wingender, E. vii
Wu, J. 222
Zevedei-Oancea, I. 17